Project P20:
Semi-Supervised Microscopy Image Classification

Project Goal

Academic microscopists (as opposed to their industrial counterparts) are often confronted with the problem of dealing with "medium throughput" data. There is too much data to easily analyze by hand, but too little data to justify spending time developing an ad-hoc data analysis system.

Moreover, in many cases, no similar data will ever be created again, due to the one-shot nature of many academic biology experiments. (That is, if an experiment works, it should not need to be repeated more than is necessary for statistical validity.) Thus, only industrial practitioners and academics in true high-throughput situations have the motivation to develop automated data analysis systems.

This project will produce (the skeleton of) an interactive system geared toward the needs of academic biologists. Specifically, this tool will automate the first step in analyzing biological image data: classifying or categorizing the images (or segmented regions thereof).

Provided an initial labeling, the tool will produce an initial classification of the images, as well as reliabilities for each classification. Optionally, the tool will ask users to manually classify the "hard cases" (as determined by the reliability estimates) and then retrain the classifier until it reaches a level of precision that the user is satisfied with.
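The "hard case" selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the proposal does not say how reliability is computed, so the sketch treats the absolute value of a signed classifier output as the reliability estimate, and `select_hard_cases` is a hypothetical helper name.

```python
def select_hard_cases(decision_values, n_queries):
    """Return indices of the n_queries least reliable classifications.

    Assumption: reliability is approximated by |decision value|, so
    outputs near zero (close to the decision boundary) count as "hard".
    """
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]))
    return order[:n_queries]


# Hypothetical signed classifier outputs for five images.
scores = [2.1, -0.1, 0.7, -1.8, 0.05]
hard = select_hard_cases(scores, 2)
print(hard)  # indices of the two images to hand back to the user
```

In an interactive loop, the returned indices would be shown to the user for labeling, the new labels added to the training set, and the classifier retrained.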

Project Scope

This project involves two main components: developing an interactive machine learner (as outlined above) and decomposing microscopy images into relevant features over which the learner can operate.

The interactive learner will be based on recent work on transductive learning and cotraining, in which classifiers are trained on a labeled training set *and* an unlabeled working set consisting of the rest of the data to be classified. Various authors provide expectation-maximization algorithms for training such learners and also "active learning" algorithms where a classifier can specifically request that individual data points be labeled by the user.

The specific classifier will be a support vector machine (SVM), chosen because SVMs are robust against overfitting in data-limited (or feature-rich) environments and because much of the transduction literature is specific to SVMs.
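To make the idea of training on a labeled set plus an unlabeled working set concrete, here is a minimal sketch. It uses plain self-training, a simpler stand-in for the transductive SVM algorithms in the literature, and a small NumPy subgradient-descent linear SVM in place of an optimized solver. All function names, the confidence threshold, and the toy data are assumptions made for illustration.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Full-batch subgradient descent on 0.5*||w||^2 + C*sum(hinge loss)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin or misclassified
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def self_train(X_lab, y_lab, X_unl, rounds=5, threshold=1.0):
    """Self-training: pseudo-label confident working-set points, then retrain."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unl.copy()
    w, b = train_linear_svm(X_train, y_train)
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        scores = remaining @ w + b
        confident = np.abs(scores) >= threshold  # assumed confidence rule
        if not confident.any():
            break
        X_train = np.vstack([X_train, remaining[confident]])
        y_train = np.concatenate([y_train, np.sign(scores[confident])])
        remaining = remaining[~confident]
        w, b = train_linear_svm(X_train, y_train)
    return w, b


# Toy data: two labeled points and six unlabeled working-set points.
X_lab = np.array([[2.0, 2.0], [-2.0, -2.0]])
y_lab = np.array([1.0, -1.0])
X_unl = np.array([[3.0, 1.0], [1.0, 3.0], [2.5, 2.5],
                  [-3.0, -1.0], [-1.0, -3.0], [-2.5, -2.5]])
w, b = self_train(X_lab, y_lab, X_unl)
pred = np.sign(X_unl @ w + b)
```

A true transductive SVM also optimizes the working-set labels jointly with the margin and constrains the class balance; the sketch omits both.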

The feature decomposition will generally involve producing as many features as possible, including but not limited to PCA/ICA components, Haralick textures, and Zernike polynomial coefficients (all shown in the literature to work well with microscopy images). Additional features discussed in class may be added as well -- the more the better, as far as the SVM is concerned. Rotationally invariant features would be best, given that cells can appear in any orientation. SIFT features may also be included if time permits.
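As an illustration of one such feature family, the sketch below computes two Haralick-style texture statistics (contrast and energy) from gray-level co-occurrence matrices and averages them over four pixel offsets. It assumes the image has already been quantized to integer gray levels in `0..levels-1`, and the function names are hypothetical. Averaging over offsets gives exact invariance only to rotations by multiples of 90 degrees, and is at best approximate for other angles.

```python
import numpy as np


def glcm(image, dr, dc, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    rows, cols = image.shape
    counts = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    counts = counts + counts.T  # count each pixel pair in both directions
    return counts / counts.sum()


def texture_features(image, levels):
    """Contrast and energy averaged over offsets at 0, 45, 90, 135 degrees."""
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]
    i, j = np.indices((levels, levels))
    contrast, energy = [], []
    for dr, dc in offsets:
        p = glcm(image, dr, dc, levels)
        contrast.append(((i - j) ** 2 * p).sum())
        energy.append((p ** 2).sum())
    return float(np.mean(contrast)), float(np.mean(energy))


# Small test image already quantized to four gray levels (assumed input).
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
features = texture_features(img, levels=4)
rotated = texture_features(np.rot90(img), levels=4)
```

Here `features` and `rotated` agree up to floating-point rounding, because a 90-degree rotation only permutes the four offsets.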

Tasks

  • Task 1: Acquire data for classification. Having several smaller sets of data rather than a single larger set would be better. I already have some data to start working with, and my coworkers have promised more. Completed by Feb. 5.
  • Task 2: Pre-process data. It will probably be best to work on pre-segmented data, since the segmentation issue is not specifically germane to this task. I will manually or automatically perform simple foreground/background segmentation on the image data sets and extract foreground elements (e.g. cells) for training. Completed by Feb. 10.
  • Task 3: Decompose images into features. (See above). Completed by Feb. 25.
  • Task 4: Write a simple transductive SVM, a la Joachims's implementation (essentially a simple wrapper around a standard SVM, for which I have efficient C code). Can proceed independently from tasks 1-3. Completed by Feb. 25.
  • Task 5: Enhance the SVM with reliability estimation and interactive retraining. (Also following specific implementations provided in the literature.) Completed by Mar. 5.
  • Task 6: Test the SVM against simpler non-transductive and non-interactive versions. Completed by Mar. 10.
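The automatic foreground/background segmentation mentioned in Task 2 could be as simple as a global threshold. The sketch below uses Otsu's method, which is one common choice and not necessarily the one intended here. It assumes bright foreground on a dark background and an image with at least two distinct intensities; the function names are hypothetical.

```python
import numpy as np


def otsu_threshold(image, bins=256):
    """Return the threshold that maximizes the between-class variance.

    Assumes the image contains at least two distinct intensity values.
    """
    hist, edges = np.histogram(image, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)       # pixel count at or below each bin
    w1 = w0[-1] - w0           # pixel count above each bin
    s0 = np.cumsum(hist * centers)
    valid = (w0 > 0) & (w1 > 0)
    m0 = s0[valid] / w0[valid]                  # mean of the darker class
    m1 = (s0[-1] - s0[valid]) / w1[valid]       # mean of the brighter class
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2
    return centers[valid][np.argmax(between)]


def foreground_mask(image):
    """Boolean mask of pixels brighter than the Otsu threshold."""
    return image > otsu_threshold(image)


# Synthetic example: a bright 3x3 "cell" on a dark background.
image = np.full((8, 8), 10.0)
image[2:5, 2:5] = 200.0
mask = foreground_mask(image)
```

Connected regions of the resulting mask would then be cropped out as individual foreground elements before feature extraction.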

Project Status

Zachary Pincus

Point of Contact

Sebastian Thrun

Midterm Report

submitted

Final Report

submitted





















































































