Robert F. Murphy, Carnegie Mellon University, USA

Tutorial 1: Automated Proteome-Wide Determination and Modeling of Subcellular Location

A short CV


Robert F. Murphy is the Ray and Stephanie Lane Professor of Computational Biology and director of the Ray and Stephanie Lane Center for Computational Biology at Carnegie Mellon University. He is also Professor of Biological Sciences, Biomedical Engineering, and Machine Learning, and directs (with Ivet Bahar) the joint CMU-Pitt Ph.D. Program in Computational Biology. Dr. Murphy's career has centered on combining fluorescence-based cell measurement methods with quantitative and computational methods. His group at Carnegie Mellon pioneered the application of machine learning methods to high-resolution fluorescence microscope images in the mid-1990s, leading to the development of the first systems for automatically recognizing all major organelle patterns in 2D and 3D images. He is President of the International Society for the Advancement of Cytometry, and is currently visiting Germany through an Alexander von Humboldt Stiftung Forschungspreis.

Tutorial Abstract

Automated Proteome-Wide Determination and Modeling of Subcellular Location

Since little is known about the subcellular distribution of many of the proteins (or putative proteins) identified by genome sequence analysis, and since subcellular location is critical to protein function, approaches are needed to determine the subcellular location of all proteins in all cell types. The focus of this introductory tutorial will be on the emerging computational methods for assigning subcellular locations to proteins on a proteome-wide scale. These combine large-scale protein tagging methods, automated fluorescence microscopy, and machine learning methods. The ultimate goal of research in this area is the ability to combine a number of approaches to produce robust systems for annotating genomes with respect to location and incorporating the information into system models. The current generation of location prediction systems is limited by the training data available to them. This limitation comes largely from the fact that detailed location information is not available for most proteins, and the current practice of using words (e.g., GO terms) to describe location does not provide the necessary complexity and specificity to represent the hundreds of distinct patterns that proteins may display in eukaryotic cells. Therefore, large-scale projects have recently been initiated to collect high-resolution images of the subcellular distributions of essentially all proteins expressed in a given cell type. In parallel, systems for automated analysis and interpretation of the images resulting from these projects have been developed and shown to be significantly more sensitive than visual inspection.
The location assignments resulting from these methods can be used as input to a new generation of systems for predicting subcellular location at high resolution. They also can be used as input to new tools under development to build generative models of location. Such models can be learned directly from images to provide an accurate representation of the variation in pattern observed within a collection of cells or tissues. The models can be represented in a form compatible with systems biology markup languages and thus provide a bridge between automated microscopy and systems biology so that location information can be accurately incorporated into cell simulations.

Topics covered will include:

Introduction to methods for protein tagging for fluorescence microscopy
  • cDNA tagging
  • Directed genomic tagging
  • Random genomic tagging
  • Basics of fluorescence microscopy
Automated classification of subcellular patterns
  • Feature-based vs. model-based approaches
  • Feature reduction methods
  • Classification methods
Clustering proteins by high-resolution location pattern
  • Complexity of the problem
  • Distance metrics and tree construction methods
  • Tree comparison and objective functions
Models of subcellular location
  • Unmixing of complex location patterns into combinations of fundamental patterns
  • Graphical models for patterns in multi-cell images
  • Portable generative models to capture patterns for subsequent modeling
Future challenges
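
As a rough illustration of the unmixing topic listed above, the following minimal Python sketch estimates how much of a mixed pattern comes from each of two fundamental patterns. It is not the method presented in the tutorial. It assumes, purely for illustration, that each pattern is summarized as a short vector of object-type frequencies and that a mixed pattern is a linear combination of two fundamental ones. The function name, the pattern labels, and all numbers are hypothetical.

```python
def unmix_two_patterns(mixed, pattern_a, pattern_b):
    """Least-squares estimate of the fraction of pattern_a in `mixed`.

    Assumes mixed ~= f * pattern_a + (1 - f) * pattern_b and returns f,
    clamped to the interval [0, 1].
    """
    # Direction from pattern_b to pattern_a in feature space.
    diff = [a - b for a, b in zip(pattern_a, pattern_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("fundamental patterns are identical")
    # Project (mixed - pattern_b) onto that direction.
    resid = [m - b for m, b in zip(mixed, pattern_b)]
    frac = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, frac))


# Hypothetical object-type frequency vectors for two fundamental patterns.
lysosome = [0.7, 0.2, 0.1]
golgi = [0.1, 0.3, 0.6]

# Synthetic mixed pattern: 25% lysosome-like, 75% Golgi-like.
mixed = [0.25 * l + 0.75 * g for l, g in zip(lysosome, golgi)]

fraction = unmix_two_patterns(mixed, lysosome, golgi)
print(f"Estimated lysosomal fraction: {fraction:.2f}")
```

With more than two fundamental patterns the same idea becomes a constrained least-squares problem over all mixing fractions, which would normally be handled by a numerical solver rather than a closed-form projection.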


Links:

Relevant publications http://murphylab.web.cmu.edu/publications, especially

E. Glory and R.F. Murphy (2007). Automated Subcellular Location Determination and High Throughput Microscopy. Developmental Cell 12:7-16. http://murphylab.web.cmu.edu/publications/132-glory2008.pdf

T. Zhao and R. F. Murphy (2007). Automated learning of generative models for subcellular location: Building blocks for systems biology. Cytometry 71A:978-990. http://www3.interscience.wiley.com/cgi-bin/fulltext/116835310/PDFSTART

S.-C. Chen, G. J. Gordon, and R. F. Murphy (2008). Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns. J. Machine Learning Res. 9:651-682. http://murphylab.web.cmu.edu/publications/145-chen2008.pdf

Relevant software http://murphylab.web.cmu.edu/software, especially

Code for building generative models for subcellular patterns http://murphylab.web.cmu.edu/software/2007_Cytometry_GenModel/

Code for automated analysis of images of yeast localization patterns http://murphylab.web.cmu.edu/software/2007_Bioinformatics_Yeast/

Publicly available image collections

http://murphylab.web.cmu.edu/data/, especially

3D images of 10 subcellular patterns in HeLa cells http://murphylab.web.cmu.edu/data/3Dhela_images.html

3D images of 90 proteins in 3T3 cells http://murphylab.web.cmu.edu/data/3T3_images.html

http://yeastgfp.ucsf.edu/ (Yeast GFP-tagged protein image library)

http://www.proteinatlas.org/ (Human Protein Atlas)

Protein Subcellular Location Image Database http://pslid.cbi.cmu.edu