Please read below for a description of the current project being run with Xgrid... This page will grow over time in response to your questions, which can be sent to charles•parnot@gmail•com.

For visual support, you can also check the slides adapted from the WWDC talk (June 2004) and the slides from the BCATS talk (October 16, 2004). These are QuickTime movies with clickable transitions (only tested in Safari).

Finally, you may find more information about the project itself in this interview published on macnews.de.

Introduction

The G-protein coupled receptors

One of the key aspects of human physiology is the ability of the different organs and cells to communicate with each other and with the outside world. This process involves receptors that recognize messengers like hormones or stimuli like odors. With about 700 genes in the human genome, the G protein-coupled receptors (GPCRs) form the largest family of these biological sensors. GPCRs are proteins. Each GPCR is found in a different part of the body, and each recognizes a different molecule. All together, they conduct the majority of responses to hormones and neurotransmitters, and mediate the senses of sight, smell and taste.

Even though each individual GPCR recognizes a different ligand, they all share some common properties. They all have the same three-dimensional shape characterized by seven highly hydrophobic segments. These helical segments anchor the receptor in the lipid membrane that forms the cell boundary. All GPCRs transmit a signal from outside the cell to inside the cell, through common intracellular effectors, in particular G-proteins, and common molecular messengers, like calcium ions and cyclic AMP. Importantly, drugs specific for one GPCR can be used to specifically block or enhance its signalling. Because GPCRs are so ubiquitous, about half of today's prescription drugs act on GPCR targets.

Cardiac regulation by the beta adrenergic receptors

For example, the brain regulates cardiac function through sympathetic nerves which deliver adrenaline to cardiac myocytes. The myocytes are the cardiac cells that form the heart muscle. The adrenaline binds to and activates the beta1 and the beta2 adrenergic receptors, two GPCRs found within the membrane of the myocytes. This binding event triggers the formation of cyclic AMP in the cell, which results in heart rate acceleration. By using drugs known as 'beta blockers', the adrenaline response can be specifically blocked at the level of the beta adrenergic receptors, without affecting other body functions. Beta blockers such as propranolol, metoprolol and carvedilol represent the major class of prescription drugs used to treat hypertension, coronary artery disease and heart failure.

The structure of the GPCRs

The GPCRs thus form an important family of biological sensors involved in most physiological functions and targeted by many current prescription drugs. However, little is known about the functioning of GPCRs at the molecular level. It is still unclear how changes in their structure transmit a signal from the ligand binding outside the cell to the effector inside the cell.

To address this problem, we must know the three-dimensional structure of GPCRs and determine how the structure changes upon ligand binding. For proteins the size of GPCRs, a high-resolution map can only be obtained by x-ray diffraction of crystals. This technique is very challenging in the case of GPCRs. As a result, only the structure of rhodopsin has been solved to date. Because of the high degree of homology between rhodopsin and other GPCRs, it is possible to use molecular modeling techniques to generate three-dimensional models.

Crystallography and homology modeling mostly provide a static view of the receptor but do not show the dynamics of the activation. Other experimental approaches have to be used, such as mutagenesis and biophysical techniques. These approaches, however, have much lower resolution or yield indirect structural information. They are best interpreted in light of a static high-resolution structure obtained from a crystal.

In conclusion, while some progress has been made towards a static 3D model, little is known about the structural changes during the activation process. Most people in the field agree that GPCRs can exist in different conformations or 'states', including at least one 'inactive' state and one 'active' state. The receptor is constantly switching from one state to another because of thermal fluctuations, and some of these states are energetically favored. Ligands then reveal new favorable states, or change the likelihood of existing states. However, the number of states, the structural determinants of these states, the kinetics of state transitions and the effect of ligands are still very poorly understood. Determining such a model of GPCR activation would thus have a major impact on drug development.

Research in the Kobilka lab

Overview

Research in the Kobilka lab is directed at understanding the structural basis for the functional properties of GPCRs. More specifically, the lab is working on the adrenergic receptors, which respond to the neurotransmitters adrenaline and noradrenaline, and are a key component of cardiovascular function. In medicine, the beta adrenergic receptors are the target of 'beta blockers', the major class of medications used in hypertension, coronary artery disease and heart failure, as well as 'beta agonists', medications used to treat asthma. Thus, understanding the functioning of the adrenergic receptors will not only provide important information about all GPCRs, it will also allow the development of new drugs for the treatment of these prevalent diseases.

The lab is working on the structural as well as the physiological aspects of adrenergic receptor function. Physiological studies include the development of knockout mice for five adrenergic receptor subtypes (the corresponding genes have been inactivated or 'knocked-out'). These mice have helped to determine the specific function of adrenergic receptor subtypes in vivo. For instance, another project in the Kobilka lab involves the isolation of cardiac cells from neonatal mice to distinguish the role of the beta1 and the beta2 adrenergic receptors in regulating myocyte function.

The in vivo studies represent about 30% of the work done in the lab. The other 70% is aimed at determining the structure of GPCRs and understanding how ligands can affect this structure. It is rare to find a lab that performs both structural and in vivo studies. This unique combination of technologies keeps structural studies focused on relevant biological questions.

Structural studies

To study GPCR structure, the lab has focused most of its efforts so far on the beta2 adrenergic receptor. This receptor has been extensively studied by many groups around the world and has an important physiological role (see above). It is thus both a useful model of the whole GPCR family and an important player in human health.

The studies performed in the Kobilka lab are aimed at determining:

In summary, we want to understand how the structure of the receptor is modified by neurotransmitters or drugs in such a way that a signal gets transmitted to the signalling machinery inside the cell. This kind of information will be invaluable for drug development. For instance, drugs that specifically target one state should have useful applications for the treatment of heart diseases.

Fluorescent studies

To answer these questions, different experimental methods have been developed in the lab. The main approach is to introduce into the receptor biophysical probes that allow us to monitor the conformational changes of the protein. These probes are fluorescent molecules, and to attach them to the receptor, we use cysteines. Cysteine is one of the 20 different amino acids that make up proteins in vivo. It has a sulfur group with very characteristic chemical properties that make it easier to target for labeling.

Cysteines have been introduced at different locations in the sequence of the receptor. After purification, the targeted cysteine can be specifically reacted with a fluorescent probe, which allows us to monitor structural changes at precise locations in the receptor. Because the chemistry is always the same, this technique can be applied to the large repertoire of commercially available fluorophores, with various spectral properties useful for structural studies:

In addition to these techniques that apply to a single probe in a single location, we can also combine two fluorophores on two different positions. This allows the measurement of relative distances between two regions of the receptor by fluorescence resonance energy transfer.
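
For readers who want the quantitative idea behind this distance measurement, the standard Förster relation is sketched below. It is not taken from the lab's publications: E is the measured transfer efficiency, r the distance between the two fluorophores, and R_0 the Förster radius, a constant that depends on the chosen pair of fluorophores (all three symbols are introduced here for illustration).

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}}
\qquad\Longleftrightarrow\qquad
r = R_0 \left( \frac{1}{E} - 1 \right)^{1/6}
```

Because of the sixth power, the efficiency changes steeply when r is close to R_0, which is what makes the method sensitive to small movements between two regions of the receptor.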

The project

The models

As explained above, one of the goals of the research done in the lab is to determine the number of conformational states and the kinetics of transitions between states for the beta2 adrenergic receptor. For instance, based on the data from the fluorescence studies, we initially proposed a two-step, three-state model (Swaminath et al). More formally, such a biochemical model would be written:

R + A <--> R'A <--> R*A <--> R + A

This means the receptor can adopt 3 'states' or 3 'conformations', namely R, R' and R*. The R state is always unliganded, while the R' and R* states are bound to the ligand A. The model assumes two steps in the activation process. In the first step, the receptor binds the ligand A and changes its conformation from R to R'. In the second step, the ligand-bound receptor R'A changes its conformation to R*A. In addition, the ligand can come off the R* conformation of the receptor, which returns the receptor to the R state. Finally, all these transitions are reversible, so the arrows can be followed both ways.

What is the link between such a biochemical model and the biological role of the receptor? The idea is that different conformations will have different effects in the cell. For example, when no ligand is present, the receptor adopts the R conformation, which does not do much. But when a ligand binds, and the receptor goes to the R' conformation, it activates a first effector E inside the cell. When the receptor then reaches the R* conformation, yet another effector F is activated. Interestingly, one could imagine that ligand A triggers both the R' and R* states and activates both E and F, but that another ligand B would only trigger R' and would thus have quite different cellular effects. Designing drugs that are specific for one state, like this hypothetical ligand B, would have very interesting applications in human health. Understanding the behaviour of a receptor, and describing such biochemical models is the first step towards this goal.

The reality of such models has been demonstrated in the lab, particularly in Swaminath et al. Of course, one can imagine even more elaborate models, with 4, 5 or even 32 states. In fact, we expect the 'real' receptor to behave in a quite complex way. Following the principle of Occam's razor, our goal is to find the simplest model that can explain the data (it is unlikely that a 32-state model will be used!). Our hope is that this model will be complicated enough to give interesting insights into the pharmacology of the receptor.

The data

The data that we use to test different biochemical models is described in Swaminath et al. The purified receptor is labeled on its endogenous cysteine at position 265 with tetramethylrhodamine-maleimide. This generates a fluorescent receptor that can be analyzed in a spectrofluorimeter, exciting at 550 nm and detecting the fluorescence at 570 nm. The data is generated by monitoring the fluorescence response to isoproterenol, a potent drug that fully activates the receptor in vivo. Different concentrations are used, ranging from 100 nM to 1 mM.

The computational methods

To simulate and fit the models, a custom program was written for the Mac. It has a graphical front-end based on the Cocoa libraries that ship with Mac OS X. In addition, a command-line version can run batch jobs, and this is the version used with Xgrid.

A biochemical model can be written as a set of ordinary differential equations. For example, in the three-state model described above, there are 4 'species' with concentrations described by the following equations:

dA/dt   = - k1 . A . R  +  k2 . R'A  +  k5 . R*A  -  k6 . A . R
dR/dt   = - k1 . A . R  +  k2 . R'A  +  k5 . R*A  -  k6 . A . R
dR'A/dt = + k1 . A . R  -  k2 . R'A  -  k3 . R'A  +  k4 . R*A
dR*A/dt = + k3 . R'A    -  k4 . R*A  -  k5 . R*A  +  k6 . A . R

The parameters k1, k2,..., k6 are the kinetic rates of the different reactions (back and forth). To simulate the evolution of species concentrations, we thus integrate this system using a fifth-order Runge-Kutta method with adaptive step size. To fully simulate the data, one also has to choose values for the 'brightnesses' of the different species, which indicate their levels of fluorescence. The ligand A is not fluorescent, only the receptor is. Depending on the conformation (R, R' or R*), the fluorescence level varies.
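
The simulation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's program: it uses a classic fixed-step fourth-order Runge-Kutta scheme instead of the adaptive fifth-order method, and the rate constants, brightness values, initial concentrations and function names are all hypothetical.

```python
# Sketch of the three-state model. All numerical values are arbitrary
# illustrations, not fitted rates. A fixed-step RK4 scheme is used for
# brevity in place of the adaptive fifth-order method named in the text.

def derivatives(state, rates):
    """Right-hand side of the four rate equations for A, R, R'A and R*A."""
    a, r, rpa, rsa = state
    k1, k2, k3, k4, k5, k6 = rates
    d_free = -k1 * a * r + k2 * rpa + k5 * rsa - k6 * a * r  # same for A and R
    d_rpa = k1 * a * r - k2 * rpa - k3 * rpa + k4 * rsa
    d_rsa = k3 * rpa - k4 * rsa - k5 * rsa + k6 * a * r
    return (d_free, d_free, d_rpa, d_rsa)


def rk4_step(state, rates, dt):
    """Advance the state by one classic fourth-order Runge-Kutta step."""
    def shifted(slope, h):
        return tuple(x + h * s for x, s in zip(state, slope))

    s1 = derivatives(state, rates)
    s2 = derivatives(shifted(s1, dt / 2), rates)
    s3 = derivatives(shifted(s2, dt / 2), rates)
    s4 = derivatives(shifted(s3, dt), rates)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, s1, s2, s3, s4)
    )


def fluorescence(state, brightness):
    """Weighted sum over the receptor species; the free ligand A is dark."""
    _, r, rpa, rsa = state
    b_r, b_rpa, b_rsa = brightness
    return b_r * r + b_rpa * rpa + b_rsa * rsa


def simulate(initial, rates, brightness, t_end, dt):
    """Return a list of (time, fluorescence) pairs and the final state."""
    state = initial
    trace = [(0.0, fluorescence(state, brightness))]
    for i in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(state, rates, dt)
        trace.append((i * dt, fluorescence(state, brightness)))
    return trace, state


rates = (1.0, 0.5, 0.2, 0.1, 0.05, 0.2)  # hypothetical k1..k6
brightness = (1.0, 0.8, 0.6)             # hypothetical levels for R, R'A, R*A
initial = (10.0, 1.0, 0.0, 0.0)          # A, R, R'A, R*A in arbitrary units
trace, final = simulate(initial, rates, brightness, t_end=20.0, dt=0.01)
print(trace[0], trace[-1])
```

A useful sanity check on any such simulation is that the total amount of receptor (R + R'A + R*A) stays constant over time, since the equations only convert one receptor species into another.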

Once the kinetic rates and the brightnesses are defined, the computer can simulate a model and plot the predicted fluorescence changes in response to the different concentrations of ligand. The goal is then to find the values of the parameters (rates and brightnesses) that make the model match the data as closely as possible. One could change the values manually and try to come up with a reasonable overlap. The automatic way is to implement an algorithm for fitting. We are using the Levenberg-Marquardt method for this (as an alternative, a simulated annealing method will also be used in the future). All that is needed then is to give the computer some 'guess' values as a starting point. The program will then try to minimize the distance between the data and the model (by minimizing the chi-square), taking successive steps based on the derivatives with respect to the parameters (which are implemented as another set of differential equations).
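
To make the quantity being minimized concrete, here is a minimal sketch of a chi-square calculation. It deliberately uses a toy single-exponential curve in place of the receptor model, synthetic noise-free 'data', and an assumed measurement error `sigma`; every name and value is hypothetical. It shows only the distance measure, not the Levenberg-Marquardt steps themselves.

```python
import math


def model(t, amplitude, rate):
    """Toy stand-in for a simulated fluorescence trace (not the receptor model)."""
    return amplitude * (1.0 - math.exp(-rate * t))


def chi_square(times, data, sigma, amplitude, rate):
    """Sum of squared differences between data and model, in units of sigma."""
    return sum(
        ((y - model(t, amplitude, rate)) / sigma) ** 2
        for t, y in zip(times, data)
    )


times = [0.5 * i for i in range(11)]
data = [model(t, 2.0, 1.5) for t in times]  # synthetic data from known values
sigma = 0.05                                # assumed measurement error

good = chi_square(times, data, sigma, 2.0, 1.5)  # parameters that made the data
poor = chi_square(times, data, sigma, 1.0, 0.5)  # a deliberately bad guess
print(good, poor)
```

A fitting algorithm repeatedly adjusts the parameters to move from a value like `poor` towards a value like `good`.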

Why use Xgrid

As explained above, the computer has to start with some 'guess' values for the parameters. Choosing an adequate starting point for the fit is critical for success. If this starting point is not close enough to the best fit, the algorithm is very likely to stop in a 'local minimum', thinking that the job is done and that the best fit has been found. It can also get unstable and reach off-limit values with no biological or physical meaning.

There are two problems with the biochemical models we are trying to fit. First, they have quite a lot of parameters, which increases the complexity of the parameter space considerably. Second, it is usually difficult to predict what the effect of one parameter will be (as opposed to well-known functions like exponentials, where the value of a parameter has a predictable simple effect). In fact, some of the parameters may not change the output much, while some may have dramatic effects, and these behaviours will be dependent on the values of the other parameters. It is thus difficult to make a good guess for fitting. As a result, if the fit does not give good results, it is difficult to know if it is because the model can not explain the data, or because the initial values given to the algorithm are too far from the best fit.

This is the problem we first faced with the three-state model described above. One way around it is to try many different combinations of initial values for the fit, and run all the corresponding fits. This requires much more computational power than what is required for 'hand-started' fits. For example, if one wants to try 10 different values for each of the 6 rates of the three-state model, one has to start 10 to the power 6, or one million, different fits. This 'brute force' approach would rapidly reach astronomical numbers for models with more than 10-20 parameters. We estimate that with a reasonable number of computers, we should be able to test biochemical models of the complexity needed to explain the data (of course, this is just an educated guess, we can never be sure of that).
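
The arithmetic above can be illustrated with a short sketch that enumerates every combination of starting values and groups them into batches that could each be handed to one computer. The logarithmic spacing, the range of values, the batch size and the function names are assumptions made for this example, not a description of the actual program.

```python
import itertools


def log_spaced(low, high, count):
    """Return count values from low to high, evenly spaced on a log scale."""
    ratio = (high / low) ** (1.0 / (count - 1))
    return [low * ratio ** i for i in range(count)]


def starting_points(values, n_params):
    """Lazily yield every combination of starting values, one per parameter."""
    return itertools.product(values, repeat=n_params)


def batches(iterable, size):
    """Group an iterable into lists of at most size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


values = log_spaced(1e-3, 1e3, 10)  # 10 hypothetical guesses per rate
total = len(values) ** 6            # 6 rates: 10**6 combinations
first_batch = next(batches(starting_points(values, 6), 1000))
print(total, len(first_batch), first_batch[0])
```

Generating the combinations lazily keeps memory use small, and because each fit is independent of the others, the batches can run in any order on any machine.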

One exciting outcome of this kind of analysis is also the ability to dismiss models, and not just find one model that fits the data. In fact, dismissing models is almost more interesting than finding a good model. If a model does not fit, we are certain that it does not explain the receptor behaviour. On the contrary, if a model does fit, it does not mean that this is the 'real' way that the receptor functions; other models may also fit. In other words, a model will look more credible if attempts with other models of similar complexity have failed.

Preliminary results

After extensive scanning of the parameter space (using the present cluster), we are confident that the three-state model described above cannot explain the data. This is an interesting result, because it suggests that more complex models are needed, which also means that we may get even more information than expected from the data. We are now exploring other models...

Glossary

Cell

Some organisms, such as bacteria and yeasts, are made of only one cell. Animals and plants are made of multiple interacting cells. For instance, a human being starts as one cell at fertilization and is made up of about 100 trillion cells as an adult. You can't divide a cell without killing it. Thus, a cell is the smallest living entity you can find (viruses are much smaller but can't reproduce themselves without a cell). Of course, a cell is itself a very complex thing that you can break apart and study. This is what cellular biologists (and others) do. Proteins are one of the main components of cells: they make up most of the cell's structure and carry out most of its functions. Another important part of the cell is the nucleus, which contains the DNA. Finally, the cell is delimited by a membrane.

Protein

A protein is a chain of amino acids. There are 20 different amino acids in life. These amino acids are small molecules with 3-11 carbon atoms, plus some hydrogens, oxygens and nitrogens. You can't see a protein under a microscope because it is too small. The cell builds a protein by linking amino acids with each other in a linear way, just like you make a necklace with beads. However, in the end, a protein does not look like a necklace at all. The amino acids do not just stay next to each other in a nice line. Because of the diverse chemical properties of the 20 possible amino acids, a protein can adopt almost any shape. The chain of amino acids folds itself in what could look like a big mess, but is actually a very precise structure with a very precise function. Each protein has a different role. For example, in humans, there are ~30,000 different possible chains of amino acids (plus infinite variations that are beyond the scope of this glossary). Not all of them are produced by all the cells of the body (this is how a neuron is different from a cardiac cell, for example), but one cell probably contains 1000-5000 different proteins (of course, there are several copies of each, from several dozens to several hundreds of thousands).

DNA

To make a protein, a cell has to know which amino acids to use and in which order. This information is stored in the DNA (deoxyribonucleic acid) of the cell, itself stored in the nucleus. The DNA is a chain of nucleotide bases. There are 4 possible bases: guanine (G), adenine (A), cytosine (C), and thymine (T). Thus, a chain of DNA can be written as a list of letters, e.g. ...A G T T C G A C G G T T A G A C A G T A A G A... A complex machinery is responsible for the translation from DNA to protein. Each group of three letters (codon) codes for an amino acid (for example, ATG means 'methionine').
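
The three-letter reading described above can be illustrated with a small sketch. The table below is only a handful of entries from the standard genetic code (the full code has 64 codons), the example sequence is made up, and the sketch reads the DNA letters directly, skipping the intermediate steps the cell actually uses.

```python
# Partial codon table: a few entries of the standard genetic code only.
CODON_TABLE = {
    "ATG": "Met",
    "GCT": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop",
}


def split_codons(dna):
    """Cut a DNA string into consecutive groups of three letters."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete final group
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Map each codon to an amino acid until a stop codon is reached."""
    protein = []
    for codon in split_codons(dna):
        amino_acid = CODON_TABLE.get(codon, "?")  # "?" = not in this table
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


example = "A T G G C T A A A T G G T A A"  # made-up sequence
print(split_codons(example))
print(translate(example))
```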

Membrane

Membranes are very important components of the cell structure. They are lipidic films that separate different compartments, and are essentially 'waterproof'. The cytoplasmic membrane, for example, separates the inside of the cell from the outside. There is also a membrane around the nucleus. An important characteristic of membranes is their composition: lipids. Some proteins are inserted in the membrane, usually to serve as transporters (like getting nutrients inside the cells) or as receptors (like recognizing hormones floating around in the bloodstream).