The digital human

23 August 2005

What should have been the scientific advance of the new millennium - and put forward as the key to human life - turned up more mysteries than it solved. In 2003, the Human Genome Project delivered a long list of DNA base sequences and the genes that those sequences made up. In principle, molecular biologists had everything they needed to tie genes to functions in the body.

The workhorses of the body are proteins. They are long chains of atoms that fold up in unusual ways to expose tiny chemically reactive areas. The active ingredient of the blood cell, haemoglobin, is a protein that lets four molecules of oxygen bind to it on the way out of the lungs. Once it has carried the oxygen to a muscle to be used, it takes molecules of carbon dioxide back to be exhaled.

Body functions are almost all handled in some way by at least one protein, and genes provide the basic blueprints for those proteins. Cells are made of complex molecules assembled from basic chemicals, and it falls to proteins to build most of those molecules. Before the human genome was sequenced, biologists knew that genes tell some of these builder proteins what to make. By unlocking the secrets of the genome, biologists could surely work out how the body works.

Life is nowhere near that simple, it seems. Scientists found there was a problem with the genome that had been sequenced: there were fewer genes in the human DNA sequence than expected. Estimates had put the figure at 100,000 at the start of the project in 1990. By October last year, researchers concluded there could be fewer than 25,000. That is a tenth of the number of types of protein in each person's body. Other projects, such as one to uncover the way that the malaria parasite operates, have turned up similar results.

The sequenced human genome also contained large chunks of 'junk' DNA: these were bits of the genome that did not seem to have any function at all. They just filled in between the 'real genes'. Yet, some experiments indicated that this apparently useless DNA does get involved in processes in the body.

As they tried to work out what the genome project revealed about the body, biologists began to understand that more complex processes needed to be invoked to explain how each of the active genes produces the proteins that appear in each of the different types of cell that make up the human body.

Prof Denis Noble, of the University of Oxford physiology department, said relying on the genome alone "is like looking at a telephone directory and thinking you have the secrets of the city".

Biologists use the term 'gene expression' to describe the conversion of genetic information into proteins and ultimately into cell building blocks. This is a complex process that involves a number of steps and many different proteins. Gene expression is not just down to the structure of the DNA, but the makeup of the cell and what it is doing at the time. Proteins will interfere with the process as well as taking part in it.

Noble said a bottom-up view, working up from the genome, is not the way forward because it is higher-level processes that determine the behaviour of the cell and how the genetic material gets involved. "I liken it to an organ keyboard: the music comes from the player, not the keyboard," he said. And gene expression is far from being static. "I tell my students, if you go to the theatre and have a good laugh or a cry, your gene expression will have changed for the week."

Following the initial sequencing of a representative human genome, researchers are turning to those of known individuals. One company, 454 Life Sciences, has said it could sequence the genes of James Watson, one of the scientists who uncovered the structure of the DNA molecule, in one year. That is a huge reduction on the 15 years it took for the first attempt. In the future, researchers believe it will be a routine exercise to sequence DNA as costs fall from way below $1m to about $200, roughly the cost of a magnetic resonance imaging (MRI) scan.

The question is what can be done with this genetic information if it does not define everything biologists need to know about the body. So, the research is moving up to the level of proteins, cells, organs and the body overall. The key for a loose collaboration of researchers around the world is a model to tie the various representations of biological processes together. The aim of the Physiome Project, initiated by the International Union of Physiological Sciences (IUPS), is to build a computer model of the human body. The models will extend from the genetic level through protein reactions, cell functions and tissue behaviour to the way that organs behave.

Using such an extensive model, doctors could, in principle, work out how badly a patient might react to a new drug simply by providing the patient's DNA sequence and running a simulation with it. It might even reveal whether patients are lying about taking their medication or about what they are eating, as such a model would show what effects different drugs or foods have on gene expression, and on a patient's ability to recover from a disease.

The Physiome Project is not the only one of its type. The US defence research agency DARPA has kicked off a set of projects that go under the banner of Virtual Soldier. If the project is successful, soldiers of the future will wear electronic dog tags that contain genetic and physiological information about themselves. Field doctors will put the tags of injured soldiers into a computer and use full-body models, which DARPA has called 'holomers', to guide them on the best course of treatment. However, the ability to simulate the body processes of a human is years away. Even the most optimistic predictions put this ability 20 to 30 years into the future.

There are some apparent similarities between the Physiome Project and the Human Genome Project. The main one is the scale of the new project. It is potentially a massive undertaking that will involve many teams from around the world. The systems that are being built today take into account the fact that researchers will need to share each other's models and even compute resources.

But the researchers do not expect the same level of attention that surrounded the Human Genome Project as the worldwide team neared the completion of its task. "There is no instant win with the physiome, although the work has yielded tangible results already," said Noble. "But few have realised the scale of what needs to be done in the way they recognised the scale of the Human Genome Project. And I don't expect that to happen with the Physiome Project. With the Human Genome Project they could at least declare they were 98 per cent there. You can't do that with the processes of the body because we will always be filling in the detail."

The quest to build a computer model of the human body from genes up is one that researchers know to be impossible. "It is easy to show that, to build up from the genome, there is not enough material in the solar system to do that on a computer," said Prof Denis Noble of the University of Oxford physiology department, one of the pioneers of the Physiome Project. "To compute exhaustively is beyond anything we could ever build. I did a calculation that showed you would require 10^27 Blue Gene supercomputers to simulate all of the molecular interactions in a single cell. And there are a billion such cells just in the human heart.

"We will never have that much computing power. But that is mindless simulation. You need simulation with insight. You need to determine which part of simulated reality is the explanatory bit. It is a restriction that forces us to think. Computing will always be a bottleneck but a good disciplinary bottleneck."

Noble began devising mathematical models of the way that the heart works some 40 years ago. Over the years he has developed progressively more advanced representations of the vital organ, using differential equations to model much of the behaviour of tissues and cells. The heart models run on supercomputers attached to the UK's computing grid, simulating seconds of real time in the space of hours of compute time.
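To give a flavour of what modelling an excitable cell with differential equations involves, the sketch below integrates the FitzHugh-Nagumo equations, a standard two-variable textbook simplification. It is not one of Noble's heart models, which are far more detailed; the function name, parameter values and step size are illustrative assumptions.

```python
def simulate_fhn(t_end=200.0, dt=0.01, current=0.5, a=0.7, b=0.8, tau=12.5):
    """Integrate the FitzHugh-Nagumo equations with the forward Euler method.

    v is a fast, voltage-like variable; w is a slow recovery variable.
    All parameter values are illustrative assumptions, not fitted to data.
    Returns a list of (time, v, w) tuples, one per time step.
    """
    v, w = -1.0, 1.0  # arbitrary starting state
    steps = int(round(t_end / dt))
    trace = []
    for i in range(steps):
        dv = v - v ** 3 / 3.0 - w + current  # fast excitation
        dw = (v + a - b * w) / tau           # slow recovery
        v += dt * dv
        w += dt * dw
        trace.append(((i + 1) * dt, v, w))
    return trace


if __name__ == "__main__":
    trace = simulate_fhn()
    # Ignore the initial transient and look at the sustained oscillation.
    late = [v for t, v, _ in trace if t > 100.0]
    print(f"v ranges from {min(late):.2f} to {max(late):.2f}")
```

Even this toy model has to take tens of thousands of small time steps to trace a few beats of one idealised cell, which hints at why detailed models of whole tissue need supercomputers.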

Scale is a big problem for modelling biological systems. "The physical scales vary by four orders of magnitude. A signal between cells may start at the protein level and the response may be at the full cell level. You also have reactions that take fractions of a second for a protein but the results of that reaction may take days or months to have an effect," said Henry Kelly, president of the Federation of American Scientists (FAS), a group that kicked off a number of projects to model body functions under the banner Digital Human in 2001.

The key to developing the models further is to find alternative representations that are easier for computers to work with. "I have worked with some people who have done a brilliant job of seeing the same equations in ways that don't require so much heavy computing. You need mathematical insight to see how complex differential equations behave," said Noble. "It is possible that we will find how to reduce models at one level to faster, more computable representations as we move up to the higher-level models."

Kay Howell, vice-president of IT at the Federation of American Scientists (FAS), said it is unclear how far mathematical transformations will get researchers: "Computational biology is very new. We don't have a mature set of computational methods and algorithms for tackling the problems as yet. But it will be a huge growth area. The complexity of what researchers are looking at means it is too hard to do any other way."

Heart target

The Physiome Project and similar efforts may be years away from delivering a digital model of the human body, but benefits from the work could be realised much sooner.

"Some goals are immediately in view. My audiences are sometimes surprised when I say it is already being used in pharmaceutical research and in regulatory bodies," said Prof Denis Noble of the University of Oxford. "For people designing and screening drugs, the problem for them is working out which drugs will cause cardiac arrhythmia and which don't."

"Nearly 40 per cent of compounds researched by the pharmaceutical industry hit the heart. They react with one of the transporters, called hERG, and cause arrhythmia in the heart."

This problem has helped the cost of drug research soar. The Tufts Center for the Study of Drug Development reported a couple of years ago that the average cost to develop a new drug is $802m. Withdrawals are much more expensive. Merck's withdrawal of the painkiller Vioxx because of its effect on the hearts of some patients saw more than $25bn wiped off the company's market value in one day, and consumers lined up a series of class-action suits.

Today, the method for determining the effect of a drug on the heart during trials is based on analysis of one feature found in electrocardiogram (ECG) traces. "People look at the T-wave of the ECG for prolongation. There is almost an industry in measuring that interval. But it is a very poor marker for what you want to know: will the drug kill someone?" said Noble.

Modelling on its own would not solve the problem but, in combination with experiments to determine the behaviour of drugs on cells in the heart, Noble said the process of screening out potentially lethal drugs could be made more efficient. "It would have one of the biggest impacts on healthcare costs. If we moved from 98 per cent attrition to 95 per cent, we would more than double the output of the pharmaceutical industry," argued Noble. "I discuss this quite frequently with the pharmaceutical industry to see if we can get round the problem. If it works, something will happen that will be of immediate benefit and we would find computing becomes used in the same way it is used in the automotive or aerospace industries."
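The arithmetic behind Noble's attrition claim can be checked in a few lines. The sketch below assumes, for illustration only, that output is proportional to the share of candidate compounds that survive development; the function names are hypothetical.

```python
def survivors_per_100(attrition_pct):
    """Number of candidate compounds out of 100 that reach the market."""
    return 100 - attrition_pct


def output_ratio(old_attrition_pct, new_attrition_pct):
    """How many times more drugs emerge when attrition falls (assumes the
    same number of candidates enters development in both cases)."""
    return survivors_per_100(new_attrition_pct) / survivors_per_100(old_attrition_pct)


if __name__ == "__main__":
    # 98 per cent attrition leaves 2 in 100; 95 per cent leaves 5 in 100.
    print(output_ratio(98, 95))  # prints 2.5
```

On that assumption, a three-point fall in attrition multiplies output by two and a half, which is consistent with "more than double".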

In the US, another shorter-term project is to use modelling to improve the training of medical and biology students. The Federation of American Scientists (FAS) is building software to run interactive simulations of biological processes, which will ultimately run on games machines such as the PlayStation 3. "These new games devices are supercomputers. You can have spectacularly impressive visualisations," said Henry Kelly, president of the FAS. "The question is: do you want to? Cellular processes are visually boggling. In engineering, things do simple things like twist along one axis. They don't fall apart, reassemble and then turn inside out. It is the mother of all visualisation challenges. Everything is in motion.

"You can show the full detail but you might want a more cartoon-like view. This is an interesting research issue."

This feature is an excerpt from an eight-page special report published in the August/September 2005 issue of Information Professional on the Physiome Project.