Special Report: Fighting the New Coronavirus
On March 5, local time, the artificial intelligence company DeepMind said in a blog post that it had used AlphaFold to predict the structures of six proteins associated with the new coronavirus (SARS-CoV-2).
As one of the world's leading and most closely watched artificial intelligence companies, DeepMind has drawn steady outside attention for its work during this global outbreak.
The latest results announced by DeepMind may boost research and development of new coronavirus vaccines.
Download link for DeepMind's predicted new coronavirus protein structures:
https://storage.googleapis.com/deepmind-com-v3-datasets/alphafold-covid19/structures_4_3_2020.zip
Figure: A protein structure predicted by AlphaFold
Many readers may already be familiar with AlphaFold. It was developed by DeepMind, the team behind AlphaGo, and is a newer member of the company's "Alpha" family.
It attracted the scientific community's attention as soon as it was unveiled in December 2018, because it can predict the 3D structure of a protein from its genetic sequence alone.
In the 2018 round of the Critical Assessment of protein Structure Prediction (CASP13), it beat the other 97 entrants to take first place, producing the most accurate prediction for roughly eight times as many targets as the runner-up.
Skipping peer review to release predictions directly
DeepMind said that, in responding to the epidemic, the scientific community has already carried out a great deal of basic research on the characteristics of this family of viruses.
Leading laboratories have openly released viral genome data, allowing researchers to rapidly develop tests and candidate therapies for the virus.
Other laboratories have shared experimentally determined and predicted structures of viral proteins, as well as epidemiological data.
Since AlphaFold was introduced, DeepMind has been working to predict protein structures accurately even when no structures of similar proteins are available.
By continually refining its methods, DeepMind hopes to provide the most useful predictions possible. It also hopes the released structures will help the scientific community better understand how the virus works and will serve as a platform for generating hypotheses in developing treatments for the new coronavirus.
Normally, DeepMind releases research results only after they have been peer reviewed and published in a journal.
This time, DeepMind said, it skipped those conventional steps and made the predicted structures public immediately because of the severity and time-sensitivity of the epidemic.
Because the results have not been peer reviewed, DeepMind cautioned that its structure prediction system is still under development. It is confident the system is more accurate than the one it entered in CASP13, but it cannot be certain of the accuracy of the structures it has provided.
How AlphaFold predicts the structures of new coronavirus proteins
According to the DeepMind team, AlphaFold uses neural networks to predict a protein's physical properties. These networks are trained to predict such properties from the protein's genetic sequence, including the distances between pairs of amino acids and the angles of the chemical bonds that connect them.
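The sketch below makes these quantities concrete. It computes them from known 3D coordinates instead of predicting them. It produces two things: a matrix of distances between every pair of residues, and the torsion (dihedral) angle defined by four consecutive atoms, which is a standard way to describe rotation around a bond in a chain. It assumes each point is a plain (x, y, z) tuple in angstroms. The helper names are hypothetical and are not part of AlphaFold.

```python
import math
from typing import List, Tuple

# Assumed representation: each atom/residue is an (x, y, z) tuple in angstroms.
Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pairwise_distances(coords: List[Vec]) -> List[List[float]]:
    """n x n matrix of Euclidean distances between every pair of points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees around the p1-p2 bond, in the range [-180, 180]."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(pairwise_distances(chain)[0][3])  # distance between the first and last points
    print(dihedral(*chain))                 # approximately 90 degrees
```

In a real protein, the backbone torsion angles (conventionally called φ and ψ) are computed this way from specific backbone atoms.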
AlphaFold then adjusts the structure to find the most efficient arrangement of amino acids. The program took two weeks to predict its first protein structure; it now needs only a few hours.
The DeepMind team trained a neural network to predict a separate probability distribution over the distance between each pair of residues in a protein. These probabilities are then combined into a score that estimates how accurate a proposed protein structure is.
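One common way to turn per-pair distance distributions into a single score is a negative log-likelihood, sketched below. For each pair of residues in a proposed structure, the sketch looks up the probability predicted for the distance bin that pair actually falls in, then sums the negative logarithms. Lower is better. Several details are assumptions for illustration: the bin edges, the `distogram` layout, and the function names. The score AlphaFold actually used may also include further terms that are not shown here.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple

# Assumed distance bins in angstroms: bin k covers [EDGES[k], EDGES[k + 1]).
# Distances outside the range are clamped into the first or last bin.
EDGES = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 22.0]

Coord = Tuple[float, float, float]
Distogram = Dict[Tuple[int, int], Sequence[float]]


def bin_index(distance: float, edges: List[float] = EDGES) -> int:
    """Index of the distance bin containing `distance`."""
    k = bisect.bisect_right(edges, distance) - 1
    return min(max(k, 0), len(edges) - 2)


def distogram_score(coords: List[Coord], distogram: Distogram,
                    edges: List[float] = EDGES, eps: float = 1e-9) -> float:
    """Sum over residue pairs of -log p(observed distance bin); lower is better.

    distogram[(i, j)] holds one predicted probability per distance bin for the
    pair (i, j). `eps` guards against log(0).
    """
    total = 0.0
    for (i, j), probs in distogram.items():
        d = math.dist(coords[i], coords[j])
        total -= math.log(probs[bin_index(d, edges)] + eps)
    return total


if __name__ == "__main__":
    # Hypothetical predictions for a 3-residue toy chain.
    near = [0.05, 0.80, 0.05, 0.05, 0.03, 0.02]  # most mass on 4-6 angstroms
    far = [0.02, 0.03, 0.05, 0.10, 0.70, 0.10]   # most mass on 10-12 angstroms
    predicted = {(0, 1): near, (1, 2): near, (0, 2): far}
    compact = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    stretched = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (18.0, 0.0, 0.0)]
    print(distogram_score(compact, predicted))    # lower: agrees with the predictions
    print(distogram_score(stretched, predicted))  # higher: contradicts them
```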
In addition, a separate neural network was trained to use all of these pairwise distances in aggregate to estimate how close the proposed structure is to the correct answer.
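The separate network learns how to judge closeness; it is not hand-coded. To illustrate the kind of quantity such an estimate targets, the sketch below computes one simple closeness measure between a proposed structure and a known reference: the root-mean-square difference of all their pairwise distances, often called distance RMSD. This measure is an assumption for illustration only. The source does not say which measure DeepMind's network was trained to estimate.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]


def distance_rmsd(proposed: List[Coord], reference: List[Coord]) -> float:
    """RMS difference between corresponding pairwise distances; 0 means identical shape.

    Illustrative closeness measure only (an assumption, not DeepMind's metric).
    """
    if len(proposed) != len(reference) or len(proposed) < 2:
        raise ValueError("structures need the same number of points (at least 2)")
    squared = [
        (math.dist(proposed[i], proposed[j]) - math.dist(reference[i], reference[j])) ** 2
        for i, j in combinations(range(len(proposed)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y, z) for x, y, z in reference]        # same shape, moved
    doubled = [(2 * x, 2 * y, 2 * z) for x, y, z in reference]  # same shape, twice the size
    print(distance_rmsd(shifted, reference))  # essentially 0: translation does not matter
    print(distance_rmsd(doubled, reference))
```

This measure compares internal distances only. It therefore needs no superposition step and is unchanged by rotating or translating either structure. One limitation follows from the same property: a mirror-image structure would also score 0.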
Figure: The first method designed by the DeepMind team
Using these scoring functions, AlphaFold can search the landscape of possible protein structures for those that match its predictions.
The first method the DeepMind team designed builds on a technique commonly used in structural biology, in which pieces of a protein structure are repeatedly replaced with new protein fragments.
In this approach, a trained neural network invents new protein fragments, which are used to continually improve the score of the proposed protein structure.
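The fragment-replacement loop can be sketched as a simple stochastic search. The chain is represented by per-residue torsion angles. The loop repeatedly overwrites a short window with a fragment drawn from a library and keeps the change only if the score improves. Everything here is an assumption for illustration. The `score` is a toy stand-in (squared distance to a hidden target). The fragment library is noise around that target rather than the output of a trained network. The greedy acceptance rule is a simplification; fragment-assembly methods commonly use simulated annealing, which sometimes accepts worse moves.

```python
import random
from typing import Callable, List, Sequence, Tuple


def fragment_search(angles: Sequence[float],
                    fragments: List[Sequence[float]],
                    score: Callable[[List[float]], float],
                    steps: int = 2000,
                    seed: int = 0) -> Tuple[List[float], float]:
    """Repeatedly splice a library fragment into the chain; keep only improvements.

    `angles` is a per-residue list of torsion angles, `fragments` is a
    hypothetical library of short angle windows, and `score` is any function
    where lower means a better structure.
    """
    rng = random.Random(seed)
    current = list(angles)
    best = score(current)
    for _ in range(steps):
        frag = list(rng.choice(fragments))
        start = rng.randrange(len(current) - len(frag) + 1)
        candidate = current[:start] + frag + current[start + len(frag):]
        s = score(candidate)
        if s < best:  # greedy acceptance (simplification of simulated annealing)
            current, best = candidate, s
    return current, best


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-in for a learned score: squared difference from a hidden
    # target (angle periodicity ignored for simplicity).
    target = [rng.uniform(-180.0, 180.0) for _ in range(12)]

    def toy_score(a: List[float]) -> float:
        return sum((x - t) ** 2 for x, t in zip(a, target))

    # Hypothetical fragment library: noisy 3-residue windows of the target.
    library = []
    for _ in range(200):
        s = rng.randrange(len(target) - 2)
        library.append([t + rng.gauss(0.0, 10.0) for t in target[s:s + 3]])

    start_angles = [0.0] * len(target)
    final, final_score = fragment_search(start_angles, library, toy_score)
    print(toy_score(start_angles), "->", final_score)
```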
The second method optimizes the score by gradient descent, a mathematical technique commonly used in machine learning that makes small, incremental changes to arrive at a highly accurate structure.
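Gradient descent on a structure can be sketched with a toy objective. The sketch starts from target distances between pairs of points, which stand in for the network's predictions. At each iteration it moves the 3D coordinates a small step against the gradient of the squared distance error. The objective ("stress"), the learning rate, and the function names are hypothetical. DeepMind describes optimizing its own learned score, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pairs = Dict[Tuple[int, int], float]  # (i, j) -> target distance


def stress(coords: Sequence[Sequence[float]], targets: Pairs) -> float:
    """Toy objective: squared error between actual and target pair distances."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())


def stress_gradient(coords: Sequence[Sequence[float]], targets: Pairs) -> List[List[float]]:
    """Analytic gradient of `stress` with respect to every coordinate."""
    grads = [[0.0] * 3 for _ in coords]
    for (i, j), d in targets.items():
        diff = [coords[i][k] - coords[j][k] for k in range(3)]
        r = math.sqrt(sum(c * c for c in diff)) or 1e-12  # avoid dividing by zero
        coef = 2.0 * (r - d) / r
        for k in range(3):
            grads[i][k] += coef * diff[k]
            grads[j][k] -= coef * diff[k]
    return grads


def gradient_descent(coords: Sequence[Sequence[float]], targets: Pairs,
                     lr: float = 0.01, steps: int = 3000) -> List[List[float]]:
    """Take `steps` small steps against the gradient and return the new coordinates."""
    x = [list(p) for p in coords]
    for _ in range(steps):
        g = stress_gradient(x, targets)
        for i in range(len(x)):
            for k in range(3):
                x[i][k] -= lr * g[i][k]
    return x


if __name__ == "__main__":
    # Four points whose target distances are all 1.0 (a regular tetrahedron).
    start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.2, 0.0), (0.3, 0.2, 0.9)]
    targets = {(i, j): 1.0 for i in range(4) for j in range(i + 1, 4)}
    print(stress(start, targets), "->", stress(gradient_descent(start, targets), targets))
```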
These techniques are applied to the entire protein chain at once, rather than to pieces that must be folded separately and then assembled into the full structure, which reduces the complexity of the prediction process.
Why artificial intelligence can play a role in biology
Proteins are the material basis of all life. Predicting their 3D structures is a major challenge in biology, one that bears on our understanding of disease and on drug discovery.
Figure: Using gradient descent to predict the structure of target T1008
Amino acids are the basic building blocks of proteins. Living systems on Earth use only a little over 20 kinds of amino acids, yet these combine to form tens to hundreds of millions of different proteins.
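A quick count shows why so few building blocks are enough. With about 20 amino acid types, the number of possible chains grows as 20 raised to the power of the chain length. The sketch assumes exactly 20 types and ignores all biological constraints. It counts possible sequences, not proteins that actually exist.

```python
def possible_sequences(length: int, alphabet_size: int = 20) -> int:
    """Count every distinct chain of `length` residues over `alphabet_size` types.

    Assumes exactly 20 amino acid types and no biological constraints, so this
    counts possible sequences, not proteins found in nature.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, possible_sequences(n))
    print(len(str(possible_sequences(100))), "digits for a 100-residue chain")
```

For a 100-residue chain, the count is a 131-digit number, about 1.3 × 10^130. That is vastly more than the number of proteins that actually occur in nature.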
Proteins come in many types, with different properties and functions. A protein's three-dimensional structure depends on the number, types, and order of the amino acids it contains, and that structure in turn determines the protein's role in the body.
For example, the antibody proteins of the immune system are Y-shaped, somewhat like hooks. By latching onto viruses and bacteria, antibodies can detect and tag disease-causing microbes for destruction. Collagen, by contrast, is shaped like a rope and transmits tension between cartilage, ligaments, bones, and skin.
Protein folding also involves many kinds of interactions. For example, the quaternary structure of a protein, in which several folded chains assemble together, is governed by a large number of non-covalent interactions, including hydrogen bonds, ionic bonds, and hydrophobic interactions.
Therefore, understanding how proteins work at the molecular level requires accurately determining their three-dimensional structures.
Over the past 60 years, structural biology has used techniques including X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy to determine protein structures.
However, the DeepMind team notes that these traditional methods rely on extensive experimentation and trial and error, and that determining each structure can cost tens of thousands of dollars.
This time-consuming, labor-intensive work is a good fit for artificial intelligence. At the same time, the cost of gene sequencing has fallen rapidly in recent years, so genomic data are now abundant.
As a result, the conditions are now in place for deep learning to make structure predictions from genomic data.