Last rolls of the yoyo: Assessing the human canonical protein count

Christopher Southan

doi:10.12688/f1000research.11119.1

What is it about?

The human genome contains many types of genes. Among the most important are those that encode proteins, which make up the molecular machinery of the cell. The basic paradigm of molecular biology is that the stretches of DNA that make up a protein-coding gene are first transcribed into messenger RNA and then translated into new proteins by the ribosomes. For any species, the number of basic proteins encoded in its genome is largely fixed and is a defining characteristic of the organism. For example, yeast has about 6,000 protein-coding genes. Estimates from the initial human genome were surprisingly low compared with earlier expectations of well over 30,000. As the methods for estimating this number improved, it fell over the years, and most sources now put the count at around 20,000. Different teams collate these protein sets in different ways for different purposes. However, somewhat unexpectedly, more than 15 years after completion of the human genome, the protein counts from different teams are still not the same (i.e. there is still no exact consensus). This paper includes a detailed comparison of the numbers from different sources and provides at least some explanations as to why they still do not agree.

Why is it important?

Establishing the canonical protein number is important for describing an organism in molecular terms, at least as a prelude to functional exploration. Assessing nine different sources in this work produced nine different numbers, with a spread of 3,000 between the highest and lowest. Considering the massive experimental focus and data generation for the human genome, transcriptome and proteome, it seems peculiar that such a lack of consensus persists for such an important parameter of human biology. In addition, defining this number (while we expect some fuzziness for a variety of reasons) is crucial for the biomedical domain. For example, it defines the scope of disease-associated genetic perturbations that relate to protein-coding regions, as well as the pool of potential drug targets for ameliorating diseases.

Perspectives

As a protein chemist by background, I came to appreciate the tangible existence of real proteins in vitro, having worked on quite a few. In the two years or so I was at the then Oxford Glycosciences (OGS), I was able to engage with high-throughput proteomics on the bioinformatics side. Fortuitously, I became engaged in a project to convince the Scientific Advisory Board of OGS that the canonical human protein number was low rather than high (because this had revenue implications for the OGS/Confirmant Protein Atlas contract). One of the outcomes of this was writing the review "Has the yo-yo stopped? An assessment of human protein-coding gene number" https://www.ncbi.nlm.nih.gov/pubmed/15174140
Dr Christopher Southan

This page is a summary of: Last rolls of the yoyo: Assessing the human canonical protein count, F1000Research, April 2017, Faculty of 1000, Ltd.,
DOI: 10.12688/f1000research.11119.1.

Contributors

The following have contributed to this page

Dr Christopher Southan
