Biological databases
Biological databases are digital libraries that store information about biological systems. This information can include DNA and protein sequences, gene expression data, 3D structures of molecules, and even images captured by microscopy. These databases are essential tools for biologists and researchers in related fields because they provide a central place to store, access, and analyze biological data.
There are many different types of biological databases, each with its own focus. Here are a few examples:
- Nucleotide sequence databases: These databases store DNA and RNA sequences from a wide range of organisms. Examples include GenBank, the DNA Data Bank of Japan (DDBJ), and the European Nucleotide Archive (ENA).
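- Protein sequence databases: These databases store protein sequences from a wide range of organisms. The best-known example is UniProt. The Protein Data Bank (PDB), often listed alongside it, stores 3D structures of proteins and nucleic acids rather than sequences.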
- Gene expression databases: These databases store information about the levels of gene expression in cells and tissues. Examples include the Gene Expression Omnibus (GEO) and ArrayExpress.
- Metabolic pathway databases: These databases store information about the chemical reactions that take place in cells. Examples include KEGG and MetaCyc.
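Sequence databases such as GenBank, ENA, and UniProt can all export records in FASTA format: a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal FASTA reader, assuming that the first whitespace-delimited token of the header (usually the accession) is a good enough record ID; the two example records are made up. For real work, a library such as Biopython's `Bio.SeqIO` handles many more edge cases.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record_id: sequence} dictionary.

    Assumption: the record ID is the first whitespace-delimited token after
    '>' (typically an accession). Wrapped sequence lines are joined.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        elif current_id is not None:
            # Sequence line; any text before the first header is skipped.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Made-up records, for illustration only.
example = """>seq1 hypothetical nucleotide record
ATGGCC
ttaa
>seq2 hypothetical protein record
MKV
"""
records = parse_fasta(example.splitlines())
print(records)  # {'seq1': 'ATGGCCTTAA', 'seq2': 'MKV'}
```

The same reader works for nucleotide and protein records because FASTA itself does not say which kind of sequence it holds; that information comes from the database the file was downloaded from.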
Biological databases are updated continually as new experiments are published and new discoveries are made. They are a valuable resource for scientists studying all aspects of life: by using them, researchers can gain insight into how living organisms work, develop new treatments for diseases, and deepen our understanding of the natural world.
Primary biological databases
Primary biological databases store the raw, unprocessed information gathered directly from biological experiments or observations. They act as the foundation for further analysis, much like the original notes and measurements taken during a science experiment. Here are some examples of primary biological databases:
- GenBank: This is a freely accessible, annotated collection of nucleotide sequences (DNA and RNA) maintained by the National Center for Biotechnology Information (NCBI). It includes sequences from various sources, such as research projects, genome sequencing initiatives, and patent applications.
- DNA Data Bank of Japan (DDBJ): Like GenBank, DDBJ is a primary nucleotide sequence database. Both are part of the International Nucleotide Sequence Database Collaboration (INSDC), which keeps its member databases synchronized through regular data exchange.
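- European Nucleotide Archive (ENA): This database is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and is the third member of the INSDC. ENA is Europe's main entry point for nucleotide sequence submissions, but because it exchanges records with GenBank and DDBJ, its contents are not limited to sequences from Europe.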
- Protein Data Bank (PDB): This archive contains 3D structural data for large biological molecules, such as proteins and nucleic acids. PDB plays a crucial role in structural biology and drug discovery.
- Gene Expression Omnibus (GEO): This public repository stores high-throughput functional genomics data, with a focus on gene expression. Researchers can submit data from microarray and RNA sequencing experiments.
- Sequence Read Archive (SRA): This archive, maintained by NCBI, acts as a companion to sequence databases such as GenBank and DDBJ. It stores the raw sequence reads generated by high-throughput DNA sequencing technologies. Because SRA holds unassembled reads rather than assembled, annotated sequences, it sits even closer to the original experimental output than the other sequence databases, and it plays a vital role in preserving that original sequencing information.
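Raw reads from SRA are commonly downloaded as FASTQ files (for example with the SRA Toolkit's `fasterq-dump`). Each read occupies four lines: an "@" header, the bases, a "+" separator, and a quality string in which each character encodes a Phred score for the base at the same position. The sketch below assumes the Phred+33 encoding used by current Illumina data (score = ASCII code minus 33) and strictly four-line records without wrapped sequences; the reads themselves are made up.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]  # drop blank lines
    if len(rows) % 4:
        raise ValueError("FASTQ input is not a whole number of 4-line records")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:].split()[0], seq, qual


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string (assumed encoding): ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def mean_quality(quality: str) -> float:
    """Average Phred score of one read (0.0 for an empty read)."""
    scores = phred33_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


# Made-up reads, for illustration only.
example = """@read1 hypothetical run
ACGT
+
II5!
@read2
GGCA
+
IIII
"""
summary = {rid: mean_quality(q) for rid, _, q in read_fastq(example.splitlines())}
print(summary)  # {'read1': 25.0, 'read2': 40.0}
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so "I" (Q40) means roughly a 1-in-10,000 chance that the base call is wrong. Per-read scores like these are the kind of raw detail that assembled records in GenBank no longer carry.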
Secondary biological databases
Secondary biological databases do not store raw data the way their primary counterparts do. Instead, they hold information derived from primary databases through analysis, curation, and interpretation. This processed layer gives researchers a higher-level view of the data and makes it easier to draw conclusions from the original records.
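As a toy illustration of "derived" data, the sketch below translates a coding sequence, as it might appear in a nucleotide record, into a protein sequence using the standard genetic code (NCBI translation table 1). Protein databases derive many entries from nucleotide records in broadly this way, although real annotation pipelines also handle gene prediction, alternative genetic codes, and much more; the function name `translate` and its behavior are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the first, second and third positions each cycling through T, C, A, G,
# so the 64 letters below line up with TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence (DNA or RNA) into one-letter amino acids.

    '*' marks a stop codon; with to_stop=True translation ends at the first
    stop. Codons containing ambiguous bases (e.g. N) become 'X'. A trailing
    partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

The lookup table is built programmatically from the conventional T, C, A, G ordering rather than typed out codon by codon, which keeps the 64 entries compact and easy to check against published codon tables.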
Here are some key characteristics of secondary biological databases:
- Value-added: These databases build upon primary data by offering insights, predictions, and interpretations.
- Curated: Many secondary databases are reviewed by expert curators to improve the accuracy and consistency of their data.
- Resource for discovery: Secondary databases help researchers explore trends, identify patterns, and develop new hypotheses.
Some prominent examples of secondary biological databases include:
- SWISS-PROT: This database, now maintained as the manually reviewed section of UniProt (UniProtKB/Swiss-Prot), provides expert-curated functional annotations of proteins linked to their sequences. Functional annotations describe the biological role of a protein, including its subcellular location and its role in metabolic pathways.
- Kyoto Encyclopedia of Genes and Genomes (KEGG): KEGG offers information on metabolic pathways, genes, and molecular interactions, and also integrates information about diseases.
- Ensembl: This database annotates genes and genomes across different species. It integrates information from various sources to provide a comprehensive view of genes and their genomic context.
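The curated annotations in SWISS-PROT are distributed, among other formats, as UniProtKB flat-file entries: each line starts with a two-letter line code (ID, AC, DE, CC, and so on), data begins in column 6, and "//" ends the entry. Functional annotations such as subcellular location appear on CC lines as "-!- TOPIC:" blocks. The sketch below reads only the ID, AC, and CC lines of a single entry; the entry itself is made up and heavily trimmed, and the function name `parse_uniprot_entry` is hypothetical. Biopython's `Bio.SwissProt` module is a more complete parser.

```python
from collections import defaultdict
from typing import Dict, List


def parse_uniprot_entry(text: str) -> dict:
    """Parse one UniProtKB flat-file entry (ID, AC and CC lines only)."""
    rows: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break  # "//" terminates an entry
        if len(line) < 2:
            continue
        # Two-letter line code, three spaces, then data from column 6.
        rows[line[:2]].append(line[5:].rstrip())

    id_rows = rows.get("ID", [])
    entry_name = id_rows[0].split()[0] if id_rows else None

    # AC lines hold semicolon-terminated accessions; the first is the primary one.
    accessions = [acc.strip() for row in rows.get("AC", [])
                  for acc in row.split(";") if acc.strip()]

    # CC lines: a topic starts with "-!- TOPIC:" and may continue on later lines.
    comments: Dict[str, str] = {}
    topic = None
    for row in rows.get("CC", []):
        part = row.strip()
        if part.startswith("-!-"):
            name, _, body = part[3:].partition(":")
            topic = name.strip()
            comments[topic] = body.strip()
        elif part.startswith("---"):
            topic = None  # copyright footer follows; stop collecting
        elif topic is not None:
            comments[topic] += " " + part
    return {"entry_name": entry_name, "accessions": accessions, "comments": comments}


# Made-up, trimmed entry in the flat-file layout, for illustration only.
example_entry = """\
ID   EXMP_HUMAN              Reviewed;         120 AA.
AC   X0EXM1; X0EXM2;
DE   RecName: Full=Hypothetical example protein;
CC   -!- FUNCTION: Illustrative entry used only to demonstrate the flat-file
CC       layout.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm.
//
"""
info = parse_uniprot_entry(example_entry)
print(info["comments"]["SUBCELLULAR LOCATION"])  # Cytoplasm.
```

Grouping lines by their two-letter code first keeps the parser simple: each annotation type can then be interpreted on its own without tracking position in the entry.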
Secondary biological data is a powerful tool for researchers because it allows them to:
- Save time and resources: By leveraging existing analysis and interpretations, scientists can focus on new research questions.
- Gain new insights: Secondary data can reveal patterns or trends that might not be readily apparent from primary data alone.
- Develop new hypotheses: The analysis and interpretations within secondary data can spark new ideas for further investigation.