NCBI Database and Tools – National Center for Biotechnology Information (NCBI)

The National Center for Biotechnology Information (NCBI) is a part of the U.S. National Library of Medicine, which is itself a branch of the National Institutes of Health (NIH). It provides access to a wealth of information in the fields of biotechnology and biomedicine. Some key resources offered by NCBI include:

  1. PubMed: A database of biomedical literature, primarily journal articles.
  2. GenBank: A database of DNA sequences.
  3. BLAST: A tool for comparing a query biological sequence against a database of sequences.
  4. Bookshelf: A collection of free, online books and documents in the life sciences and healthcare.
  5. ClinVar: A resource for clinical variants and their relationships to health conditions.

NCBI’s resources are widely used by researchers, clinicians, and students in the fields of biology and medicine.

History and Establishment of NCBI

The National Center for Biotechnology Information (NCBI) was established in 1988 as part of the National Institutes of Health (NIH) in the United States. Its creation was driven by the need to support research in molecular biology and genetics.

Key points in its history:

  1. Establishment (1988): NCBI was created by Public Law 100-607, the Health Omnibus Programs Extension Act of 1988, as a division of the National Library of Medicine, with the goal of creating a centralized repository of biological information.
  2. Early Developments (1990s): NCBI released BLAST (Basic Local Alignment Search Tool), a widely used tool for comparing biological sequences, in 1990. In 1992 it took over responsibility for GenBank, a publicly accessible database of DNA sequences that had been founded in 1982.
  3. Expansion and Innovation (2000s): NCBI continued to expand its resources, incorporating a wide range of biological data and tools. It supported the Human Genome Project, which sequenced the human genome and was declared complete in 2003, by archiving and distributing the project's sequence data.
  4. Recent Advances (2010s-2020s): The NCBI has continued to evolve, integrating new technologies and expanding its databases and tools to support a broad range of biomedical research.

Today, NCBI is a vital resource for researchers worldwide, offering access to a vast array of databases, tools, and software that support genomic research, drug development, and many other areas of biomedical science.

Mission and Objectives of NCBI

The mission of the National Center for Biotechnology Information (NCBI) is to advance science and health by providing access to biomedical and genomic information. Its objectives are centered around supporting the research community and facilitating scientific discovery. Key objectives include:

  1. Data Repository and Accessibility: To collect, maintain, and provide access to a comprehensive range of biological and biomedical data, including genomic, transcriptomic, and proteomic information.
  2. Research Support: To develop and provide tools and resources that assist researchers in analyzing and interpreting biological data, such as sequence alignment tools (e.g., BLAST), gene expression analysis tools, and more.
  3. Facilitate Discovery: To support scientific discovery by making data and tools available to researchers globally, thus fostering innovation and new findings in the fields of genomics, molecular biology, and related areas.
  4. Collaboration and Integration: To collaborate with other research organizations and institutions to integrate and harmonize biological data from various sources, enhancing the utility and accessibility of the data.
  5. Educational Resources: To provide educational materials and resources to help researchers, educators, and students understand and use NCBI tools and databases effectively.
  6. Public Health Impact: To support public health by enabling research that can lead to better understanding, diagnosis, and treatment of diseases, ultimately improving human health.

Key Resources and Databases of NCBI

NCBI offers a wide array of resources and databases that are essential for various aspects of biomedical and genomic research. Here are some of the key resources:

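  1. GenBank: A comprehensive public database of nucleotide sequences from a variety of organisms. It is built from sequences submitted directly by researchers and sequencing centers, and it exchanges data with the other members of the International Nucleotide Sequence Database Collaboration (ENA and DDBJ).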
  2. PubMed: A database of biomedical literature, including articles from journals in the fields of medicine, life sciences, and related disciplines. PubMed is a key resource for finding research papers and reviews.
  3. BLAST (Basic Local Alignment Search Tool): A widely used tool for comparing an input sequence against a database to find regions of local similarity. It’s essential for sequence alignment and annotation.
  4. Entrez: A search and retrieval system that provides access to a variety of databases, including nucleotide and protein sequences, taxonomy, literature, and more.
  5. dbSNP: A database of single nucleotide polymorphisms (SNPs) and other genetic variation. It’s useful for studying genetic variation and its impact on health and disease.
  6. RefSeq: A curated collection of reference sequences for genomes, transcripts, and proteins. RefSeq provides a comprehensive and accurate representation of the sequences of genes and their products.
  7. Gene: A database that provides detailed information about genes, including their function, structure, and associated diseases.
  8. GEO (Gene Expression Omnibus): A database of high-throughput gene expression and other functional genomics datasets. It provides access to gene expression data from various experiments.
  9. OMIM (Online Mendelian Inheritance in Man): A comprehensive catalog of human genes and genetic disorders, authored and edited at Johns Hopkins University and searchable through NCBI. It provides information on the genetic basis of diseases and traits.
  10. PubChem: A database of chemical molecules and their activities. It includes information on chemical substances, compounds, and their biological activities.

Tools and Services of NCBI

NCBI provides a range of tools and services designed to support various aspects of biomedical and genomic research. Here are some of the key tools and services:

  1. BLAST (Basic Local Alignment Search Tool): A tool for comparing an input sequence (nucleotide or protein) against a database to find regions of local similarity. It helps in sequence alignment and functional annotation.
  2. Entrez: A search and retrieval system that allows users to access and search multiple NCBI databases, including PubMed, GenBank, and more, from a single interface.
  3. Genome Data Viewer (GDV): A tool for visualizing and exploring genomic data. It provides interactive views of genomes, annotations, and other genomic features.
  4. Gene Expression Omnibus (GEO): A repository for high-throughput gene expression and other functional genomics datasets. GEO provides tools for analyzing and visualizing gene expression data.
  5. dbSNP: A database that provides information on single nucleotide polymorphisms (SNPs) and other genetic variations. It includes tools for querying and analyzing genetic variation data.
  6. RefSeq: Provides a curated collection of reference sequences for genomes, transcripts, and proteins. RefSeq offers tools for viewing and analyzing these sequences.
  7. PubChem: A database of chemical molecules and their biological activities. It includes tools for searching and analyzing chemical data.
  8. NCBI Variation Viewer: A tool for visualizing and exploring genetic variants in the context of the genome. It helps researchers understand the impact of variants on gene function and phenotype.
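  9. Structure Tools (MMDB and iCn3D): Tools for viewing and analyzing 3D structures of proteins and other macromolecules. NCBI’s Molecular Modeling Database (MMDB) draws its structures from the Protein Data Bank (PDB), which is maintained outside NCBI, and the iCn3D viewer displays them.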
  10. BLAST+ Command Line Tools: A set of command-line tools for performing BLAST searches. These tools are useful for users who prefer scripting and batch processing.
  11. Taxonomy Browser: A tool for exploring the taxonomy of organisms. It provides access to hierarchical classifications and information about different species.
  12. Sequence Read Archive (SRA): A database of raw sequencing data from various high-throughput sequencing technologies. SRA provides tools for accessing and analyzing these data.

Applications and Impact of NCBI

The National Center for Biotechnology Information (NCBI) has a broad range of applications and a significant impact on various fields of science and medicine. Here are some key areas where NCBI’s resources and tools make a difference:

Applications

  1. Genomic Research:
    • Sequence Analysis: Tools like BLAST and databases such as GenBank and RefSeq enable researchers to analyze and compare DNA, RNA, and protein sequences, facilitating the identification of genes, genetic variations, and functional elements.
    • Genetic Mapping: NCBI resources help in mapping genes to specific locations on chromosomes, aiding in the understanding of gene function and the genetic basis of diseases.
  2. Drug Discovery and Development:
    • Chemical Information: PubChem provides detailed information on chemical compounds and their biological activities, supporting drug discovery and development.
    • Genetic Variation Analysis: Databases like dbSNP assist in identifying genetic variations that may impact drug response and efficacy.
  3. Disease Research:
    • Genetic Disorders: OMIM offers comprehensive information on genetic disorders and their molecular mechanisms, helping researchers understand the genetic basis of diseases and identify potential therapeutic targets.
    • Gene Expression Studies: GEO provides access to high-throughput gene expression data, allowing researchers to study how gene expression changes in different conditions or diseases.
  4. Personalized Medicine:
    • Variant Interpretation: Resources such as ClinVar and dbSNP help in interpreting genetic variants in the context of health and disease, which is crucial for developing personalized medicine approaches.
  5. Bioinformatics and Computational Biology:
    • Data Integration: NCBI tools and databases facilitate the integration and analysis of large-scale biological data, supporting computational biology research and the development of new algorithms and methods.
  6. Educational Resources:
    • Training and Tutorials: NCBI provides educational materials and training resources to help researchers, educators, and students effectively use its tools and databases.

Impact

  1. Advancing Scientific Discovery:
    • NCBI’s resources have been instrumental in numerous scientific breakthroughs, including the identification of disease-associated genes, the development of new therapies, and advancements in genomics and bioinformatics.
  2. Facilitating Collaboration:
    • By providing open access to a wealth of biological data, NCBI fosters collaboration among researchers worldwide, enabling them to share data, validate findings, and build on each other’s work.
  3. Supporting Public Health:
    • NCBI’s resources contribute to understanding the genetic basis of diseases, which can lead to better diagnostic tools, treatments, and public health strategies.
  4. Enhancing Data Accessibility:
    • NCBI’s commitment to open access ensures that researchers, clinicians, and the public can access valuable biological information without barriers, promoting transparency and knowledge dissemination.
  5. Driving Innovation:
    • The development of new tools and databases by NCBI drives innovation in bioinformatics, genomics, and related fields, supporting the advancement of science and technology.

Database Retrieval Tool on NCBI

The Database Retrieval Tool on NCBI, often referred to as Entrez, is a powerful search and retrieval system used to access and retrieve information from various NCBI databases. Entrez provides a unified interface to search multiple databases, including PubMed, GenBank, and others.

Key Features of Entrez:

  1. Unified Search Interface:
    • Allows users to search across multiple databases simultaneously or focus on specific databases.
  2. Search Capabilities:
    • Offers advanced search features, including Boolean operators, filters, and field-specific searches.
  3. Links Between Records:
    • Provides links between related records across different databases, such as linking a PubMed article to its associated GenBank sequence.
  4. Summary and Detailed Views:
    • Displays summary information in search results and detailed views for individual records.
  5. Customizable Searches:
    • Users can customize searches and save them for future use.

Commonly Accessed Databases through Entrez:

  1. PubMed: Biomedical literature.
  2. GenBank: Nucleotide sequences.
  3. Protein: Protein sequences.
  4. Structure: 3D structures of proteins, nucleic acids, and complex assemblies.
  5. Gene: Gene-specific information.
  6. SNP: Single nucleotide polymorphisms.
  7. Taxonomy: Information about biological classification.
  8. ClinVar: Clinical significance of genetic variants.

Accessing Entrez:

You can access Entrez directly through the NCBI website. From the search bar on the home page, select the database you want to search and enter your query. The system will return relevant results along with options to refine and explore further.
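
Besides the website, Entrez can be queried programmatically through NCBI's E-utilities. The sketch below only builds the request URL for an ESearch call and does not send it. The base URL and the `db`, `term`, `retmax`, and `retmode` parameters belong to E-utilities and are not described in this article, and the helper `esearch_url` is a hypothetical name.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Entrez identifiers for the databases listed above
# (GenBank records are searched through the "nuccore" database).
ENTREZ_DB = {
    "PubMed": "pubmed",
    "Nucleotide": "nuccore",
    "Protein": "protein",
    "Structure": "structure",
    "Gene": "gene",
    "SNP": "snp",
    "Taxonomy": "taxonomy",
    "ClinVar": "clinvar",
}


def esearch_url(database: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {
        "db": ENTREZ_DB[database],
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = esearch_url("PubMed", "BRCA1[Title]", retmax=5)
print(url)
```

Fetching the printed URL with any HTTP client returns the IDs of matching records. NCBI asks heavy users to limit their request rate, so a real script should pause between calls.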

Steps for Sequence Submission to NCBI

Submitting sequences to NCBI is a key process for sharing genomic, transcriptomic, and other sequence data with the scientific community. Here’s a step-by-step guide on how to submit sequences to NCBI:

Steps for Sequence Submission:

  1. Prepare Your Data:
    • Ensure your sequence data is in the correct format, typically FASTA for sequences. For more detailed annotations, you might need additional files in GFF or other formats.
    • Prepare any metadata required, such as organism information, gene annotations, and experimental details.
  2. Choose the Appropriate Database:
    • GenBank: For nucleotide sequences, including genomic, transcript, and other DNA sequences.
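    • RefSeq: Not a submission target. RefSeq records are curated by NCBI from sequences already in the public archives, so submit the underlying sequence to GenBank instead.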
    • SRA (Sequence Read Archive): For raw sequence data from high-throughput sequencing.
  3. Create an NCBI Account:
  4. Use the Submission Tools:
    • BankIt: A web-based tool for submitting nucleotide sequences to GenBank. Ideal for smaller-scale submissions.
    • table2asn: A command-line program that builds GenBank submission files from sequence and annotation files. It suits larger or heavily annotated submissions and replaces Sequin, the desktop application that NCBI has retired.
    • SRA Submission Portal: For submitting raw sequencing data.
  5. Submit Your Sequence:
    • Follow the instructions for the tool you choose. Generally, you’ll need to upload your sequence files, provide metadata, and review your submission.
    • For BankIt, you’ll enter your sequence data and metadata directly into a web form.
    • For table2asn, you’ll generate the submission file locally, review it, and then send it to NCBI.
  6. Review and Confirmation:
    • After submission, you will receive a confirmation email with a submission ID.
    • Review your submission for accuracy and track its status through the NCBI submission portal.
  7. Publication:
    • After processing and review, your sequence data will be publicly accessible through NCBI databases like GenBank or SRA.
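
Checking the FASTA file in step 1 before uploading can catch simple problems early. The following is a minimal sketch of such a pre-check, assuming nucleotide data and the IUPAC nucleotide alphabet. The function name `check_fasta` is hypothetical, and NCBI's own validation applies additional rules that are not reproduced here.

```python
# IUPAC nucleotide codes, including ambiguity symbols such as N and R.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def check_fasta(text: str) -> list[str]:
    """Return a list of problems found in FASTA-formatted nucleotide text."""
    problems = []
    current_id = None   # identifier of the record being read
    seen_ids = set()
    residues = 0        # sequence characters read for the current record
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None and residues == 0:
                problems.append(f"record '{current_id}' has no sequence")
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            if not current_id:
                problems.append(f"line {number}: empty sequence identifier")
            elif current_id in seen_ids:
                problems.append(f"line {number}: duplicate identifier '{current_id}'")
            seen_ids.add(current_id)
            residues = 0
        elif current_id is None:
            problems.append(f"line {number}: sequence data before any '>' header")
        else:
            bad = set(line.upper()) - IUPAC_NUCLEOTIDES
            if bad:
                problems.append(f"line {number}: unexpected characters {sorted(bad)}")
            residues += len(line)
    if current_id is not None and residues == 0:
        problems.append(f"record '{current_id}' has no sequence")
    return problems


# Made-up example with an invalid character and a repeated, empty record.
for problem in check_fasta(">seq1\nACGX\n>seq1\n"):
    print(problem)
```

An empty result means only that the file passed these basic checks. It does not guarantee that the submission will be accepted.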

Basic local alignment search tool (BLAST)

The Basic Local Alignment Search Tool (BLAST) is a widely used algorithm and program for comparing a query sequence (DNA, RNA, or protein) against a database of sequences to find regions of local similarity. It helps identify homologous sequences, which can provide insights into gene function, evolutionary relationships, and more.
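
To make "local similarity" concrete, the sketch below scores the best local alignment between two short sequences using the Smith-Waterman dynamic programming method. This is not how BLAST works internally: BLAST uses a faster seeded heuristic that approximates this kind of search. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                        # start a new local alignment here
                previous[j - 1] + step,   # extend with a match or mismatch
                previous[j] + gap,        # gap in b
                current[j - 1] + gap,     # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# Only the shared "ACGT" core contributes: 4 matches x 2 = 8.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Because scores are floored at zero, the flanking regions that do not match are simply ignored, which is what distinguishes a local alignment from a global one.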

Key Features of BLAST:

  1. Sequence Alignment:
    • Finds regions of local similarity between sequences. This can help identify functional and evolutionary relationships between genes.
  2. Multiple BLAST Programs:
    • BLASTN: Compares nucleotide sequences to nucleotide databases.
    • BLASTP: Compares protein sequences to protein databases.
    • BLASTX: Compares a nucleotide query, translated in all six reading frames, against a protein database.
    • TBLASTN: Compares a protein query against a nucleotide database translated in all six reading frames.
    • TBLASTX: Compares the six-frame translations of a nucleotide query against the six-frame translations of a nucleotide database.
  3. Parameters and Settings:
    • Users can adjust parameters such as the scoring matrix, gap penalties, and search space to fine-tune their searches.
  4. Output:
    • Provides alignment results showing matches between the query and database sequences, including scores, E-values (the number of matches with an equal or better score expected by chance in a database of that size, so smaller values are more significant), and graphical representations of the alignments.
  5. Visualization:
    • Offers visual tools to help interpret the results, such as graphical views of alignments and annotations of significant matches.
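
The choice among the five programs above depends only on the type of the query and the type of the database. The following is a small sketch of that decision as a lookup table. The function name `choose_blast_program` and its `translate_both` flag are hypothetical, and the program names are written in lowercase.

```python
# (query type, database type) -> program, following the list above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type: str, database_type: str,
                         translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database sequence types."""
    if translate_both:
        # tblastx compares translated nucleotide against translated nucleotide.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, database_type)]


print(choose_blast_program("nucleotide", "protein"))
print(choose_blast_program("nucleotide", "nucleotide", translate_both=True))
```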

How to Use BLAST:

  1. Access BLAST:
    • Open the BLAST home page on the NCBI website.
  2. Select a BLAST Program:
    • Choose the appropriate BLAST tool based on your query sequence type (nucleotide or protein) and your search goals.
  3. Enter Your Sequence:
    • Input your query sequence in the provided text box or upload a file containing your sequence.
  4. Choose a Database:
    • Select the database against which you want to compare your sequence. NCBI offers various databases for different types of sequences (e.g., nt for nucleotides, nr for proteins, or RefSeq-based sets such as refseq_rna and refseq_protein).
  5. Set Parameters:
    • Adjust search parameters if necessary. For most basic searches, default settings are sufficient.
  6. Run the Search:
    • Submit your query and wait for the search to complete. The time required depends on the size of the database and the complexity of the search.
  7. Review Results:
    • Examine the results, including alignment details, scores, E-values, and graphical representations.
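
When results are saved as tab-separated text, reviewing them can be scripted. The sketch below assumes the 12-column layout produced by the BLAST+ command-line tools with `-outfmt 6`. It keeps hits below an E-value cutoff and sorts them by significance. The cutoff of 1e-5 and the sample rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(tabular_text: str, max_evalue: float = 1e-5) -> list[dict]:
    """Return hits with E-value <= max_evalue, most significant first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Made-up rows: subjB has a weak E-value and is filtered out.
sample = (
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
    "query1\tsubjB\t75.00\t80\t20\t0\t5\t84\t300\t379\t0.5\t30.1\n"
    "query1\tsubjC\t90.00\t150\t15\t0\t1\t150\t1\t150\t2e-50\t200\n"
)
hits = parse_hits(sample)
for hit in hits:
    print(hit["sseqid"], hit["evalue"], hit["pident"])
```

A sensible E-value cutoff depends on the database size and the purpose of the search, so the default here should be treated as a starting point only.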

Nucleotide Database

The NCBI Nucleotide Database is a comprehensive repository of nucleotide sequences, which includes a variety of data types such as genomic DNA, mRNA, and ESTs (Expressed Sequence Tags). This database is widely used for research in genomics, genetics, and molecular biology.

Key Features of the Nucleotide Database:

  1. Sequence Data:
    • Contains nucleotide sequence records from a variety of organisms, including bacteria, archaea, eukaryotes, and viruses. Unassembled raw sequencing reads are stored separately in the Sequence Read Archive (SRA).
  2. Annotations:
    • Provides associated annotations such as gene features, functional information, and biological context for many sequences.
  3. Cross-Referencing:
    • Links to related data in other NCBI databases, such as protein sequences in the Protein Database or functional annotations in the Gene Database.
  4. Search Capabilities:
    • Allows users to search for specific sequences, keywords, or other criteria. You can perform both simple and advanced searches.
  5. Download Options:
    • Users can download sequences in various formats, such as FASTA or GenBank format, for further analysis.
  6. Integration with BLAST:
    • Sequences from the Nucleotide Database can be used as a reference in BLAST searches to find similar sequences.
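
Sequences downloaded in FASTA format are easy to process further. As a minimal sketch, the code below reads FASTA text into a dictionary and computes the GC content of each record. The helper names `read_fasta` and `gc_content` are hypothetical, and the two records are made up.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each record identifier to its sequence (uppercased)."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # identifier is the first word
            records[name] = []
        elif line and name is not None:
            records[name].append(line.upper())
    return {key: "".join(chunks) for key, chunks in records.items()}


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Made-up records for illustration.
records = read_fasta(">demo1 made-up record\nATGC\nGGCC\n>demo2\nATAT\n")
for name, sequence in records.items():
    print(name, len(sequence), gc_content(sequence))
```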

Accessing the Nucleotide Database:

  1. Visit the Database:
    • Open the Nucleotide database page on the NCBI website.
  2. Search and Retrieve Sequences:
    • Basic Search: Enter keywords, accession numbers, or specific sequences in the search box.
    • Advanced Search: Use advanced search options to refine your query based on specific criteria like organism, sequence length, and more.
  3. Viewing Records:
    • Search results provide links to detailed records that include sequence data, annotations, and links to related information.
  4. Downloading Sequences:
    • From individual sequence records or batch downloads, you can export sequences in different formats for further analysis.
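
Batch downloads can also be scripted through the E-utilities EFetch service. The sketch below only builds the request URL and does not send it. The endpoint and the `db`, `id`, `rettype`, and `retmode` parameters belong to E-utilities rather than to this article, the helper `efetch_url` is a hypothetical name, and the accession strings are examples.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions: list[str], rettype: str = "fasta",
               database: str = "nuccore") -> str:
    """Build an EFetch URL for FASTA ("fasta") or GenBank ("gb") text."""
    if rettype not in {"fasta", "gb"}:
        raise ValueError("rettype must be 'fasta' or 'gb'")
    params = {
        "db": database,
        "id": ",".join(accessions),  # several records in one request
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


url = efetch_url(["NM_000546", "NM_007294"])
print(url)
```

Requesting the printed URL returns the records as plain text, which can then be saved to a file or parsed directly.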

Example Use Cases:

  • Gene Discovery: Identifying and analyzing genes within a genome.
  • Sequence Comparison: Comparing new sequences against a comprehensive database to identify homologous sequences or predict functions.
  • Functional Annotation: Associating newly discovered sequences with known functions based on similarity to annotated sequences.

Protein Database

The NCBI Protein Database is a comprehensive repository of protein sequences from a wide range of organisms. It provides access to protein sequences and their functional annotations, supporting research in molecular biology, biochemistry, and related fields.

Key Features of the Protein Database:

  1. Protein Sequences:
    • Includes a vast collection of protein sequences from diverse organisms, including bacteria, archaea, eukaryotes, and viruses.
  2. Annotations:
    • Provides detailed annotations for each protein sequence, including information about protein function, structure, domains, and more.
  3. Cross-Referencing:
    • Links to related data in other NCBI databases, such as nucleotide sequences in the Nucleotide Database and gene information in the Gene Database.
  4. Search Capabilities:
    • Allows users to search for specific proteins using keywords, accession numbers, or sequence data. Advanced search options are available for more refined queries.
  5. Integration with BLAST:
    • Protein sequences from the Protein Database can be used in BLAST searches to find similar sequences and infer functional relationships.
  6. Download Options:
    • Users can download protein sequences in various formats, such as FASTA, for further analysis.

Accessing the Protein Database:

  1. Visit the Database:
    • Open the Protein database page on the NCBI website.
  2. Search and Retrieve Sequences:
    • Basic Search: Enter keywords, accession numbers, or protein sequences in the search box.
    • Advanced Search: Use advanced search options to refine your query based on criteria such as organism, protein family, and more.
  3. Viewing Records:
    • Search results provide links to detailed records that include protein sequence data, annotations, and links to related information.
  4. Downloading Sequences:
    • You can export sequences from individual records or perform batch downloads for further analysis.
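
As one example of the further analysis mentioned above, the sketch below computes the amino acid composition of a downloaded protein sequence. It assumes the 20 standard one-letter amino acid codes and rejects anything else. The function name `composition` and the short peptide are made up for illustration.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def composition(protein: str) -> dict[str, float]:
    """Fraction of each amino acid in a protein sequence."""
    sequence = protein.upper().rstrip("*")  # drop a trailing stop symbol
    unknown = set(sequence) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in sorted(counts)}


# Made-up peptide: two lysines, one leucine, one methionine.
print(composition("MKKL"))
```

Real records can contain ambiguity codes such as X or B, so a production script would need to decide how to count them rather than reject them.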

Example Use Cases:

  • Protein Function Prediction: Inferring the function of unknown proteins by comparing them to known proteins.
  • Structural Analysis: Investigating protein structure and domains to understand their roles in biological processes.
  • Homology Studies: Identifying homologous proteins across different species to study evolutionary relationships.

Gene Expression Database

The NCBI Gene Expression Database is known as the Gene Expression Omnibus (GEO). It is a comprehensive resource for storing and sharing high-throughput gene expression data and other functional genomics datasets.

Key Features of GEO:

  1. Gene Expression Data:
    • Contains data from microarray experiments, RNA-Seq, and other high-throughput technologies that measure gene expression levels.
  2. Metadata and Annotations:
    • Provides detailed metadata about experiments, including experimental design, sample conditions, and technical details.
  3. Data Submission:
    • Allows researchers to submit their own gene expression data for public access.
  4. Search and Retrieval:
    • Enables users to search for specific datasets, experiments, or genes. Advanced search options help refine queries based on criteria like organism, experimental condition, or dataset type.
  5. Analysis Tools:
    • Offers tools for analyzing and visualizing gene expression data, including interactive features for exploring data trends and patterns.
  6. Integration with Other Databases:
    • Links to related data in other NCBI databases, such as nucleotide sequences in GenBank or protein sequences in the Protein Database.

Accessing GEO:

  1. Visit GEO:
    • Access the Gene Expression Omnibus via the NCBI website.
  2. Search and Explore Data:
    • GEO Datasets: Search for specific datasets using keywords or experiment IDs.
    • GEO Series: Explore collections of related samples from a single study, including metadata and experimental details.
    • GEO Profiles: Search for specific gene expression profiles across different experiments.
  3. Viewing and Downloading Data:
    • Detailed records provide access to raw data files, processed data, and experimental annotations.
    • Users can download datasets in various formats, such as text files or spreadsheets, for further analysis.
  4. Data Submission:
    • Researchers can submit their own gene expression data to GEO using the GEO Submission Portal.
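
GEO records are identified by accession numbers whose prefix indicates the record type: GSE for a Series, GSM for a Sample, GPL for a Platform, and GDS for a curated DataSet. The sketch below classifies an accession by its prefix. The function name `classify_geo_accession` is hypothetical, and the accession numbers used are arbitrary examples.

```python
import re

GEO_PREFIXES = {
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
    "GDS": "DataSet",
}


def classify_geo_accession(accession: str) -> str:
    """Return the GEO record type implied by an accession's prefix."""
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_PREFIXES[match.group(1)]


for accession in ["GSE12345", "GSM678", "GPL570", "GDS1000"]:
    print(accession, classify_geo_accession(accession))
```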

Example Use Cases:

  • Gene Expression Analysis: Comparing gene expression levels across different conditions, treatments, or tissues.
  • Functional Genomics: Investigating the functional impact of genetic variations or mutations on gene expression.
  • Pathway Analysis: Identifying pathways and networks that are differentially expressed in disease states or experimental conditions.
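
The first use case, comparing expression levels between conditions, often starts with a log2 fold change. The following is a toy sketch with made-up values. The pseudocount of 1 is an assumption used to avoid dividing by zero, and real differential expression analysis relies on statistical models that account for variability between replicates.

```python
import math
from statistics import mean


def log2_fold_change(treated: list[float], control: list[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean treated to mean control expression."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Made-up expression values per gene: (treated replicates, control replicates).
expression = {
    "geneA": ([7, 7, 7], [1, 1, 1]),   # up in treated
    "geneB": ([3, 3], [3, 3]),         # unchanged
    "geneC": ([0, 0], [3, 3]),         # down in treated
}
for gene, (treated, control) in expression.items():
    print(gene, log2_fold_change(treated, control))
```

A value of 2 means roughly a fourfold increase, 0 means no change, and negative values mean lower expression in the treated samples.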
