Tools and Methods | ACE2 by James & Jeff

TOOLS

BLAST (Basic Local Alignment Search Tool) - A tool that identifies regions of similarity between nucleotide or amino acid sequences.

NCBI - The National Center for Biotechnology Information contains several databases that provide primarily biomedical and genomic information on a large scale. (4)

UniProtKB (UniProt Knowledgebase) - A database within UniProt (Universal Protein Resource) that contains in-depth information on proteins. (14)

CDD (Conserved Domain Database) - A database within NCBI that provides 3-D models of proteins, domains, secondary structures, and other sub-units. (35)

GeneCards - Provides in-depth information on genes and protein interactions. (34)

Ensembl - Provides information on genes, comparative genomics, evolution, and diseases. (15)

AceView - A database hosted by NCBI that provides mRNA and experimental cDNA sequences. (18)

RefSeq - An NCBI database that includes non-redundant sequence information involving genomic DNA, transcripts, and proteins. (6)

AmiGO 2 - A web tool for searching and browsing the Gene Ontology database, which provides information on the cellular components, molecular functions, and biological processes associated with genes. (30)

EPD (Eukaryotic Promoter Database) - A collection of databases that help to identify valid eukaryotic promoters. (11), (12)

OMIM (Online Mendelian Inheritance in Man) - Contains information on human genes, genetic phenotypes, and thorough details on several genetic disorders. (22)

ClinicalTrials.gov - A registry of clinical studies conducted around the world, maintained by the U.S. National Library of Medicine.

DisGeNET - Contains large collections of gene and variant information in relation to human diseases.

METHODS

1.) Initially, a partial amino acid sequence was obtained from a purified protein. The protein was purified from cultured HeLa cells through a combination of molecular sieve (gel-filtration), ion-exchange, and affinity chromatography.

2.) The partial amino acid sequence was then submitted as a query to the BLAST tool in order to obtain a list of candidate proteins that could contain the sequence.

3.) The correct protein was identified by assessing the top search results, mainly their accession numbers, percent identity, source databases, and PRI review date values.

4.) Following the protein identification, the gene encoding the protein was identified through NCBI.

5.) With the name of the gene obtained, we searched NCBI RefSeq, Ensembl, and AceView to identify the database with the most reviewed transcript variants, so that an accurate gene map could be produced. Two particularly interesting transcript variants (2 and 5) were chosen for further analysis because of their functional differences, and each was mapped to identify key sequence differences. A table of the NCBI RefSeq and Ensembl transcript variants was compiled to identify sequence similarities among them.

6.) After researching the gene, the next step was to identify the structure of the protein. The main database used for this was UniProt, which provided extensive information on secondary structures, domains, protein interactions, and more. NCBI, CDD, and RefSeq, along with several scientific journals, also helped in assessing the structure.

7.) After researching the protein structure, the next step was to understand the protein’s function and the biochemical pathways in which it is involved. UniProt and NCBI initially helped with finding lists of the protein’s functions. Scientific papers, such as those available through NCBI and ScienceDirect, were analyzed for more in-depth information on the diagrams. The GeneCards database was used heavily for information on protein expression, subcellular localization, and protein interactions.

8.) Information on ACE2’s protein interactions and general structure aided in identifying possible mutations that could cause diseases in humans. The DisGeNET and ClinVar databases were used to examine gene-disease association data for SNPs and other mutations that are likely to have negative effects on the human body.
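
To illustrate the kind of assessment described in step 3, the following is a minimal Python sketch of how percent identity can be computed for aligned sequences and used to rank candidate hits. It is not the procedure we actually ran (BLAST reports these values itself). The `Hit` record, the function names, the `HYPO_` accessions, and the ten-residue sequences are all hypothetical placeholders chosen for illustration, and the sketch assumes each hit has already been aligned to the query, with gaps written as `-`.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One aligned hit. Hypothetical record, not a BLAST output format."""
    accession: str
    aligned_query: str
    aligned_subject: str


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical positions divided by alignment length, as a percentage."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # A gap never counts as a match, but it still counts toward the length.
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def rank_hits(hits: List[Hit]) -> List[Tuple[str, float]]:
    """Return (accession, percent identity) pairs, best match first."""
    scored = [
        (h.accession, percent_identity(h.aligned_query, h.aligned_subject))
        for h in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up placeholder data: not real accessions and not the ACE2 sequence.
hits = [
    Hit("HYPO_0002", "ACDEFGHIKL", "ACDQFGHIRL"),
    Hit("HYPO_0003", "ACDEFGHIKL", "AC-EFGHVKM"),
    Hit("HYPO_0001", "ACDEFGHIKL", "ACDEFGHIKL"),
]

for accession, identity in rank_hits(hits):
    print(f"{accession}\t{identity:.1f}%")
```

Percent identity alone does not settle the identification, which is why step 3 also weighs the accession numbers, source databases, and review dates of the top results.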
