1. Nucleotide Sequence Databases
1.1. International Nucleotide Sequence Database Collaboration
Database name |
Full name and/or description |
URL |
DDBJ - DNA Data Bank of Japan |
All known nucleotide and protein sequences |
|
EMBL-Nucleotide Sequence Database |
All known nucleotide and protein sequences |
|
GenBank |
All known nucleotide and protein sequences |
1.2. DNA sequences: genes, motifs and regulatory sites
1.2.1. Coding and non-coding DNA
Database name |
Full name and/or description |
URL |
ACLAME |
A classification of genetic mobile elements |
|
CUTG |
Codon usage tabulated from GenBank |
|
Genetic Codes |
Genetic codes in various organisms and organelles |
|
Entrez Gene |
Gene-centered information at NCBI |
|
HERVd |
Human endogenous retrovirus database |
|
Hoppsigen |
Human and mouse homologous processed pseudogenes |
|
Imprinted Gene Catalogue |
Imprinted genes and parent-of-origin effects in animals |
|
Islander |
Pathogenicity islands and prophages in bacterial genomes |
|
MICdb |
Prokaryotic microsatellites |
|
NPRD |
Nucleosome positioning region database |
|
STRBase |
Short tandem DNA repeats database |
|
TIGR Gene Indices |
Organism-specific databases of EST and gene sequences |
|
Transterm |
Codon usage, start and stop signals |
|
UniGene |
Non-redundant set of eukaryotic gene-oriented clusters |
|
UniVec |
Vector sequences, adapters, linkers and primers used in DNA cloning, can be used to check for vector contamination |
|
VectorDB |
Characterization and classification of nucleic acid vectors |
|
Xpro |
Eukaryotic protein-encoding DNA sequences, both intron-containing and intron-less genes |
1.2.2. Gene structure, introns and exons, splice sites
Database name |
Full name and/or description |
URL |
ASAP |
Alternative spliced isoforms |
|
ASD |
Alternative splicing database at EBI, includes three databases AltSplice, AltExtron and AEdb |
|
ASDB |
Alternative splicing database: protein products and expression patterns of alternatively spliced genes |
|
ASHESdb |
Alternatively spliced human genes by exon skipping database |
|
EASED |
Extended alternatively spliced EST database |
|
ECgene |
Genome annotation for alternative splicing |
|
EDAS |
EST-derived alternative splicing database |
|
ExInt |
Exon-intron structure of eukaryotic genes |
|
HS3D |
Homo sapiens splice sites dataset |
|
Intronerator |
Alternative splicing in C. elegans and C. briggsae |
|
SpliceDB |
Canonical and non-canonical mammalian splice sites |
|
SpliceInfo |
Modes of alternative splicing in human genome |
|
SpliceNest |
A tool for visualizing splicing of genes from EST data |
1.2.3. Transcriptional regulator sites and transcription factors
Database name |
Full name and/or description |
URL |
ACTIVITY |
Functional DNA/RNA site activity |
|
DBTBS |
Bacillus subtilis promoters and transcription factors |
|
DoOP |
Database of orthologous promoters: chordates and plants |
|
DPInteract |
Binding sites for E. coli DNA-binding proteins |
|
EPD |
Eukaryotic promoter database |
|
HemoPDB |
Hematopoietic promoter database: transcriptional regulation in hematopoiesis |
|
JASPAR |
PSSMs for transcription factor DNA-binding sites |
|
MAPPER |
Putative transcription factor binding sites in various genomes |
|
PLACE |
Plant cis-acting regulatory DNA elements |
|
PlantCARE |
Plant promoters and cis-acting regulatory elements |
|
PlantProm |
Plant promoter sequences for RNA polymerase II |
|
PRODORIC |
Prokaryotic database of gene regulation networks |
|
PromEC |
E. coli promoters with experimentally identified transcriptional start sites |
|
SELEX_DB |
DNA and RNA binding sites for various proteins, found by systematic evolution of ligands by exponential enrichment |
|
TESS |
Transcription element search system |
|
TRACTOR db |
Transcription factors in gamma-proteobacteria database |
|
TRANSCompel |
Composite regulatory elements affecting gene transcription in eukaryotes |
|
TRANSFAC |
Transcription factors and binding sites |
|
TRED |
Transcriptional regulatory element database |
|
TRRD |
Transcription regulatory regions of eukaryotic genes |
2. RNA sequence databases
Database name |
Full name and/or description |
URL |
16S and 23S rRNA Mutation Database |
16S and 23S ribosomal RNA mutations |
|
5S rRNA Database |
5S rRNA sequences |
|
Aptamer database |
Small RNA/DNA molecules binding nucleic acids, proteins |
|
ARED |
AU-rich element-containing mRNA database |
|
Mobile group II introns |
A database of group II introns, self-splicing catalytic RNAs |
|
European rRNA database |
All complete or nearly complete rRNA sequences |
|
GtRDB |
Genomic tRNA database |
|
Guide RNA Database |
RNA editing in various kinetoplastid species |
|
HIV Sequence Database |
HIV RNA sequences |
|
HuSiDa |
Human siRNA database |
|
HyPaLib |
Hybrid pattern library: structural elements in classes of RNA |
|
IRESdb |
Internal ribosome entry site database |
|
microRNA Registry |
Database of microRNAs (small non-coding RNAs) |
|
NCIR |
Non-canonical interactions in RNA structures |
|
ncRNAs Database |
Non-coding RNAs with regulatory functions |
|
NONCODE |
A database of non-coding RNAs |
|
PLANTncRNAs |
Plant non-coding RNAs |
|
Plant snoRNA DB |
snoRNA genes in plant species |
|
PolyA_DB |
A database of mammalian mRNA polyadenylation |
|
PseudoBase |
Database of RNA pseudoknots |
|
Rfam |
Non-coding RNA families |
|
RISSC |
Ribosomal internal spacer sequence collection |
|
RNAdb |
Mammalian non-coding RNA database |
|
RNA Modification Database |
Naturally modified nucleosides in RNA |
|
RRNDB |
rRNA operon numbers in various prokaryotes |
|
siRNAdb |
siRNA database and search engine |
|
Small RNA Database |
Small RNAs from prokaryotes and eukaryotes |
|
SRPDB |
Signal recognition particle database |
|
SSU rRNA Modification Database |
Modified nucleosides in small subunit rRNA |
|
Subviral RNA Database |
Viroids and viroid-like RNAs |
|
tmRNA Website |
tmRNA sequences and alignments |
|
tmRDB |
tmRNA database |
|
tRNA sequences |
tRNA viewer and sequence editor |
|
UTRdb/UTRsite |
5'- and 3'-UTRs of eukaryotic mRNAs |
3. Protein sequence databases
3.1. General sequence databases
Database name |
Full name and/or description |
URL |
EXProt |
Sequences of proteins with experimentally verified function |
|
NCBI Protein database |
All protein sequences: translated from GenBank and imported from other protein databases |
|
PA-GOSUB |
Protein sequences from model organisms, GO assignment and subcellular localization |
|
PIR-PSD |
Protein information resource protein sequence database, has been merged into the UniProt knowledgebase |
|
PIR-NREF |
PIR's non-redundant reference protein database |
|
PRF |
Protein research foundation database of peptides: sequences, literature and unnatural amino acids |
|
Swiss-Prot |
Now UniProt/Swiss-Prot: expertly curated protein sequence database, section of the UniProt knowledgebase |
|
TrEMBL |
Now UniProt/TrEMBL: computer-annotated translations of EMBL nucleotide sequence entries: section of the UniProt knowledgebase |
|
UniParc |
UniProt archive: a repository of all protein sequences, consisting only of unique identifiers and sequence |
|
UniProt |
Universal protein knowledgebase: merged data from Swiss-Prot, TrEMBL and PIR protein sequence databases |
|
UniRef |
UniProt non-redundant reference database: clustered sets of related sequences (including splice variants and isoforms) |
3.2. Protein properties
Database name |
Full name and/or description |
URL |
AAindex |
Physicochemical properties of amino acids |
|
ProNIT |
Thermodynamic data on protein-nucleic acid interactions |
|
ProTherm |
Thermodynamic data for wild-type and mutant proteins |
|
TECRdb |
Thermodynamics of enzyme-catalyzed reactions |
3.3. Protein localization and targeting
Database name |
Full name and/or description |
URL |
DBSubLoc |
Database of protein subcellular localization |
|
NESbase |
Nuclear export signals database |
|
NLSdb |
Nuclear localization signals |
|
NMPdb |
Nuclear matrix associated proteins database |
|
NOPdb |
Nucleolar proteome database |
|
PSORTdb |
Protein subcellular localization in bacteria |
|
SPD |
Secreted protein database |
|
THGS |
Transmembrane helices in genome sequences |
|
TMPDB |
Experimentally characterized transmembrane topologies |
3.4. Protein sequence motifs and active sites
Database name |
Full name and/or description |
URL |
ASC |
Active sequence collection: biologically active peptides |
|
Blocks |
Alignments of conserved regions in protein families |
|
CSA |
Catalytic site atlas: active sites and catalytic residues in enzymes of known 3D structure |
|
COMe |
Co-ordination of metals etc.: classification of bioinorganic proteins (metalloproteins and some other complex proteins) |
|
CopS |
Comprehensive peptide signature database |
|
eBLOCKS |
Highly conserved protein sequence blocks |
|
eMOTIF |
Protein sequence motif determination and searches |
|
Metalloprotein Site Database |
Metal-binding sites in metalloproteins |
|
O-GlycBase |
O- and C-linked glycosylation sites in proteins |
|
PDBSite |
3D structure of protein functional sites |
|
Phospho.ELM |
S/T/Y protein phosphorylation sites (formerly PhosphoBase) |
|
PROMISE |
Prosthetic centers and metal ions in protein active sites |
|
PROSITE |
Biologically significant protein patterns and profiles |
|
ProTeus |
Signature sequences at the protein N- and C-termini |
3.5. Protein domain databases; protein classification
Database name |
Full name and/or description |
URL |
ADDA |
A database of protein domain classification |
|
CDD |
Conserved domain database, includes protein domains from Pfam, SMART, COG and KOG databases |
|
CluSTr |
Clusters of Swiss-Prot + TrEMBL proteins |
|
FunShift |
Functional divergence between the subfamilies of a protein domain family |
|
Hits |
A database of protein domains and motifs |
|
InterPro |
Integrated resource of protein families, domains and functional sites |
|
iProClass |
Integrated protein classification database |
|
PIRSF |
Family/superfamily classification of whole proteins |
|
PRINTS |
Hierarchical gene family fingerprints |
|
Pfam |
Protein families: multiple sequence alignments and profile hidden Markov models of protein domains |
|
PRECISE |
Predicted and consensus interaction sites in enzymes |
|
ProDom |
Protein domain families |
|
ProtoMap |
Hierarchical classification of Swiss-Prot proteins |
|
ProtoNet |
Hierarchical clustering of Swiss-Prot proteins |
|
S4 |
Structure-based sequence alignments of SCOP superfamilies |
|
SBASE |
Protein domain sequences and tools |
|
SMART |
Simple modular architecture research tool: signalling, extracellular and chromatin-associated protein domains |
|
SUPFAM |
Grouping of sequence families into superfamilies |
|
SYSTERS |
Systematic re-searching and clustering of proteins |
|
TIGRFAMs |
TIGR protein families adapted for functional annotation |
3.6. Databases of individual protein families
Database name |
Full name and/or description |
URL |
AARSDB |
Aminoacyl-tRNA synthetase database |
|
ASPD |
Artificial selected proteins/peptides database |
|
BacTregulators |
Transcriptional regulators of AraC and TetR families |
|
CSDBase |
Cold shock domain-containing proteins |
|
CuticleDB |
Structural proteins of Arthropod cuticle |
|
DCCP |
Database of copper-chelating proteins |
|
DExH/D Family Database |
DEAD-box, DEAH-box and DExH-box proteins |
|
Endogenous GPCR List |
G protein-coupled receptors; expression in cell lines |
|
ESTHER |
Esterases and other alpha/beta hydrolase enzymes |
|
EyeSite |
Families of proteins functioning in the eye |
|
GPCRDB |
G protein-coupled receptors database |
|
gpDB |
G-proteins and their interaction with GPCRs |
|
Histone Database |
Histone fold sequences and structures |
|
Homeobox Page |
Homeobox proteins, classification and evolution |
|
Hox-Pro |
Homeobox genes database |
|
Homeodomain Resource |
Homeodomain sequences, structures and related genetic and genomic information |
|
HORDE |
Human olfactory receptor data exploratorium |
|
InBase |
Inteins (protein splicing elements) database: properties, sequences, bibliography |
|
KinG - Kinases in Genomes |
S/T/Y-specific protein kinases encoded in complete genomes |
|
Knottins |
Database of knottins: small proteins with an unusual 'disulfide through disulfide' knot |
|
LGICdb |
Ligand-gated ion channel subunit sequences database |
|
Lipase Engineering Database |
Sequence, structure and function of lipases and esterases |
|
LOX-DB |
Mammalian, invertebrate, plant and fungal lipoxygenases |
|
MEROPS |
Database of proteolytic enzymes (peptidases) |
|
NPD |
Nuclear protein database |
|
NucleaRDB |
Nuclear receptor superfamily |
|
Nuclear Receptor Resource |
Nuclear receptor superfamily |
|
NUREBASE |
Nuclear hormone receptors database |
|
Olfactory Receptor Database |
Sequences for olfactory receptor-like molecules |
|
ooTFD |
Object-oriented transcription factors database |
|
PKR |
Protein kinase resource: sequences, enzymology, genetics and molecular and structural properties |
|
PLPMDB |
Pyridoxal-5'-phosphate dependent enzymes mutations |
|
ProLysED |
A database of bacterial protease systems |
|
Prolysis |
Proteases and natural and synthetic protease inhibitors |
|
REBASE |
Restriction enzymes and associated methylases |
|
Ribonuclease P Database |
RNase P sequences, alignments and structures |
|
RPG |
Ribosomal protein gene database |
|
RTKdb |
Receptor tyrosine kinase sequences |
|
S/MARt dB |
Nuclear scaffold/matrix attached regions |
|
Scorpion |
Database of scorpion toxins |
|
SDAP |
Structural database of allergenic proteins and food allergens |
|
SENTRA |
Sensory signal transduction proteins |
|
SEVENS |
7-transmembrane helix receptors (G-protein-coupled) |
|
SRPDB |
Proteins of the signal recognition particles |
|
TrSDB |
Transcription factor database |
|
VKCDB |
Voltage-gated potassium channel database |
|
Wnt Database |
Wnt proteins and phenotypes |
4. Structure Databases
4.1. Small molecules
Database name |
Full name and/or description |
URL |
ChEBI |
Chemical entities of biological interest |
|
CSD |
Cambridge structural database: crystal structure information for organic and metal-organic compounds |
|
HIC-Up |
Hetero-compound Information Centre - Uppsala |
|
AANT |
Amino acid-nucleotide interaction database |
|
Klotho |
Collection and categorization of biological compounds |
|
LIGAND |
Chemical compounds and reactions in biological pathways |
|
PDB-Ligand |
3D structures of small molecules bound to proteins and nucleic acids |
|
Ligand Depot |
A data warehouse that integrates databases, services, tools and methods related to small molecules bound to macromolecules |
|
PubChem |
Structures and biological activities of small organic molecules |
4.2. Carbohydrates
Database name |
Full name and/or description |
URL |
CCSD |
Complex carbohydrate structure database (CarbBank) |
|
CSS |
Carbohydrate structure suite: carbohydrate 3D structures derived from the PDB |
|
Glycan |
Carbohydrate database, part of the KEGG system |
|
GlycoSuiteDB |
N- and O-linked glycan structures and biological sources |
|
Monosaccharide Browser |
Space-filling Fischer projections of monosaccharides |
|
SWEET-DB |
Annotated carbohydrate structure and substance information |
4.3. Nucleic acid structure
Database name |
Full name and/or description |
URL |
NDB |
Nucleic acid-containing structures |
|
NTDB |
Thermodynamic data for nucleic acids |
|
RNABase |
RNA-containing structures from PDB and NDB |
|
SCOR |
Structural classification of RNA: RNA motifs by structure, function and tertiary interactions |
4.4. Protein structure
Database name |
Full name and/or description |
URL |
ArchDB |
Automated classification of protein loop structures |
|
wwPDB |
Worldwide Protein Data Bank |
|
PDBj |
Protein Data Bank Japan: the archive for macromolecular structures |
|
ASTRAL |
Sequences of domains of known structure, selected subsets and sequence-structure correspondences |
|
BAliBASE |
A database for comparison of multiple sequence alignments |
|
BioMagResBank |
NMR spectroscopic data for proteins and nucleic acids |
|
CADB |
Conformational angles in proteins database |
|
CATH |
Protein domain structures database |
|
CE |
3D protein structure alignments |
|
CKAAPs DB |
Structurally similar proteins with dissimilar sequences |
|
Dali |
Protein fold classification using the Dali search engine |
|
Decoys 'R' Us |
Computer-generated protein conformations |
|
DisProt |
Database of Protein Disorder: proteins that lack fixed 3D structure in their native states |
|
DomIns |
Domain insertions in known protein structures |
|
DSDBASE |
Native and modeled disulfide bonds in proteins |
|
DSMM |
Database of simulated molecular motions |
|
eF-site |
Electrostatic surface of Functional site: electrostatic potentials and hydrophobic properties of the active sites |
|
GenDiS |
Genomic distribution of protein structural superfamilies |
|
Gene3D |
Precalculated structural assignments for whole genomes |
|
GTD |
Genomic threading database: structural annotations of complete proteomes |
|
GTOP |
Protein fold predictions from genome sequences |
|
Het-PDB Navi |
Hetero-atoms in protein structures |
|
HOMSTRAD |
Homologous structure alignment database: curated structure-based alignments for protein families |
|
IMB Jena Image Library |
Visualization and analysis of 3D biopolymer structures |
|
IMGT/3Dstructure-DB |
Sequences and 3D structures of vertebrate immunoglobulins, T cell receptors and MHC proteins |
|
ISSD |
Integrated sequence-structure database |
|
LPFC |
Library of protein family core structures |
|
MMDB |
NCBI's database of 3D structures, part of NCBI Entrez |
|
E-MSD |
EBI's macromolecular structure database |
|
ModBase |
Annotated comparative protein structure models |
|
MolMovDB |
Database of macromolecular movements: descriptions of protein and macromolecular motions, including movies |
|
PALI |
Phylogeny and alignment of homologous protein structures |
|
PASS2 |
Structural motifs of protein superfamilies |
|
PepConfDB |
A database of peptide conformations |
|
PDB |
Protein structure databank: all publicly available 3D structures of proteins and nucleic acids |
|
PDB-REPRDB |
Representative protein chains, based on PDB entries |
|
PDBsum |
Summaries and analyses of PDB structures |
|
PDB_TM |
Transmembrane proteins with known 3D structure |
|
Protein Folding Database |
Experimental data on protein folding |
|
SCOP |
Structural classification of proteins |
|
Sloop |
Classification of protein loops |
|
Structure Superposition Database |
Pairwise superposition of TIM-barrel structures |
|
SWISS-MODEL Repository |
Database of annotated 3D protein structure models |
|
SUPERFAMILY |
Assignments of proteins to structural superfamilies |
|
SURFACE |
Surface residues and functions annotated, compared and evaluated: a database of protein surface patches |
|
TargetDB |
Target data from worldwide structural genomics projects |
|
3D-GENOMICS |
Structural annotations for complete proteomes |
|
TOPS |
Topology of protein structures database |
5. Genomics Databases (non-human)
5.1. Genome annotation terms, ontologies and nomenclature
Database name |
Full name and/or description |
URL |
Genew |
Human gene nomenclature: approved gene symbols |
|
GO |
Gene ontology consortium database |
|
GOA |
EBI's gene ontology annotation project |
|
IUBMB Nomenclature database |
Nomenclature of enzymes, membrane transporters, electron transport proteins and other proteins |
|
IUPAC Nomenclature database |
Nomenclature of biochemical and organic compounds approved by the IUBMB-IUPAC Joint Commission |
|
IUPHAR-RD |
The International Union of Pharmacology recommendations on receptor nomenclature and drug classification |
|
PANTHER |
Gene products organized by biological function |
|
UMLS |
Unified medical language system |
5.1.1. Taxonomy and Identification
Database name |
Full name and/or description |
URL |
ICB |
gyrB database for identification and classification of bacteria |
|
NCBI Taxonomy |
Names of all organisms represented in GenBank |
|
PANDIT |
Protein and associated nucleotide domains with inferred trees |
|
RIDOM |
rRNA-based differentiation of medical microorganisms |
|
RDP-II |
Ribosomal database project |
|
Tree of Life |
Information on phylogeny and biodiversity |
5.2. General genomics databases
Database name |
Full name and/or description |
URL |
COG |
Clusters of orthologous groups of proteins |
|
COGENT |
Complete genome tracking: predicted peptides from fully sequenced genomes |
|
CORG |
Comparative regulatory genomics: conserved non-coding sequence blocks |
|
DEG |
Database of essential genes from bacteria and yeast |
|
EBI Genomes |
EBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes |
|
EGO |
Eukaryotic gene orthologs: orthologous DNA sequences in the TIGR gene indices |
|
EMGlib |
Enhanced microbial genomes library: completely sequenced genomes of unicellular organisms |
|
Entrez Genomes |
NCBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes |
|
ERGO Light |
Integrated biochemical data on nine bacterial genomes: publicly available portion of the ERGO database |
|
FusionDB |
Database of bacterial and archaeal gene fusion events |
|
Genome Atlas |
DNA structural properties of sequenced genomes |
|
Genome Information Broker |
DDBJ's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes |
|
Genome Reviews |
Integrated view of complete genomes |
|
GOLD |
Genomes online database: a listing of completed and ongoing genome projects |
|
HGT-DB |
Putative horizontally transferred genes in prokaryotic genomes |
|
Integr8 |
Functional classification of proteins in whole genomes |
|
KEGG |
Kyoto encyclopedia of genes and genomes: integrated suite of databases on genes, proteins and metabolic pathways |
|
MBGD |
Microbial genome database for comparative analysis |
|
ORFanage |
Database of orphan ORFs (ORFs with no homologs) in complete microbial genomes |
|
PACRAT |
Archaeal and bacterial intergenic sequence features |
|
PartiGeneDB |
Assembled partial genomes for ~250 eukaryotic organisms |
|
PEDANT |
Results of an automated analysis of genomic sequences |
|
TIGR Microbial Database |
Lists of completed and ongoing genome projects with links to complete genome sequences |
|
TIGR Comprehensive Microbial Resource |
Various data on complete microbial genomes: uniform annotation, properties of DNA and predicted proteins |
|
TransportDB |
Predicted membrane transporters in complete genomes, classified according to the TC classification system |
|
WIT3 |
What is there? Metabolic reconstruction for completely sequenced microbial genomes |