Bioinformatics Glossary

Naked DNA

Pure, isolated DNA devoid of any proteins that may bind to it.

NCEs (New Chemical Entities)

Compounds identified as potential drugs that are sent from research and development into clinical trials to determine their suitability.

Nested PCR

The second round amplification of an already PCR-amplified sequence using a new pair of primers which are internal to the original primers. Typically done when a single PCR reaction generates insufficient amounts of product.

Neural net

A neural net is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal brain. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns. Neural nets are used in bioinformatics to map data and make predictions, such as taking a multiple alignment of a protein family as a training set in order to identify novel members of the family from their sequence data alone.

Nonsense mutation

A point mutation in which a codon specifying an amino acid is converted into a nonsense (stop) codon, causing premature termination of translation.

Northern blotting

A technique to identify RNA molecules by hybridization that is analogous to Southern blotting (see Southern blotting).

Nuclease

Any enzyme that can cleave the phosphodiester bonds of nucleic acid backbones.

Nucleoside

A five-carbon sugar covalently attached to a nitrogenous base.

Nucleotide

A nucleic acid unit composed of a five-carbon sugar joined to a phosphate group and a nitrogenous base.

Object-Relational Database

Object databases combine the elements of object orientation and object-oriented programming languages with database capabilities. They provide more than persistent storage of programming language objects. Object databases extend the functionality of object programming languages (e.g., C++, Smalltalk, or Java) to provide full-featured database programming capability. The result is a high level of congruence between the data model for the application and the data model of the database. Object-relational databases are used in Bioinformatics to map molecular biological objects (such as sequences, structures, maps and pathways) to their underlying representations (typically within the rows and columns of relational database tables.) This enables the user to deal with the biological objects in a more intuitive manner, as they would in the laboratory, without having to worry about the underlying data model of their representation.

Oligonucleotide

A short molecule consisting of several linked nucleotides (typically between 10 and 60) covalently attached by phosphodiester bonds.

Open reading frame (ORF)

Any stretch of DNA that potentially encodes a protein. Open reading frames start with a start codon, and end with a termination codon. No termination codons may be present internally. The identification of an ORF is the first indication that a segment of DNA may be part of a functional gene.
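
The definition above can be turned into a small search routine. What follows is only a minimal sketch, assuming ATG as the sole start codon, the three standard termination codons, and a scan of the forward strand only; the function name find_orfs and the min_codons cut-off are illustrative choices, not part of any standard tool.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.

    start is the index of the start codon; end is exclusive and
    includes the termination codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # close it at the first stop
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

Real gene finders also scan the reverse strand and usually apply a much larger minimum length.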

Operator

A segment of DNA that interacts with the products of regulatory genes and facilitates the transcription of one or more structural genes.

Operon

A unit of transcription consisting of one or more structural genes, an operator, and a promoter.

Ortholog

Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. (See also Paralogs.)

Overlapping clones

Collection of cloned sequences made by generating randomly overlapping DNA fragments with infrequently cutting restriction enzymes.

Palindrome

A region of DNA with a symmetrical arrangement of bases occurring about a single point such that the base sequences on either side of that point are identical (if the strands are both read in the same direction), e.g. 5' GAATTC 3', whose complementary sequence is 3' CTTAAG 5'.
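
In computational terms, a palindromic DNA sequence is one that equals its own reverse complement. A minimal sketch, assuming an unambiguous A/C/G/T alphabet (the function names are illustrative):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse, giving the other strand read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if both strands read the same in the 5'->3' direction."""
    return seq.upper() == reverse_complement(seq)

print(is_palindromic("GAATTC"))
print(is_palindromic("GATTAC"))
```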

Pattern

Molecular biological patterns usually occur at the level of the characters making up the gene or protein sequence. A pattern language must be defined in order to apply different criteria to different positions of a sequence. In order to have position-specific comparison done by a computer, a pattern-matching algorithm must allow alternative residues at a given position, repetitions of a residue, exclusion of alternative residues, weighting, and ideally, combinatorial representation.
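
Several of these requirements (alternative residues, repetitions, exclusions) map directly onto regular expressions. The sketch below uses a made-up pattern purely for illustration: a C, then any 2 to 4 residues, then a C, then one of L, I, V or M, then any residue except P, then an H. Weighting is not covered by plain regular expressions and is left out.

```python
import re

# Hypothetical pattern, not taken from any real motif database.
PATTERN = re.compile(r"C.{2,4}C[LIVM][^P]H")

def find_matches(protein):
    """Return (start index, matched substring) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(protein)]

print(find_matches("AACGGCLAHKK"))
print(find_matches("AACGGCLPHKK"))
```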

Pathways

Bioinformatics strives to define representations of key biological datatypes, algorithms and inference procedures, including sequences, structures, biological pathways and reactions. Representing and computing with biological pathways requires ontologies for representing pathway knowledge; user interfaces to pathway databases; physico-chemical properties of enzymes and their substrates in pathways; and pathway analysis of whole genomes, including identifying common patterns across species and species differences.

Paralog

Paralogs are genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.

Parameters

Parameters are user-selectable values, typically experimentally determined, that govern the boundaries of an algorithm or program. For instance, selection of the appropriate input parameters governs the success of a search algorithm. Some of the most common search parameters in bioinformatics tools include the stringency of an alignment search tool, and the weights (penalties) provided for mismatches and gaps.

Peptide

A short stretch of amino acids each covalently coupled by a peptide (amide) bond.

Peptide bond (amide bond)

A covalent bond formed between two amino acids when the amino group of one is linked to the carboxy group of another (resulting in the elimination of one water molecule).

Phage (Bacteriophage)

A virus that infects bacterial cells and serves as a useful vector for introducing genes into bacteria for a number of purposes.

Phage display

A technique in which phage are engineered to fuse a foreign peptide or protein with their capsid (surface) proteins and hence display it on their surfaces. The immobilized phage may then be used as a screen to see what ligands bind to the expressed fusion protein exhibited (displayed) on the phage surface.

Pharmacogenomics

The use of (DNA-based) genotyping in order to target pharmaceutical agents to specific patient populations. Genetic differences are known to affect responses to many types of drug therapy, and pharmacogenomics analysis serves to customize the use of pharmaceuticals for specific subgroups of patients. The rationale for this approach is that observed gene expression differences may correlate with, and explain, the differences in drug side effects and efficacy in humans.

Pharmacophore

The three-dimensional spatial arrangement of atoms, substituents, functional groups, or chemical features that together are sufficient to describe the pharmacologically active components of a drug molecule or molecule series.

Phenotype

Any observable feature of an organism that is the result of one or more genes.

Phylum

The animal kingdom is divided into about 30 major groups known as phyla (singular: phylum). The members of each phylum share the same basic structure and organization. For instance, fish, birds, and human beings belong to one phylum, the Chordata, because all possess a notochord and a dorsal nerve cord at some stage of development.

Physical map

A physical map consists of a linearly ordered set of DNA fragments encompassing the genome or region of interest. Physical maps are of two types, macro-restriction maps and ordered clone maps. The former consists of an ordered set of large DNA fragments generated by using restriction enzymes whose recognition sequences are infrequently represented in the genome. An ordered clone map consists of an overlapping collection of cloned DNA fragments. The DNA may be cloned into any one of the available vector systems: YACs, cosmids, phage, or even plasmids. Major advantages of ordered clone maps are that they are of high resolution and directly provide the clones for further study.

Plasmid

Any replicating DNA element that can exist in the cell independently of the chromosomes. Synthetic plasmids are used for DNA cloning. Most commonly found in bacterial cells.

Pleiotropy

The multiple effects on an organism's phenotype due to a single gene or allele, e.g. cytokines, which can bind to multiple cellular receptors and affect growth and multiple immune pathways.

Point mutation

A mutation in which a single nucleotide in a DNA sequence is substituted by another nucleotide.

Poly(A) tail

The stretch of Adenine (A) residues at the 3' end of eukaryotic mRNA that is added to the pre-mRNA as it is processed, before its transport from the nucleus to the cytoplasm and subsequent translation at the ribosome.

Polyadenylation site

A site on the 3'-end of messenger RNA (mRNA) that signals the addition of a series of Adenines during the RNA processing step and before the mRNA migrates to the cytoplasm. These so-called poly(A) "tails" increase mRNA stability and allow one to isolate mRNA from cells by PCR-amplification using poly(T) primers.

Polygenic inheritance

Inheritance involving alleles at many genetic loci.

Polymerase chain reaction (PCR)

Technique used to amplify or generate large amounts of replica DNA of a segment of any DNA whose "flanking" sequences are known. Oligonucleotide primers which bind these flanking sequences are used by an enzyme (Taq polymerase) to copy the sequence in between the primers. Cycles of heat to break apart the DNA strands, cooling to allow the primers to bind, and heating again to allow the enzyme to copy the intervening sequence lead to a doubling of DNA at each cycle. The reactions are typically carried out on a regulated heating block and consist of 30-35 cycles of repeated amplification of all the DNA present. Single molecules of "target" DNA can be amplified to microgram amounts of DNA. The target DNA can be of any origin.
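
The doubling at each cycle means that the number of copies grows exponentially with the cycle count. The sketch below illustrates that arithmetic; the efficiency parameter (1.0 for perfect doubling) is an added assumption to show how less-than-ideal reactions behave, and the function name is illustrative.

```python
def copies_after(cycles, initial_copies=1, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return initial_copies * (1 + efficiency) ** cycles

# Perfect doubling for 30 cycles, starting from a single template molecule.
print(copies_after(30))
# The same number of cycles at a hypothetical 80% efficiency.
print(copies_after(30, efficiency=0.8))
```

In practice the yield plateaus in later cycles as primers and nucleotides are consumed, so these figures are upper bounds rather than predictions.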

Polymorphism

(lit. many forms) The existence of a gene in a population in at least two different forms at a frequency far higher than that attributable to recurrent mutation alone. Variations in a population may be measured by determining the rate of mutation in polymorphic genes (see SNPs).

Polypeptide

A single chain of covalently attached amino acids joined by peptide bonds. Polypeptide chains usually fold into a compact, stable form (a domain) that is part (or all) of the final protein.

Positional cloning

Method used to define the location of a gene on a chromosome and use this information to identify and clone the gene. The location of the gene is determined by linkage analysis of DNA from a large family containing afflicted and normal members to identify linkages between the transmission of the disease gene and observable genetic markers. This information is then used to screen (by chromosomal jumping and walking) the location for putative genes. The disease gene must be compared between the afflicted and normal family members and be shown to be different in the two groups. The full sequencing of the gene will then provide information regarding the characteristics and function of the gene product, and a potential explanation for the cause of the disease.

Post-transcriptional modification

Alterations made to pre-mRNA before it leaves the nucleus and becomes mature mRNA.

Post-translational modification

Alterations made to a protein after its synthesis at the ribosome. These modifications, such as the addition of carbohydrate or fatty acid chains, may be critical to the function of the protein.

Primary sequence (protein)

The linear sequence of a polypeptide or protein.

Primary structure (protein)

see primary sequence.

Primer

A short oligonucleotide that provides a free 3' hydroxyl for DNA or RNA synthesis by the appropriate polymerase (DNA polymerase or RNA polymerase).

Probe

Any biochemical that is labelled or tagged in some way so that it can be used to identify or isolate a gene, RNA, or protein.

Profile

Sequence profiles are usually derived from multiple alignments of sequences with a known relationship, and consist of tables of position-specific scores and gap-penalties. Each position in the profile contains scores for all of the possible amino acids, as well as one penalty score for opening and one for continuing a gap at the specified position. Attempts have been made to further improve the sensitivity of the profile by refining the procedures to construct a profile starting from a given multiple alignment. Other representations for sequence domains or motifs do not necessarily require the presence of a correct and complete multiple alignment, such as hidden Markov models.
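
As a rough illustration of a table of position-specific scores, the sketch below builds log-odds scores from the columns of a small ungapped alignment. It is a simplification under stated assumptions: a uniform background distribution, a simple pseudocount, and no gap penalties at all. The names build_profile and score are illustrative.

```python
import math
from collections import Counter

def build_profile(alignment, alphabet, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column."""
    background = 1.0 / len(alphabet)            # assumed uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return profile

def score(profile, seq):
    """Sum the position-specific scores of seq against the profile."""
    return sum(column[res] for column, res in zip(profile, seq))

profile = build_profile(["ACG", "ACG", "ATG"], alphabet="ACGT")
print(score(profile, "ACG"), score(profile, "TTT"))
```

A sequence resembling the alignment scores higher than an unrelated one, which is the basis for using a profile as a search query.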

Prokaryote

An organism or cell that lacks a membrane-bounded nucleus. Bacteria (including the cyanobacteria, formerly called blue-green algae) and archaea are the surviving prokaryotes (cf. Eukaryote).

Promoter (site)

A promoter site is defined by its recognition by eukaryotic RNA polymerase II; its activity in a higher eukaryote; by experimental evidence, or homology and sufficient similarity to an experimentally defined promoter; and by observed biological function.

Protein families

Sets of proteins that share a common evolutionary origin, reflected by their relatedness in function and usually by similarities in sequence or in primary, secondary or tertiary structure. Subsets of proteins with related structure and function.

Proteome

The entire protein complement of a given organism.

Proteomics

The study of the proteome. Typically, the cataloging of all the expressed proteins in a particular cell or tissue type, obtained by identifying the proteins from cell extracts using a combination of 2D gel electrophoresis and mass spectrometry. More broadly, the large-scale analysis of protein composition and function (cf. genomics).

Purine

A nitrogen-containing compound with a double-ring structure. The parent compound of Adenine and Guanine.

Pyrimidine

A nitrogen-containing compound with a single six-membered ring structure. The parent compound of Thymine, Cytosine and Uracil.

Query (sequence)

A DNA, RNA or protein sequence used to search a sequence database in order to identify close or remote family members (homologs) of known function, or sequences with similar active sites or regions (analogs), from which the function of the query may be deduced.

Rational drug design (Structure based drug design)

The development of drugs based on the 3-dimensional molecular structure of a particular target.

Reading frame

A sequence of codons beginning with an initiation codon and ending with a termination codon, typically of at least 150 bases (50 amino acids) coding for a polypeptide or protein chain (see ORF and URF).

Reagents

Sources of biological or chemical material that can be used as the starting blocks in laboratory experiments. Reagents can range from chemicals needed to perform a particular chemical reaction, constituents of a laboratory protocol, or clones to be used in a large-scale gene expression study.

Recessive

Any trait that is expressed phenotypically only when the allele responsible is present in both copies of a gene (cf. dominant).

Recombinant DNA (rDNA)

DNA molecules resulting from the fusion of DNA from different sources. The technology employed for splicing DNA from different sources and for amplifying the resultant heterogenous DNA.

Recombination

A new combination of alleles resulting from the rearrangement occurring by crossing-over or by independent assortment (see crossing over).

Recursion

An algorithmic procedure whereby an algorithm calls itself on a smaller or simpler instance of a problem until a terminating (base) condition is met, at which point the calls return and the algorithm exits. Recursion is a powerful and compact way to process data, although its computational efficiency depends on the problem and the implementation.
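
A minimal sketch of recursion, chosen only for illustration: a function that lists every DNA word of length k by calling itself for length k - 1 until it reaches the terminating case k = 0. The function name is hypothetical.

```python
def all_kmers(k, alphabet="ACGT"):
    """Return all words of length k over the alphabet, built recursively."""
    if k == 0:                  # terminating condition: one empty word
        return [""]
    shorter = all_kmers(k - 1, alphabet)        # the function calls itself
    return [prefix + base for prefix in shorter for base in alphabet]

print(all_kmers(2))
```

The number of words grows as 4 to the power k, so this particular example becomes expensive quickly as k increases.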

Regulatory gene

A DNA sequence that functions to control the expression of other genes by producing a protein that modulates the synthesis of their products (typically by binding to the gene promoter). (cf. Structural gene).

Relational Database

A database that follows E. F. Codd's 12 rules, a series of mathematical and logical steps for the organization and systemization of data into a software system that allows easy retrieval, updating, and expansion. An RDBMS stores data in a database consisting of one or more tables of rows and columns. The rows correspond to a record (tuple); the columns correspond to attributes (fields) in the record. In an RDBMS, a view, defined as a subset of the database that is the result of the evaluation of a query, is a table. RDBMSs use Structured Query Language (SQL) for data definition, data management, and data access and retrieval. Relational and object-relational databases are used extensively in bioinformatics to store sequence and other biological data.
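
The ideas of tables, rows, columns, views and SQL can be illustrated with the SQLite engine bundled with Python. The table name, column names and records below are invented for the example and do not correspond to any real sequence database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")              # throwaway in-memory database

# Data definition: one table, each column is an attribute of a record.
conn.execute(
    "CREATE TABLE sequences ("
    "accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)

# Data management: each row is one record (tuple). Values are made up.
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("SEQ001", "Homo sapiens", 1200),
        ("SEQ002", "Mus musculus", 950),
        ("SEQ003", "Homo sapiens", 300),
    ],
)

# A view is the stored result definition of a query and is itself queried like a table.
conn.execute(
    "CREATE VIEW human_sequences AS "
    "SELECT accession, length FROM sequences WHERE organism = 'Homo sapiens'"
)

# Data retrieval.
rows = conn.execute(
    "SELECT accession, length FROM human_sequences ORDER BY length DESC"
).fetchall()
print(rows)
```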

Relational Database Management Systems (RDBMS)

A software system that includes a database architecture, query language, and data loading and updating tools and other ancillary software that together allow the creation of a relational database application.

Repeats (repeat sequences)

Repeat sequences and approximate repeats occur throughout the DNA of higher organisms (mammals). For example, the Alu sequences of length about 300 characters, appear hundreds of thousands of times in Human DNA with about 87% homology to a consensus Alu string. Some short substrings such as TATA-boxes, poly-A and (TG)* also appear more often than by chance. Repeat sequences may also occur within genes, as mutations or alterations to those genes. Repetitive sequences, especially mobile elements, have many applications in genetic research. DNA transposons and retroposons are routinely used for insertional mutagenesis, gene mapping, gene tagging, and gene transfer in several model systems.

Repetitive elements

Repetitive elements provide important clues about chromosome dynamics, evolutionary forces, and mechanisms for exchange of genetic information between organisms. The most ubiquitous class of repetitive elements in the DNA sequence in primate genomes is the Alu family of interspersed repeats, which have arisen in the last 65 million years of evolution. Alu repeats belong to a class of sequences defined as short interspersed elements (SINEs). Approximately 500,000 Alu SINEs exist within the human genome, representing about 5% of the genome by mass.

Replication

The synthesis of an informationally identical macromolecule (e.g. DNA) from a template molecule.

Repressor

The protein product of a regulatory gene that combines with a specific operator (regulatory DNA sequence) and hence blocks the transcription of genes in an operon.

Restriction enzyme (restriction endonuclease)

A type of enzyme that recognizes specific DNA sequences (usually palindromic sequences 4, 6, 8 or 16 base pairs in length) and produces cuts on both strands of DNA containing those sequences only. The "molecular scissors" of rDNA technology.
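
A digest can be simulated by searching for the recognition sequence and cutting at a fixed offset within it. The sketch below uses EcoRI, which recognizes GAATTC and cuts after the first G, and makes simplifying assumptions: a linear molecule, the top strand only, and no modelling of the staggered cut on the opposite strand. The function name and parameters are illustrative.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream fragment.
    """
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)           # look for the next site
    fragments.append(dna[start:])               # trailing fragment
    return fragments

print(digest("AAGAATTCGGGAATTCTT"))
```

The fragment lengths produced this way are what a restriction map or an RFLP analysis compares.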

Restriction fragment length polymorphisms (RFLPs)

Variation within the DNA sequences of organisms of a given species that can be identified by fragmenting the sequences using restriction enzymes, since the variation lies within the restriction site. RFLPs can be used to measure the diversity of a gene in a population.

Restriction map

A physical map or depiction of a gene (or genome) derived by ordering overlapping restriction fragments produced by digestion of the DNA with a number of restriction enzymes.

Reverse Genetics

The use of protein information to elucidate the genetic sequence encoding that protein. Used to describe the process of gene isolation starting with a panel of afflicted patients (see positional cloning).

Reverse transcriptase

A DNA polymerase that can synthesise a complementary DNA (cDNA) strand using RNA as a template - a so-called RNA-dependent DNA polymerase.

Reverse transcriptase-PCR (RT-PCR)

Procedure in which PCR amplification is carried out on DNA that is first generated by the conversion of mRNA to cDNA using reverse transcriptase.

Ribonucleic acid (RNA)

A category of nucleic acids in which the component sugar is ribose and consisting of the four nucleotides Cytosine, Uracil, Guanine, and Adenine. The three types of RNA are messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).

Secondary structure (protein)

The organization of the peptide backbone of a protein that occurs as a result of hydrogen bonds, e.g. alpha helix, beta pleated sheet.

Selectivity

Selectivity of bioinformatics similarity search algorithms is defined as the significance threshold for reporting database sequence matches. As an example, for BLAST searches, the parameter E is interpreted as the upper bound on the expected frequency of chance occurrence of a match within the context of the entire database search. E may be thought of as the number of matches one expects to observe by chance alone during the database search.

Sense strand

The strand of double-stranded DNA whose sequence corresponds to that of the RNA transcript (the coding strand); the complementary antisense strand acts as the template for RNA synthesis. Typically only one gene product is produced per gene, corresponding to the sense strand only. (Some viruses have open reading frames in both the sense and the antisense strands).

Sensitivity

Sensitivity of bioinformatics similarity search algorithms centers around two areas: First, how well can the method detect biologically meaningful relationships between two related sequences in the presence of mutations and sequencing errors; Secondly how does the heuristic nature of the algorithm affect the probability that a matching sequence will not be detected. At the user's discretion, the speed of most similarity search programs can be sacrificed in exchange for greater sensitivity - with an emphasis on detecting lower scoring matches.

Sequence Tagged Site (STS)

A unique sequence from a known chromosomal location that can be amplified by PCR. STSs act as physical markers for genomic mapping and cloning.

Sexual PCR (Molecular Diversity)

Sexual PCR is a form of PCR in which similar, but not identical, DNA sequences are reassembled to obtain novel juxtapositions, simulating the result of genetic recombination. The result is the creation of an array of related genes which may possess improved characteristics. By repeated rounds of recombination, selection and PCR-based amplification vastly improved gene-products, such as enzymes with greater activity, may be generated and selected.

Shotgun cloning

The cloning of an entire gene segment or genome by generating a random set of fragments using restriction endonucleases to create a gene library that can be subsequently mapped and sequenced to reconstruct the entire genome.

Similarity (homology) search

Given a newly sequenced gene, there are two main approaches to the prediction of structure and function from the amino acid sequence. Homology methods are the most powerful and are based on the detection of significant extended sequence similarity to a protein of known structure, or of a sequence pattern characteristic of a protein family. Statistical methods are less successful but more general and are based on the derivation of structural preference values for single residues, pairs of residues, short oligopeptides or short sequence patterns. The transfer of structure/function information to a potentially homologous protein is straightforward when the sequence similarity is high and extended in length, but the assessment of the structural significance of sequence similarity can be difficult when sequence similarity is weak or restricted to a short region.

Signal sequence (leader sequence)

A short sequence added to the amino-terminal end of a polypeptide chain that forms an amphipathic helix allowing the nascent polypeptide to migrate through membranes such as the endoplasmic reticulum or the cell membrane. It is cleaved from the polypeptide after the protein has crossed the membrane.

Single nucleotide polymorphisms (SNPs)

Variations of single base pairs scattered throughout the human genome that serve as measures of the genetic diversity in humans. About 1 million SNPs are estimated to be present in the human genome, and SNPs are useful markers for gene mapping studies.

Single-pass sequencing

Rapid sequencing of large segments of the genome of an organism by isolating as many expressed (cDNA) sequences as possible and performing single sequencer runs on their 5' or 3' ends. Single-pass sequencing typically results in individual, error-prone sequencing reads of 400-700 bases, depending on the type of sequencer used. However, if many of these are generated from numerous clones from different tissues, they may be overlapped and assembled to remove the errors and generate a contiguous sequence for the entire expressed gene.

Site

Sites in sequences can be located either in DNA (e.g. binding sites, cleavage sites) or in proteins. In order to identify a site in DNA, ambiguity symbols are used to allow several different symbols at one position. Proteins, however, need a different mechanism (see Pattern). Restriction enzyme cleavage sites, for instance, have the following properties: limited length (typically, less than 20 base pairs); definition of the cleavage site and its appearance (3', 5' overhang or blunt); definition of the binding site.

Southern blotting

A procedure for the identification of DNA by transferring a fragment separated on an agarose gel to a nitrocellulose filter where it can be hybridized with a complementary "probe" sequence.

Splice site

The sequence found at the 5' and 3' region of exon/intron boundaries, usually defined by a consensus sequence:

             Intron
5' CAGGTAAGT---------TNCAGG 3'
   A    G            C T

N represents any nucleotide; the bottom line represents alternative nucleotides at the indicated positions.

Splice form

By using alternative splicing, a single message precursor from DNA can generate an entire family of mRNAs and proteins. This can be utilized to create specificity in cell-cell or cell-ligand interactions. A cell may produce a given protein, but it will be a different splice-form of the protein than that produced by an adjacent cell. In this manner, the two cells have the potential to interact differently with other cells or molecules. Two places where this has been extremely important are in the production of cell-surface specificity proteins in the immune and nervous systems.

Splicing

The joining together of separate DNA or RNA component parts. For example, RNA splicing in eukaryotes involves the removal of introns and the stitching together of the exons from the pre-mRNA transcript before maturation.

Solvent accessibility

The surface area (typically measured in square angstroms) of a biological molecule, usually a protein, that is exposed to solvent in its native, folded form. Determining the solvent accessibility of a protein helps define which amino acids in its molecular sequence are on the exterior of the molecule, and thus available to participate in interactions with other molecules.

Structural gene

Gene which encodes a structural protein (cf. Regulatory gene).

Structure prediction

Algorithms that predict the secondary, tertiary and sometimes even quaternary structure of proteins from their sequences. Determining protein structure from sequence has been dubbed "the second half of the Genetic Code" since it is the folded tertiary structure of a protein that governs how it functions as a gene product. As yet most structure prediction methods are only partially successful, and typically work best for certain well-defined classes of proteins.

Substitution matrix

A model of protein evolution at the sequence level resulting in the development of a set of widely used substitution matrices. These are frequently called Dayhoff, MDM (Mutation Data Matrix), BLOSUM or PAM (Percent Accepted Mutation) matrices. They are derived from global alignments of closely related sequences. Matrices for greater evolutionary distances are extrapolated from those for lesser ones.
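
In use, a substitution matrix is a lookup table giving a score for each pair of aligned residues. The sketch below scores an ungapped alignment with a tiny three-residue table; the values are illustrative only and cover far fewer residues than a real matrix, and the function names are hypothetical.

```python
# Illustrative scores for a three-letter alphabet; symmetric, so each
# unordered pair is stored once.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "S"): 1, ("A", "W"): -3,
    ("S", "S"): 4, ("S", "W"): -3,
    ("W", "W"): 11,
}

def pair_score(a, b):
    """Look up a residue pair in either order."""
    if (a, b) in TOY_MATRIX:
        return TOY_MATRIX[(a, b)]
    return TOY_MATRIX[(b, a)]

def alignment_score(seq1, seq2):
    """Sum of pair scores for two aligned sequences of equal length (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

print(alignment_score("ASW", "SSW"))
```

Identical and conservative pairs raise the total while unlikely substitutions lower it, which is how alignment programs rank candidate matches.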

Subtraction library

A cDNA library that only contains cDNAs uniquely expressed in a given cell or tissue, e.g. T cells and B cells will express many common RNAs, as well as a very small percentage which will be unique for T cells and B cells respectively. To make a T cell subtraction library, the cDNA from a T cell library is hybridized with a vast excess of B cell RNA. The commonly expressed genes will result in RNA-cDNA hybrids which can be removed (or subtracted) to leave only T cell specific cDNAs.

Tentative Consensus (TC)

The identification of a sequence from an EST cluster that represents part or all of a complete gene. TCs are usually determined by clustering ESTs allowing for sequencing errors, artefacts such as chimeric clones, and naturally occurring biological phenomena such as alternative splicing. Creation of a cluster allows one to generate a consensus sequence and then identify a long open reading frame which would suggest the possibility of that consensus representing a bona fide gene.

Tentative Human Consensus sequences (THCs)

A consensus sequence generated from human EST fragments. THCs may be validated by comparison against databases of known human gene sequences, human genomic sequences, or by identification of the ORFs or other sequence features contained within the consensus as belonging to a known human gene product.

Tertiary structure

Folding of a protein chain via interactions of its side chains, including formation of disulphide bonds between cysteine residues.

Thymine

A pyrimidine base found in DNA but not in RNA.

Tissue

Section of an organ that consists of a largely homogenous population of cell types. Since many organs are multifunctional, they have developed highly specialized cell types to perform different functions. Identifying the section of an organ that is homogenous for a particular cell type ensures that the gene expression profiles extracted from those cells will accurately resemble the class of cells that make up the tissue.

Transcript

The single-stranded mRNA chain that is assembled from a gene template.

Transcription

The assembly of complementary single-stranded RNA on a DNA template.
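
Because the RNA is complementary to its DNA template, transcription can be sketched as a complement-and-reverse operation with U in place of T. This minimal example assumes the template strand is supplied 5'->3' and ignores promoters, terminators and processing; the function name is illustrative.

```python
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")      # template base -> paired RNA base

def transcribe(template_5to3):
    """Return the RNA (5'->3') complementary to a DNA template strand (5'->3')."""
    return template_5to3.upper().translate(DNA_TO_RNA)[::-1]

print(transcribe("AAACAT"))
```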

Transcription factors

A group of regulatory proteins that are required for transcription in eukaryotes. Transcription factors bind to the promoter region of a gene and facilitate transcription by RNA polymerase.

Transfer RNA (tRNA)

A small RNA molecule that recognizes a specific amino acid, transports it to a specific codon in the mRNA, and positions it properly in the nascent polypeptide chain.

Transformation

A genetic alteration to a cell as a result of the incorporation of DNA from a genetically different cell or virus; can also refer to the introduction of DNA into bacterial cells for genetic manipulation.

Transgene

A foreign gene that is introduced into a cell or whole organism (e.g. transgenic mice) for therapeutic or experimental purposes.

Translation

The process of converting RNA to protein by the assembly of a polypeptide chain from an mRNA molecule at the ribosome.
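
Computationally, translation is a codon-by-codon lookup in the genetic code. The sketch below builds the standard code table from the conventional UCAG ordering and reads an mRNA from its first base until a termination codon; it assumes the reading frame is already known and does not search for a start codon. The function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA string in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCCUGGUAG"))
```

A nonsense mutation shows up here as an early stop: changing a codon to UAA, UAG or UGA shortens the returned protein.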

Transmembrane region

The region of a transmembrane protein that actually spans the membrane. Transmembrane regions are usually hydrophobic in order to be thermodynamically compatible with the lipid bilayer portion of the membrane. They may consist of either alpha-helical or beta-strand secondary structure elements, but in either case the external residues (the ones facing the membrane) are invariably hydrophobic while the internal residues may be hydrophilic (as in the case of a pore or channel) or polar. One common transmembrane structural domain is the seven-helix bundle seen in numerous channel proteins.

Tissue

Tentative Consensus (TC)

Tentative Human Consensus sequences (THCs)

Tertiary structure

Folding of a protein chain via interactions of its side chains, including formation of disulphide bonds between cysteine residues.

Thymine

A pyrimidine base found in DNA but not in RNA.

Unidentified reading frame (URF)

An open reading frame encoding a protein of undefined function.
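
An open reading frame is usually located computationally before any function is assigned to it. The following minimal sketch scans the three forward frames of a DNA string for ATG-to-stop stretches. The function name `find_orfs` and the `min_codons` cutoff are assumptions of this example; the reverse strand is not searched.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of ORFs in the three forward frames.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included in the slice). min_codons counts the
    codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])
```

An ORF found this way would be called unidentified if its predicted protein has no known function, for example because it has no informative similarity to characterized sequences.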

Uracil

Nitrogenous pyrimidine base found in RNA but not DNA.

Variable numbers of tandem repeats (VNTRs)

DNA sequence blocks of 2-60 base pairs that are repeated from two to more than 20 times, with the number of repeats varying between individuals. This polymorphism makes VNTRs very useful DNA markers for genomic mapping, linkage analysis, and DNA fingerprinting.
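
The polymorphism described here is the repeat count at a given locus. As a minimal sketch, assuming the repeat unit is already known, the count can be read off a sequence with a regular expression. The function name and the toy sequences are made up for this example.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]
    return max(runs, default=0)


# Toy alleles carrying different numbers of the same 4-bp repeat unit.
allele_a = "GG" + "ACGT" * 3 + "CC"
allele_b = "GG" + "ACGT" * 5 + "CC"
print(longest_tandem_run(allele_a, "ACGT"),
      longest_tandem_run(allele_b, "ACGT"))
```

Comparing the counts across individuals at several loci is the basic idea behind using VNTRs as markers.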

Variation (genetic)

Detecting DNA sequence variants genome-wide allows studies that relate the distribution of sequence variation to population history. This in turn allows one to determine the density of SNPs or other markers needed for gene mapping studies. Quantitation of these variations, together with analytical tools for studying sequence variation, also makes it possible to relate genetic variation to phenotype.

Vector

Any agent that transfers material (typically DNA) from one host to another. Typically, DNA vectors are either autonomous DNA elements (such as plasmids) that can be manipulated and integrated into a host's DNA, or recombinant viruses.

Virtual libraries

The creation and storage of vast collections of molecular structures in an electronic database. These databases may be queried for subsets that exhibit specific physicochemical features, or may be "virtually screened" for their ability to bind a drug target. This process may be performed prior to the synthesis and testing of the molecules themselves.
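
Querying a virtual library for a subset with specific physicochemical features can be sketched as a simple filter over stored property records. Everything below is illustrative: the `Molecule` record, the compound names and the property values are invented placeholders, and Lipinski's rule of five is used only as one familiar example of a physicochemical query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float   # molecular weight in g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def passes_rule_of_five(m: Molecule) -> bool:
    """Lipinski's rule of five, applied here with no violations allowed."""
    return (m.mol_weight <= 500 and m.logp <= 5
            and m.h_donors <= 5 and m.h_acceptors <= 10)


def query(library, predicate):
    """Return the subset of the library that satisfies the predicate."""
    return [m for m in library if predicate(m)]


# Placeholder entries; the values are invented, not measured data.
library = [
    Molecule("cmpd-001", 320.4, 2.1, 2, 5),
    Molecule("cmpd-002", 712.9, 6.3, 7, 12),
    Molecule("cmpd-003", 455.0, 4.8, 1, 9),
]
hits = query(library, passes_rule_of_five)
print([m.name for m in hits])
```

Because the query runs on stored properties alone, it can be applied before any of the molecules are synthesized.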

Visualization

Visualization is the process of representing abstract scientific data as images that can aid in understanding the meaning of the data.

Weight matrix

The frequency with which each element (for example, each base) occurs at each position of a pattern of interest, such as a set of binding sites, can be expressed as a ratio relative to its frequency in background sequence. The individual ratios for all elements and positions are then combined to build a scoring profile known as a weight matrix. This profile can be used to predict new occurrences of the pattern and to test how well an algorithm discriminates them from non-pattern sequences.
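
One common way to realize such a scoring profile is a log-odds position weight matrix built from aligned example sites. The sketch below assumes a uniform background of 0.25 per base and a pseudocount of 1; the function names and the four example sites are made up for illustration.

```python
import math


def build_weight_matrix(sites, background=0.25, pseudocount=1.0):
    """One {base: log2 odds} dict per aligned position."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return matrix


def score(matrix, window):
    """Sum of per-position log-odds for a window of matching length."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_hit(matrix, seq):
    """(score, start) of the highest-scoring window in seq."""
    width = len(matrix)
    return max(
        (score(matrix, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )


# Hypothetical aligned sites used only to demonstrate the calculation.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = build_weight_matrix(sites)
print(best_hit(pwm, "GGGTATAATGGG"))
```

Windows that resemble the training sites score above zero and unrelated windows score below it, which is the basis for discriminating pattern from non-pattern sequence.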

Western blot

Technique in which specific antibodies are used to identify their antigens in a mixture of proteins. Typically, these protein mixtures are first separated by electrophoresis and then transferred onto nylon sheets by electrotransfer. Radiolabeled or enzyme-linked antibodies are incubated with the sheets and unbound antibodies are washed away, allowing the position of the bound antibody to be revealed by autoradiography or by a color that forms upon addition of a substrate.

Wild type

Form of a gene or allele that is considered the "standard" or most common.

X chromosome

In mammals, the sex chromosome that is found in two copies in the homogametic sex (female in humans) and one copy in the heterogametic sex (male in humans).

Yeast 2-hybrid system

A yeast-based method used to simultaneously identify, and clone the gene for, proteins interacting with a known protein. The basis of this method is a "transcriptional reporter assay" (see definition) in which reporter gene expression is dependent on two domains. The first domain is linked to the known protein. The second domain is genetically linked to a library. When the library is screened against the known protein, the two domains are brought together only if a protein from the library binds the known protein, resulting in transcriptional activation of the reporter gene and a blue color. The "blue yeast clone" will contain the gene encoding the newly identified protein.

Z-DNA

A conformation of DNA existing as a left-handed double helix (the phosphate-sugar backbone forms a left-handed zig-zag course), which may play a role in gene regulation.

Zinc fingers

A protein motif formed by the interaction of repeated cysteine and histidine residues with a zinc ion. The spacing of the repeats produces finger-like protein loops that interact with DNA. These motifs are typically found in transcription factors.
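
The regular spacing of cysteine and histidine residues means candidate zinc fingers can be flagged with a simple pattern search. The sketch below uses a simplified spacing for the classical C2H2 type (C, 2-4 residues, C, 12 residues, H, 3-5 residues, H). The pattern is a rough assumption for illustration, not a curated motif definition, and the test sequence is invented.

```python
import re

# Simplified C2H2-like spacing: C-x(2,4)-C-x(12)-H-x(3,5)-H
C2H2_LIKE = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_like(protein: str) -> list[tuple[int, str]]:
    """Return (start index, matched segment) for each candidate motif."""
    return [
        (m.start(), m.group(0))
        for m in C2H2_LIKE.finditer(protein.upper())
    ]


# Invented toy sequence containing one segment with the expected spacing.
toy_protein = "MK" + "C" + "PE" + "C" + "GKAFSRSDELTR" + "H" + "IRI" + "H" + "TG"
print(find_c2h2_like(toy_protein))
```

A match only shows that the residue spacing is compatible with a zinc finger; confirming the motif requires curated profiles or structural evidence.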