Sequence
The order of amino acids in a protein, or of nucleotides in a nucleic acid.
Related Links
The links below were found by searching our database for "sequence".
- 2Can ebi tutorials: Learn how to use the tools at the EBI to find out more about your nucleotide or protein sequences.
You will be guided through a series of exercises using sample fragments of sequence. To gain more information about these sequences, you will use a variety of tools to compare the sequences to databases and analyse them.
- A Gene Map of the Human Genome: The Human Genome Project is expected to produce a sequence of DNA representing the
functional blueprint and evolutionary history of the human species. However, only about 3% of
this sequence is thought to specify the portions of our 50,000 to 100,000 genes that encode
proteins.
- Algorithms for Phylogenetic Reconstructions: These lecture notes are the result of a series of lectures given by Martin Vingron (MPI/FU Berlin)
and Jens Stoye (Bielefeld University) and a practical course by Hannes Luz (MPI, Berlin).
As the title implies, the focus of these notes is on ideas and algorithmic methods that are applied
when evolutionary relationships are to be reconstructed from molecular sequences or other species related
data. Biological aspects like the molecular basis of evolution or methods of data acquisition
and data preparation are not included. (PDF)
- Align against Greengenes (16S): Use this tool for aligning your set of 16S rRNA gene sequences using NAST or for finding near-neighbors or both. Each query sequence in your uploaded file will be searched for 16S rRNA gene sequences and aligned according to a Core Set of alignment templates. You can even upload a whole fasta genome and NAST will find, extract, and align the 16S rRNA genes for you. Additionally, Simrank (an N-mer comparison tool) will be used to find nearest-neighbors (non-chimeric) as well as nearest-isolates for each of your sequences from the entire Greengenes database...
- Arabidopsis Reactome: Arabidopsis Reactome is a curated database of biological processes in Arabidopsis. It covers biological pathways ranging from the basic processes of metabolism to high-level processes such as hormonal signalling. While Arabidopsis Reactome is targeted at Arabidopsis pathways, it also includes many individual biochemical reactions from other plant species. This makes the database relevant to the large number of researchers who work on model organisms. All the information in Arabidopsis Reactome is backed up by its provenance: either a literature citation or an electronic inference based on sequence similarity. Our ontology ensures that the various events are linked in an appropriate spatial and temporal context.
- Artemis Home Page: Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation. Artemis is written in Java, and is available for UNIX, GNU/Linux, BSD, Macintosh and MS Windows systems. It can read complete EMBL and GENBANK database entries or sequence in FASTA or raw format. Extra sequence features can be in EMBL, GENBANK or GFF format
- Artemis Manual: This document describes release 8 of Artemis, a DNA sequence viewer and sequence annotation tool.
- Article on Whets: Wheat Estimated Transcript Server (WhETS): a tool to provide best estimate of hexaploid wheat transcript sequence
- Assembler Tool: A program to assemble sequences belonging to one locus from an isolate or strain, and compare them to reference sequence(s)
- BALIBASE - A benchmark alignment database: BAliBASE 2.0 includes three new alignment reference sets (references 6-8) containing 26 protein families with 12 distinct repeat types, 9 transmembrane families and 5 families with inverted domains, representing more than 1100 sequences. As in references 1-5, core blocks are defined that only include the repeated/inverted domains and the transmembrane helices.
- BCM : Multisequence alignment: Multisequence alignment Search
- BCM Search Launcher: The BCM Search Launcher organizes and simplifies molecular biology-related search and analysis
services available on the WWW. It provides a single point-of-entry for related services, for
example, a single page for launching nucleic acid sequence searches using standard parameters.
- BEAUty and BEAST: BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It is not intended solely as a method of reconstructing phylogenies but also as a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. We include a simple to use user-interface program for setting up standard analyses and a suite of programs for analysing the results.
- BioEdit: BioEdit is a biological sequence alignment editor written for Windows 95/98/NT. A rich, intuitive multiple document interface with many convenient features makes alignment, manipulation and viewing of sequences relatively quick and easy on your desktop computer.
- BLAST: BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to
explore all of the available sequence databases regardless of whether the query is protein or DNA.
The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to
distant sequence relationships. The scores assigned in a BLAST search have a well-defined
statistical interpretation, making real matches easier to distinguish from random background hits.
BLAST uses a heuristic algorithm which seeks local as opposed to global alignments and is
therefore able to detect relationships among sequences which share only isolated regions of
similarity (Altschul et al., 1990).
- Blast Tutorial: This BLAST tutorial is designed to help both the novice and experienced BLAST user to set up and perform a BLAST search, decipher the output and analyze the results. The tutorial illustrates the potential for BLAST and PSI-BLAST searches to identify even weak (subtle) homologies to annotated entries in the database. It demonstrates that BLAST and PSI-BLAST (see separate PSI-BLAST tutorial) are important tools for predicting both biochemical activities and function from sequence relationships. In addition to the tutorial, the BLAST guide may be useful in becoming acquainted with the ins and outs of BLAST searching.
- BOXSHADE: Pretty Printing and Shading of Multiple-Sequence Alignments at ISREC, Switzerland. This server
takes a multiple-alignment file in either GCG's MSF-format or Clustal's ALN-format. Output can
be created in the following formats:
Postscript/EPS (using shaded background)
RTF old (using colors)
RTF new (using shaded background)
XFIG-files (using shaded background)
ASCII (showing similarities)
ASCII (showing differences)
HPGL (using colors)
PICT (for later editing on MACs and PCs)
- CATH: Protein Structure Classification: CATH is a novel hierarchical classification of protein domain structures, which clusters proteins at four major levels, class(C),
architecture(A), topology(T) and homologous superfamily (H). Class, derived from secondary structure content, is assigned
for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures,
independent of connectivities, is currently assigned manually. The topology level clusters structures according to their topological
connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and
functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure
comparisons.
- Central dogma of molecular biology: The dogma is a framework for understanding the transfer of sequence information between sequential information-carrying biopolymers, in the most common or general case, in living organisms.
- Chromas: Shareware ($39). Opens chromatogram files from Applied Biosystems DNA sequencers. Prints chromatograms with options to zoom or fit to one page.
Exports sequences in plain text, formatted with base numbering, FASTA, EMBL, GenBank or GCG formats.
Copy the sequence to the clipboard in plain text or FASTA format for pasting into other applications.
Export sequences from batches of chromatogram files, with automatic removal of vector sequence.
Reverse & complement the sequence and chromatogram.
Search for sequences by exact matching or optimal alignment.
Display translations in 3 frames along with the sequence.
Copy an image of a chromatogram section for pasting into documents or presentations
- Chromas Home Page: Opens chromatogram files from Applied Biosystems and Amersham MegaBace DNA sequencers, and a few more formats.
- ChromDB: The Plant Chromatin Database: The mission of ChromDB is to bring together information on homologs of chromatin-associated proteins encoded by plant genomes, organize this information logically and in a comparative manner across species (with reference to other eukaryotes to the extent possible), and make it readily accessible to any members of the research and teaching communities who might be interested in the role played by chromatin proteins in the control of gene expression and genome organization. Our first priority is to provide (a) complete sets of genes encoding chromatin-associated proteins (CAPs) for those plant species with complete or near-complete genome sequences (currently Arabidopsis and rice, with poplar to be displayed in the future), (b) the best available splicing models and predicted protein sequences for these genes, and (c) a molecular phylogenetic context for understanding the evolutionary diversification of these protein families during plant evolution with reference to animal and fungal homologs. Our next priorities will be to identify and present CAP genes in all available EST and partial genomic sequences of other plant species, as well as to provide information regarding their biochemical, developmental, physiological, and cellular functions through a community annotation mechanism.
- CLC Bio: Sequence Viewer: CLC Sequence Viewer creates a software environment enabling users to make a large number of bioinformatics analyses, combined with smooth data management, and excellent graphical viewing and output options.
- Clustal W: Clustal W is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen by viewing Cladograms or Phylograms.
- Cocoa Genome Database: The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars.
- CodonCode: CodonCode Aligner is an easy-to-use program for sequence assembly, contig editing, and mutation detection.
- Computational Genomic Group (CGG): Computational approaches for analysis of nucleotide and protein sequences and structures as well as understanding structural and functional organization of genetic signals encoded in genome sequences.
- Compute pI/Mw: A tool which allows the computation of the theoretical pI (isoelectric point) and Mw (molecular
weight) for a list of SWISS-PROT and/or TREMBL entries or for a user-entered sequence.
- ContraFold: CONTRAfold is a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. By incorporating most of the features found in typical thermodynamic models, CONTRAfold achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction.
- deCODE: what is your name in DNA: In the seconds it takes after you press the above deCODE button, I will ...
-Change the letters of your name to the closest DNA bases...
-Search every one of 168,297 protein sequences from 8826 animals, plants and microorganisms...
-And return to you, the protein that contains the closest match to... the letters of your name !!
- Design PCR Primers: tools: A list of free web applications to design PCR primers: including alignment primers, primers for RAPDS and from protein sequences. This team has done a great job at reviewing what is there.
- DNA Baser: DNA Baser Assembler is unique and revolutionary bioinformatics software for manual and automatic DNA sequence assembly, DNA sequence analysis, automatic sample processing, contig editing, metadata integration, file format conversion and mutation detection.
- DNA2.0 Bioinformatics toolbox: This collection of javascript bioinformatics software is a convenient, user friendly and free resource for anyone interested in generating, formatting and analyzing DNA and protein sequences. The goal here is to support your gene design efforts as much as possible so that you can focus on doing research instead of spending time on decoding cryptic software manuals.
- EMBOSS on Codequest (BAB web server): EMBOSS is The European Molecular Biology Open Software Suite. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.
- Eukaryotic Genomes at JGI: From this site you can get details about our current and upcoming projects, or go directly to the individual genome sites.
All of the individual sites include direct access to download sequence files, BLAST, search, view and navigate the genomic annotations.
- Expressed Sequence Tags (dbEST): dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank that contains sequence data
and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a
number of organisms. A brief account of the history and current status of human ESTs in GenBank
is available (Trends Biochem. Sci. 20:295-6;1995). Also, consult the special "Genome Directory"
issue of Nature (vol. 377, issue 6547S, 28 September 1995).
- Expression Profiler: Expression Profiler is a set of tools for the clustering, analysis and visualization of gene expression and sequence data.
- fastDNAml construction of phylogenetic trees of DNA Sequences: References:
Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10: 41-48.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17: 368-376.
- FastPCR Online Manual: FastPCR is free PCR primer and probe design software. It is efficient for standard, long, real-time PCRs, for inverse PCR, direct amino acid sequence degenerate PCR, multiplex PCR, LUX primers design for quantitative PCR, automatic SSR loci detection and direct PCR primers design, and in silico PCR. It can also perform sequence alignments, clustering and any kind of repeat sequence searching.
- GenBank: GenBank is the NIH genetic sequence database, an annotated collection of all publicly available
DNA sequences (Nucleic Acids Research 1998 Jan 1;26(1):1-7). There are approximately
1,622,000,000 bases in 2,356,000 sequence records as of June 1998. As an example, you may view
the record for the neurofibromatosis gene.
- Gene Codes Corporation: Makers of the Sequencher sequence analysis program for the Mac
- Gene Designer 2.0 : FREE bioinformatics software application. The dynamic drag-and-drop sequence functionality to create genes with Gene Designer has been awarded US Patent #7,805,252. This unique interface technology created by DNA2.0 scientists enables users to quickly represent nucleic acid sequences and to move them around with ease.
- Genedoc: A Full Featured Multiple Sequence Alignment Editor, Analyser and Shading Utility for Windows
- Geneious website: Geneious combines all the leading DNA and protein sequence analysis tools into one revolutionary software solution! Its ease of use makes bioinformatics accessible to any biologist. It runs on all major operating systems and is very affordable.
- Genome Database: Established at Johns Hopkins University in Baltimore, Maryland, USA in 1990, the Genome
Database (GDB) is the official central repository for genomic mapping data resulting from the
Human Genome Initiative. The Human Genome Initiative is a worldwide research effort to analyze
the structure of human DNA and determine the location and sequence of the estimated 100,000
human genes. In support of this project, GDB stores and curates data generated worldwide by
those researchers engaged in the mapping effort of the Human Genome Project (HGP).
- Genomics Jump station: The ultimate Web page for information and links on all aspects of Genomics Research. Institutes and groups involved in functional genomics, companies involved in functional genomics, publications on functional genomics, analysis of protein sequences, analysis of DNA sequences.
- HHMI's BioInteractive: Virtual Labs: Use cutting-edge DNA sequencing techniques and powerful database searches to identify an unknown bacterial species! In this lab, you will learn about polymerase chain reaction (PCR), automatic DNA sequencers, and the science of using a database of known DNA sequences to identify an organism.
- Human Genome Project: The Human Genome Project (HGP) was one of the great feats of exploration in history - an inward voyage of discovery rather than an outward exploration of the planet or the cosmos; an international research effort to sequence and map all of the genes - together known as the genome - of members of our species, Homo sapiens. Completed in April 2003, the HGP gave us the ability, for the first time, to read nature's complete genetic blueprint for building a human being.
- IMGT - International ImMunoGeneTics Database: IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specialising in Immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species. IMGT includes two databases : LIGM-DB, a comprehensive database of Ig and TcR from human and other vertebrates and MHC/HLA-DB (in development). A tool, IMGT/DNAPLOT, will allow Ig, TcR and MHC sequence analysis.
- Improbizer: Improbizer searches for motifs in DNA or RNA sequences that occur with improbable frequency (too often to be just chance) using a variation of the expectation maximization (EM) algorithm.
- Integrative Genomic Viewer: The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
- Jpred: Jpred is an interactive protein secondary structure prediction Internet server. The server allows a single sequence or multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences.
- Karyn's Genomes: This is a collection and brief description of some of the available sequenced genomes. There is also information about why they are thought to be important for sequencing, together with links to the sequences, publication data and further reading.
- LAMARC - Likelihood Analysis with Metropolis Algorithm using Random Coalescence: LAMARC is a package of programs for computing population parameters, such as population
size, population growth rate and migration rates by using likelihoods for samples of data
(sequences, microsatellites, and electrophoretic polymorphisms) from populations. It
approximates the summation of likelihood over all possible gene genealogies that could
explain the observed sample.
- MAC5: MAC5 is a program which implements MCMC sampling to estimate a phylogenetic tree from a DNA multiple alignment. What differentiates MAC5 from similar programs (e.g. BAMBE, MrBayes) is its use of five-state sequence evolution models as a means to include the gap information when estimating an alignment. Despite vilification of these models in the literature in the past (e.g., Durbin et al., 1998, p. 217), we have found that, in many circumstances, they are useful for improving the precision of topology estimation.
- MAFFT version 6 : MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼10,000 sequences), etc.
- MatrixPlot: visualizing sequence constraints: MatrixPlot can be used to generate mutual information plots of sequence alignments, distance matrices of sequences with known 3D coordinates,
and plots of user provided matrix files.
- Mega: the goal of the MEGA (Molecular Evolutionary Genetics Analysis) software project has been to make useful methods of comparative sequence analysis easily accessible to the scientific community for research and education.
- mFOLD: The objective of this web server is to provide easy access to RNA and DNA folding and hybridization software to the scientific community at large. By making use of universally available web GUIs (Graphical User Interfaces), the server circumvents the problem of portability of this software. Detailed output, in the form of structure plots with or without reliability information, single strand frequency plots and ‘energy dot plots’, are available for the folding of single sequences.
- MISHIMA: Method for Inferring Sequence History In Terms of Multiple Alignment: MISHIMA is a program for multiple DNA sequence alignment. It takes input in FASTA format and outputs the alignment in MISHIMA or CLUSTALW format.
The idea of this program is to use a heuristic to quickly find similarities shared by multiple sequences. Those similarities are then used to split the input sequences into fragments, which are aligned separately. After that, the partial alignments are assembled back together into a complete alignment.
- Microbial Genomes at JGI (USA): From this site you can get details about our current and upcoming projects, or go directly to the individual microbial genome sites. All of the individual sites include direct access to download sequence file(s), BLAST, and view annotations. Additionally, the Eukaryotic microbial sites include more advanced visualization and search tools.
- MPsrch: MPsrch is a biological sequence comparison tool that implements the true Smith and Waterman algorithm. It runs a search on a HP/COMPAQ cluster, using single and parallelised versions of the software. It allows a rigorous search in a reasonable computational time. MPsrch utilises an exhaustive algorithm, which is recognised as the most sensitive sequence comparison method available, whereas Blast and Fasta utilise a heuristic one. As a consequence, MPsrch is capable of identifying hits in cases where Blast and Fasta fail and also reports fewer false-positive hits.
- Muscle Online Documentation: MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options is provided that give you the choice of optimizing accuracy, speed, or some compromise between the two. Default parameters are those that give the best average accuracy in our tests. Using versions current at the time of writing, my tests show that MUSCLE can achieve both better average accuracy and better speed than CLUSTALW or T‑Coffee, depending on the chosen options.
- NCBI Genome Workbench: NCBI Genome Workbench is an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI, and mix this data with your own private data.
- OmniMapFree: OmniMapFree is a computer program for displaying and analysing the genome of any organism. You can use it to generate genome maps of your own favourite organism and to link chromosome, genetic, transcriptomic, proteomic, molecular mutant and many other types of important information directly to the genomic sequence. It is very easy to use - it does not require any knowledge of databases or computer programming. All of the data is held in plain text files which can be created and modified using a simple text editor e.g. Notepad or Wordpad.
- Ondex: Data integration and visualisation: The Ondex data integration platform enables data from diverse biological data sets to be linked, integrated and visualised through graph analysis techniques. Ondex uses a rich and flexible core data structure, which has the ability to bring together information from structured databases and unstructured sources such as biological sequence data and free text. Ondex also offers a front-end, which allows users to visualise and analyse the integrated data.
- ORF finder: This tool identifies all open reading frames using the standard or alternative genetic codes. The
deduced amino acid sequence can be saved in various formats and searched against the sequence
database using the WWW BLAST server.
- PALI Phylogeny and ALIgnment of homologous protein structures: This database provides structure based sequence alignments for homologous proteins of known 3-D structure. The alignments available include those of pairwise (two proteins at a time) and multiple (simultaneous superposition of all the structures in a family). The database also provides dendrograms depicting phylogenetic relationships based on sequence and structural similarities.
- Paper on JalView: Summary: Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing of large multiple sequence alignments.
- PhylOgenetic Web Repeater (POWER): The PhylOgenetic Web Repeater (POWER) allows users to repeatedly perform phylogenetic analysis of molecular data with most programs of the PHYLIP package. POWER provides two pipelines to process the analysis. One of them includes multiple sequence alignment (MSA) at the beginning of the pipeline, whereas the other begins phylogenetic analysis with aligned sequences.
- PHYLOGENY ESTIMATION : traditional and Bayesian approaches: The construction of evolutionary trees is now a standard part of exploratory sequence analysis.
Bayesian methods for estimating trees have recently been proposed as a faster method of
incorporating the power of complex statistical models into the process. Researchers who rely
on comparative analyses need to understand the theoretical and practical motivations that
underlie these new techniques, and how they differ from previous methods. The ability of the
new approaches to address previously intractable questions is making phylogenetic analysis
an essential tool in an increasing number of areas of genetic research.
- Phylogeny.fr: Phylogeny.fr is a free, simple to use web service dedicated to reconstructing and analysing phylogenetic relationships between molecular sequences.
Phylogeny.fr runs and connects various bioinformatics programs to reconstruct a robust phylogenetic tree from a set of sequences.
- POLAND: Thermal denaturation: The Poland server will calculate the thermal denaturation profile of double-stranded RNA, DNA or RNA/DNA-hybrids based
on sequence input and parameter settings in this form.
- PROF: Secondary Structure Prediction System: Submit a single amino acid sequence for secondary structure prediction.
- PROMALS: PROMALS constructs multiple protein sequence alignments using information from database searches and secondary structure prediction.
- Prosite: PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help
to reliably identify to which known protein family (if any) a new sequence belongs.
- Protein Information Resource: The Protein Information Resource (PIR) maintains the PIR-International Protein Sequence Database, a
comprehensive, annotated, and non-redundant set of protein sequence databases in which entries are classified into family groups
and alignments of each group are available.
- Protein sequence analysis at UCL: Our research concerns protein sequence analysis, primarily exploiting the technique of protein 'fingerprinting' (which uses conserved motifs to
characterise particular folds and functionalities). We maintain a database of fingerprints (PRINTS), which complements PROSITE. We aim to
improve fingerprint potency, to enhance predictive power in the Twilight Zone. We also design software to display sequence and structural data
in visually-striking ways (e.g. using Java).
- ProtParam: A tool which allows the computation of various physical and chemical parameters
for a given protein stored in SWISS-PROT or for a user entered sequence. The computed
parameters include the molecular weight, theoretical pI, amino acid composition, extinction
coefficient, estimated half-life, instability index, aliphatic index and grand average of
hydropathicity (GRAVY)
- PSORT WWW Server: A WWW Server for Analyzing and Predicting Protein Sorting Signals Coded in Amino Acid
Sequence
- RawDot: Large Dot Plots. This page accesses a very fast dot plot algorithm designed for large DNA sequences. This demonstration only allows a
sequence to be compared with itself.
- Readseq at EBI: Web-based tool for converting sequences between file formats.
- Readseq on Codequest (BAB web server): A sequence converter
- Reputer. Search for repeats in a DNA sequence: REPuter computes all maximal duplications and reverse, complemented and reverse complemented repeats in a DNA input
sequence.
- RESearch - a new computer program for DNA analysis: The main design philosophy behind the development of RESearch was to make the basic DNA analysis tools readily available
on IBM PCs, and to make them easy to use. RESearch enables molecular biologists to analyse a GCG formatted sequence in
several ways including searching for short nucleotide sequences and restriction enzyme sites, translating open reading frames
and searching for short amino acid sequences.
- Restriction Mapper: Restriction Mapper is a web site that finds restriction endonuclease cleavage sites in DNA sequences. It supports linear and circular DNA and provides several ways to sort and filter output.
- Review of BioEdit - 1 (BITS - 2005): This package is quick and easy to use for editing sequences and drawing plasmids. However, it is limited when it comes to sequence analysis, with far fewer tools than GCG. It can accept a wide range of file formats and, with additional accessory applications such as ClustalW, it is particularly useful for performing multiple sequence alignments and editing them. It can be customised by adding external applications and WWW tools.
- Ribosomal Database Project: The Ribosomal Database Project (RDP) provides ribosome-related data services to the scientific community, including online data analysis, rRNA-derived phylogenetic trees, and aligned and annotated rRNA sequences.
- RNAstructure: RNAstructure is a Windows program for the prediction and analysis of RNA secondary structure. It works with Windows ME, Windows NT 4, Windows 2000, Windows XP, and Windows Vista. (Note that with Windows Vista, you will need to install the older Windows help reader by following the instructions provided by Microsoft to use the online help. We are working to change this.) Version 4.6 includes a secondary structure prediction algorithm, a sequence editor, an integrated drawing tool, the OligoWalk program, OligoScreen, Dynalign, and a partition function calculator. RNAstructure uses the most current thermodynamic parameters from the Turner lab.
- Sanger, Frederick: I succeeded in developing new methods for amino acid sequencing and used
them to deduce the complete sequence of insulin, for which I was awarded the Nobel Prize for Chemistry in 1958...
- SAT: Sequence Alignment Teacher: Dynamic programming (DP) is a general optimization strategy that is successfully used across various disciplines of science. In bioinformatics, it is widely applied in calculating the optimal alignment between pairs of protein or DNA sequences. These alignments form the basis of new, verifiable biological hypotheses. Despite its importance, there are no interactive tools available for training and education on understanding the DP algorithm. Here, we introduce an interactive computer application with a graphical interface, for the purpose of educating students about DP. The program displays the DP scoring matrix and the resulting optimal alignment(s), while allowing the user to modify key parameters such as the values in the similarity matrix, the sequence alignment algorithm version and the gap opening/extension penalties.
- Satchmo: Simultaneous Alignment and Tree Construction using Hidden Markov mOdels. SATCHMO simultaneously constructs a tree and a set of multiple sequence alignments, one for each internal node of the tree.
- SEQmonk: A tool to visualise and analyse high-throughput mapped sequence data.
- SeqSearch: SeqSearch is built around a collection of databases for restriction enzymes, DNA sequences, plasmids, constructs, oligos and motifs and offers an extended set of editing and analysis tools.
- Serial Cloner: Serial Cloner is a molecular biology software package.
It provides tools with an intuitive interface that assist you in DNA cloning, sequence analysis and visualization.
- SIGFIND - Signal Peptide Prediction Server (Human): This software (SIGFIND) predicts signal peptides at the start of protein sequences
or searches open reading frames with a potential signal peptide coded in nucleotide sequences.
The signal peptide score along the sequence indicates the location and size of the signal peptide.
This score ranges from 0 (=no signal peptide) to 9 (=max. score for presence of a signal peptide).
The range where this score drops from high to low indicates the approximate position of the
cleavage site.
- Softberry: Huge collection of software for the analysis and annotation of sequences and genomes.
- Staden Site at sanger: The Staden Package consists of a series of tools for DNA sequence preparation (pregap4), assembly (gap4), editing (gap4) and DNA/protein sequence analysis (spin).
The package was originally developed at the MRC-LMB in Cambridge. It is now open source (BSD licence) and is hosted on sourceforge.net.
- STING: STING is a PDB Viewer and Interactive WWW TOOL with emphasis on bi-directional
coupling of sequence and 3D information.
- STING: Sequence To and withIN Graphics: STING is a WWW tool for the simultaneous display of information about
macromolecular structure (in STING's Graphics Frame) and sequence (in
STING's Sequence Frame). Special attention is given to MacroMolecular
INTERFACE analysis.
- STINGpaint: STINGpaint was developed to allow the presentation of residue
physico-chemical characteristics in the Sequence Frame in our package
STING. During development of the STING project, we
slightly expanded on the STINGpaint idea and adopted it for use with Multiple
Sequence Alignment (MSA) coloring. The tool has proved
useful for people wanting to easily pick out specifically colored regions along
the MSA.
- SWISS-PROT : Annotated Protein Database: SWISS-PROT is a curated protein sequence database which strives to provide a high level of
annotation (such as the description of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of
integration with other databases.
- Tablet - Next Generation Sequence Assembly Visualization: Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments.
- TCoffee: A web server for mixing Sequences and Structures into multiple sequence alignments
- The Draft Human Genome Sequence, an Introduction: Anyone with a computer and an Internet connection can now explore the draft sequence of the human genome. How can molecular biologists capitalize on these data riches, and what are the advantages of using the assembled draft sequence? This website aims to jump-start those who want to make use of this information, but are not sure where or how to start.
- TIGR - Genome Projects: TIGR's Genome Projects are a collection of curated databases containing DNA and protein sequence, gene expression, cellular role, protein family, and taxonomic data for microbes, plants and humans.
- TmPrime: TmPrime is a computer program to design oligonucleotide sets for LCR- and PCR-based gene synthesis to construct man-made DNA sequences by assembling pools of oligonucleotides.
The program divides the long input DNA sequence based on the user specified melting temperatures and assembly conditions, and dynamically optimizes the length of oligonucleotides to achieve homologous melting temperatures. The output reports the melting temperatures, oligonucleotide sequences, and potential formation of secondary structures in a PDF file which will be sent to you via email.
- TOPALi: There are a growing number of biological questions that can be answered by analysing multiple sequence alignment data. We have extended the original TOPALi Java application, beyond recombination detection, to launch a range of statistical and evolutionary analyses of multiple sequence alignments as web services. The extended TOPALi v2 provides phylogenetic model selection, Bayesian analysis (BA) and Maximum Likelihood (ML) phylogenetic tree estimation, detection of sites under positive selection, and recombination breakpoint location analysis.
- TraceEdit: Ridom TraceEdit is a cross-platform graphical DNA trace viewer and editor. TraceEdit displays the chromatogram files from Applied Biosystems automated sequencers and files in the Staden SCF format. Incorrect base calls can be edited and saved. TraceEdit is freely available and designed to operate on Windows, MacOS X and UNIX platforms
- Tree inference at the T-REX server: The most comprehensive web server version includes methods for visualization and interactive manipulation of phylogenetic trees (using Hierarchical, Radial and Axial types of drawing);
inference of phylogenetic trees using distance (NJ, BioNJ, UNJ, ADDTREE, MW, FITCH, Circular order reconstruction), parsimony (DNAPARS, PROTPARS, PARS, DOLLOP) and maximum likelihood (PHYML, DNAML, DNAMLK, PROML, PROMLK) based approaches;
inference of phylogenetic trees from incomplete distance matrices; inference of phylogenetic networks (i.e. reticulograms); detection of horizontal gene transfers; multiple sequence alignment (ClustalW); sequence-to-distance transformations (using Uncorrected, Jukes-Cantor, Tajima-Nei, Kimura 2-parameter, Tamura, Jin-Nei gamma, Kimura protein, LogDet and F84 distances);
and computation of the Robinson and Foulds topological distance between two or more trees. The program can also carry out bootstrap and jackknife resampling to assess the strength of support for tree and network branches.
- Tutorial for Clustal X and TreeView: The Clustal programs are a good choice for novices who want to make a decent phylogenetic analysis of nucleotide or amino acid sequences. Clustal can align the sequences and produce output files for drawing trees. The tree viewing feature is, however, not included in the standard Clustal packages.
- Tutorial on ClustalW: Clustal W is a general purpose global multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen via viewing Cladograms or Phylograms.
- Twitter @geneious: DNA, RNA and protein sequence alignment, assembly and analysis software platform, integrating bioinformatic and molecular biology tools
- UGENE - Unipro: UGENE is a free cross-platform genome analysis suite.
Multiple sequence alignment using MUSCLE 3, 4 and KAlign;
HMM profile build and search, based on the source of HMMER 2 and HMMER 3;
PCR primer design using Primer 3;
Protein secondary structure prediction using GOR IV and PSIPRED;
Phylogenetic analysis with Phylip;
Search for restriction enzymes and integration with REBASE;
Extremely fast repeat finder;
DNA reference assembly using Bowtie;
Search for transcription factor binding sites using SITECON;
Protein back translation;
ORF finder;
Complete Smith-Waterman algorithm implementation;
Comparing genomes using dotplot view.
- UniProt: The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
- UniProtKB: The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
- Vector Data from Invitrogen: Information on Invitrogen Vector Data Files: The search will display a table containing links to Vector Maps, DNA Sequences (ASCII/Text format), Restriction analyses (ASCII/Text format) and multiple cloning site detail sheets (pdf).
- VectorDB: VectorDB contains annotations and sequence information for many vectors commonly used in molecular biology. Information for more than 2600 vectors is available with search facilities. Vectors which are also in GenBank have direct links to that database via NCBI's Entrez browser.
- WebCutter: Webcutter is an online tool for restriction mapping of nucleotide sequences.
- WhETS: A tool to obtain the best estimate of a wheat transcript sequence by combining Triticeae ESTs/mRNAs with rice genes.