OCA allows the user to rapidly search through the contents of the entire PDB Archive for entries obeying certain constraints. A full text search can be made for any string appearing in the text of a PDB entry, excluding the coordinate records and PDB record names. Many specific records can be searched for regular expressions or numerical limits. OCA gives you the option of saving object sets resulting from queries. This saved set can be used as a starting point for further database operations or as a reference for your work. Every saved set includes the date of the search and the query from which it was generated.
The various fields shown on the OCA Advanced Search screen will be ANDed together before searching. All fields are insensitive to case.
It is a good idea to use the "Clear form" button before initiating any new searches. Once the constraints are set, click on the "Search" button to begin.
The full text searches used by OCA are based on the Glimpse indexing and query system. Once the search has been concluded, you may choose to view an entry using Rasmol, MAGE, or a VRML browser. Links to other resources such as SCOP and Entrez are presented when they exist. Any or all of the files may be displayed and downloaded with just the header information or complete with the coordinates, in PDB or mmCIF format.
Simply enter your search string into the desired fields and click on the Search button. Hits found within the released entries are listed first, then any hits from the pending entries. If there are no hits, the browser may provide suggestions of similar words from which you can make a choice, or you can enter another word for a subsequent search.
Following a search, OCA presents you with the field searched, the number of hits, your query and our suggestions, along with an output message and the chance to choose a suggested term and begin a new search. You may choose to refine your search if it results in more than 100 hits. You may download the list of returned ID codes for further reference, or select one ID code and retrieve it.
Retrieval presents you with the Atlas page. From this you may display and download the header information of the PDB file or the complete coordinate entry, in PDB or mmCIF format. If the entry has an associated structure factor or NMR restraints file, a PDB-generated biomol file, or a MacroMolecule(s) file created by EBI, you may view the data from here.
You may choose to view a structure using Rasmol, MAGE, or a VRML browser installed on your computer.
Links to other Web resources such as SCOP, NDB, Entrez, PDBREPORT, MMDB, ESTHER, PDBSUM, and DALI are presented when they exist, and these links are updated nightly.
Finally, you may enter another PDB ID code or return to the OCA main page from the bottom of the Atlas page.
The search fields of OCA are:
PDB ID code       Four-character accession code
Keyword           Molecule name, class or family, or related term
Author            Family name of depositor or author of associated publication
Text query        Any word in the complete PDB text
FASTA Search      FASTA search of the sequence
Experiment        Method of structure determination
Resolution        A unique value or range of values, in Angstroms
Space group       Both extended and standard Hermann-Mauguin symbols
Organism          Trivial name, systematic name or expression system
Date (lower)      Date entry was released or updated
Date (upper)      Date entry was released or updated
Associated group  Prosthetic group, metal ion, ligand or substrate, or its three-letter PDB abbreviation
Boolean Searches and Wild Cards
This version of OCA includes NOT, OR and 'Wild-cards' search.
The symbol '*' is used to denote a sequence of any number (including 0) of arbitrary characters. Just add a star '*' at the beginning or end of a word (or both) to 'extend' the search.
For example, enter "*ussman" in Author, "*tox*" in Keyword (to retrieve entries with keywords like neurotoxic and toxin), or "phos*" in Assoc. group.

Examples for NOT search are:
Author: not sussman
Keyword: -antifreeze
Organism: -snake
Assoc. group: -hem
Text query: milk not sugar

NOTE:
1. You may use the word 'not' (case is unimportant) or the minus (-) sign attached to the word.
2. sussman NOT (harel Silman) will expand to sussman -harel -silman

Examples for OR search are:
Keyword: *ferr* or *hemo*
Author: silman or harel
Text: zinc AND (torpedo OR snake)
Organism: snake or torpedo
Assoc. group: *dibromo* or atp
Space group: p31 or p 2
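The wild-card rule described above ('*' stands for any run of characters, including none, and case is ignored) can be pictured with a short Python sketch. This is only an illustration of the matching rule, not OCA's own code; the helper names wildcard_to_regex and matches are invented for this example.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a term containing '*' wild cards into a case-insensitive regex."""
    # Escape each literal piece so that '*' is the only special symbol.
    pieces = [re.escape(piece) for piece in term.split("*")]
    # Each '*' becomes '.*': any number (including 0) of arbitrary characters.
    return re.compile(".*".join(pieces), re.IGNORECASE)


def matches(term: str, word: str) -> bool:
    """Return True if the whole word matches the wild-card term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


if __name__ == "__main__":
    for term, word in [("*tox*", "neurotoxic"), ("*ussman", "Sussman"), ("suss", "sussman")]:
        print(term, word, matches(term, word))
```

Without a star the whole word must match, which is why a bare fragment such as 'suss' does not match 'sussman' in this sketch.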
Searching by PDB ID Code
This is a fast and simple way of finding a particular PDB entry. A PDB ID code or accession code is an identification code consisting of four characters. The first is a digit in the range 0 - 9, the remaining three are alpha-numeric.
You may use "*" or "." in place of any character, such as '9.' or '9*' to retrieve a list of PDB ID codes starting with "9", or '1.ce' or '1*ce' to retrieve a list of ID codes starting with "1" and ending with "ce".
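As a rough illustration of the ID code format and of the pattern examples above, here is a minimal Python sketch. It assumes that both "*" and "." stand for any run of characters, which is what the examples suggest but is not confirmed here; the function names is_pdb_id and id_pattern_matches are invented for this example.

```python
import re

# Four characters: one digit (0 - 9) followed by three alphanumeric characters.
PDB_ID = re.compile(r"[0-9][0-9A-Za-z]{3}")


def is_pdb_id(code: str) -> bool:
    """Return True if the string has the shape of a PDB ID code."""
    return PDB_ID.fullmatch(code) is not None


def id_pattern_matches(pattern: str, code: str) -> bool:
    """Match an ID code against a pattern using '*' or '.' as wild cards.

    Assumption: both symbols stand for any run of characters.
    """
    if not is_pdb_id(code):
        return False
    parts = re.split(r"[*.]", pattern)
    regex = ".*".join(re.escape(part) for part in parts)
    # Searches are insensitive to case.
    return re.fullmatch(regex, code, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(id_pattern_matches("1.ce", "1ace"))
    print(id_pattern_matches("9*", "1ace"))
```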
Keyword
This field searches only the HEADER, TITLE, KEYWDS and COMPND records, or fields, of the PDB entry. These fields contain the classification, title of the experiments, classification and related terms, and molecule names, respectively.
Searching on multiple terms in this field causes them to be 'anded' together, giving the same result as using 'and' between the terms. For instance, 'hemoglobin deoxy ferrous' and 'hemoglobin and deoxy and ferrous' have the same result.
Author
This field performs a search on the family names found in AUTHOR and JRNL AUTHOR fields. More than one name will be 'anded' for the search, and both fields are searched. The wild card '*' may be used. Stemming, or using just a portion of the word, does not work.
For instance, on December 1, 1997, 'suss' returned no hits, while 'suss*' returned 20 hits, as did 'sussman'. 'sussman and mathews' returned 0 hits, and 'sussman or mathews' returned 49 hits.
Text query
The full text search is based on the Glimpse indexing and query system.
This field searches any word in the complete PDB file, not including the coordinate section, and not including the PDB record names. See the PDB Contents Guide for a complete description of the PDB format.
To get an idea of the power and speed of the browser, enter in the Text query field one of the following examples and press the Search button.
zinc and torpedo or snake
(zinc and torpedo) or snake
In the first case, the query is interpreted as 'zinc and (torpedo or snake)'.
FASTA Search
Searches a library containing all the protein sequences in the current PDB, using the FASTA package, for protein sequences similar to yours.
A sequence in any of the following "formats" can be entered (copy and paste) in the 'FASTA search' text input area. Spaces and character case are not important. A minimum of seven residue names must be entered.
1. Three-letter code sequence:
---------------------------
ASN CYS GLN GLN TYR VAL ASP GLU GLN PHE PRO GLY PHE
SER GLY SER GLU MET TRP ASN PRO ASN ARG GLU MET SER
GLU ASP CYS LEU TYR LEU ASN ILE TRP VAL PRO SER PRO
ARG PRO LYS SER THR THR VAL MET VAL TRP ILE TYR GLY

2. One-letter code sequence:
------------------------
NCQQYVDEQFPGFSGSEMWNPNREMS
EDCLYLNIWVPSPRPKSTTVMVWIYG

CUTOFF VALUE:
The 'cutoff value' limits the number of scores and alignments shown based on the expected number of scores. A cutoff value of 2.0 (-E 2.0) will show all library sequences with scores with an expectation value <= 2.0.
For protein searches, library sequences with E() values < 0.01 for searches of a 10,000 entry protein database are almost always homologous. Frequently sequences with E()-values from 1 - 10 are related as well. Remember, however, that these E() values also reflect differences between the amino acid composition of the query sequence and that of the "average" library sequence. Thus, when searches are done with query sequences with "biased" amino-acid composition, unrelated sequences may have "significant" scores because of sequence bias.
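The effect of the cutoff value can be pictured as a simple filter over the scored hits. The Python sketch below is only an illustration: the Hit type, the apply_cutoff helper and the sample names are invented here and are not part of OCA or the FASTA package.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    name: str       # hypothetical identifier of a library sequence
    e_value: float  # expectation value E() reported for the hit


def apply_cutoff(hits: List[Hit], cutoff: float) -> List[Hit]:
    """Keep hits whose expectation value is <= cutoff, lowest E() first."""
    kept = [hit for hit in hits if hit.e_value <= cutoff]
    return sorted(kept, key=lambda hit: hit.e_value)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    sample = [Hit("seq_a", 5.0), Hit("seq_b", 0.001), Hit("seq_c", 2.0)]
    for hit in apply_cutoff(sample, 2.0):
        print(hit.name, hit.e_value)
```

With a cutoff of 2.0, a hit with E() exactly 2.0 is kept, because the comparison is "less than or equal to".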
A file containing the sequence of every chain in the PDB in FASTA format can be found at ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt
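For local work with such a file, a query typed in either of the accepted input formats can first be reduced to one-letter code and then compared with the parsed records. The Python sketch below assumes the usual FASTA layout (a '>' header line followed by one or more sequence lines). The function names and the sample records are invented for illustration, and the plain substring test is only a stand-in for the similarity scoring that the FASTA package performs.

```python
from typing import Dict, Iterable, List

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def normalize_query(text: str) -> str:
    """Return the query as an upper-case one-letter sequence."""
    tokens = text.upper().split()
    # Heuristic (an assumption): treat the input as three-letter code
    # only if every space-separated token is a known residue name.
    if tokens and all(token in THREE_TO_ONE for token in tokens):
        sequence = "".join(THREE_TO_ONE[token] for token in tokens)
    else:
        sequence = "".join(tokens)
    if len(sequence) < 7:
        raise ValueError("a minimum of seven residues must be entered")
    return sequence


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record header (without '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def find_exact(query: str, records: Dict[str, str]) -> List[str]:
    """Return headers of records whose sequence contains the query exactly."""
    return [header for header, seq in records.items() if query in seq]


if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = [">chain_one", "MKNCQQYVDEQFPG", "FSG", ">chain_two", "AAAAAAAA"]
    query = normalize_query("asn cys gln gln tyr val asp")
    print(query, find_exact(query, parse_fasta(sample)))
```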
FASTA is available from ftp://ftp.virginia.edu/pub/fasta/.
A very good FASTA manual is available at http://swarmer.stanford.edu/fastaman.html.
FASTA: Pearson, W.R. and Lipman, D.J. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444-2448(1988)
Experiment
Method of structure determination. Use the pop-up button to choose diffraction, nmr, theoretical model, or all techniques to constrain your search. The EXPDTA records of the PDB files are searched.
Resolution
You may enter a range, such as '2.17-2.20' for an inclusive range search, or a unique value, such as '3.0', in Angstroms. The REMARK 2 record is searched.
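The two accepted forms of a resolution entry can be read as a pair of inclusive bounds. A minimal Python sketch, with invented helper names (parse_resolution, in_range) and no claim about how OCA itself parses the field:

```python
from typing import Tuple


def parse_resolution(text: str) -> Tuple[float, float]:
    """Return (low, high) in Angstroms for input like '3.0' or '2.17-2.20'."""
    parts = [part.strip() for part in text.split("-")]
    if len(parts) == 1:
        # A unique value is treated as a range with equal bounds.
        low = high = float(parts[0])
    elif len(parts) == 2:
        low, high = float(parts[0]), float(parts[1])
    else:
        raise ValueError("expected a single value or 'low-high'")
    return low, high


def in_range(value: float, bounds: Tuple[float, float]) -> bool:
    """Inclusive test: both end points of the range count as hits."""
    low, high = bounds
    return low <= value <= high


if __name__ == "__main__":
    bounds = parse_resolution("2.17-2.20")
    print(bounds, in_range(2.2, bounds), in_range(2.21, bounds))
```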
Space group
The CRYST1 record is searched. Both extended and standard Hermann-Mauguin symbols are recognized. Entering either 'P 21' or 'P 1 21 1' currently returns 738 hits.
Organism
The scientific and common names and the expression systems as found on the SOURCE records in the PDB file are searched.
Date (lower), Date (upper)
These refer to the date an entry was released or updated, in DAY-MONTH-YEAR format, using either '/' or '-' as separators. The month can be entered as a 3-letter name, as in 9/Sep/1986, or as a number, as in 30-11-1990.
01-Dec-1997 is inserted as the default in the Date lower field.
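The accepted date forms (DAY-MONTH-YEAR, with '/' or '-' as separator and the month given as a 3-letter name or a number) can be read with a few lines of Python. This is a sketch under those stated rules only; the name parse_oca_date is invented for this example.

```python
from datetime import date

MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def parse_oca_date(text: str) -> date:
    """Parse DAY-MONTH-YEAR input such as '9/Sep/1986' or '30-11-1990'."""
    # Accept either separator by normalizing '/' to '-'.
    parts = text.strip().replace("/", "-").split("-")
    if len(parts) != 3:
        raise ValueError("expected DAY-MONTH-YEAR")
    day_text, month_text, year_text = (part.strip() for part in parts)
    key = month_text.lower()
    # The month may be a 3-letter name or a number.
    month = MONTHS[key] if key in MONTHS else int(key)
    return date(int(year_text), month, int(day_text))


if __name__ == "__main__":
    print(parse_oca_date("9/Sep/1986"))
    print(parse_oca_date("30-11-1990"))
```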
Associated group
You may search for a prosthetic group, metal ion, ligand or substrate, by chemical name or its three-letter PDB residue name. The PDB file HET and HETNAM records are searched.
See the PDB Het Group Dictionary for complete descriptions of the het groups currently in use.
Send your comments, suggestions, and bug reports to Jaime Prilusky at Jaime.Prilusky@weizmann.ac.il.