|
QueryBuilder Help
|
|
QueryBuilder (QB)
provides one-stop shopping for information in FlyBase.
Using QB, you can
search any field in any report in FlyBase (in a QuerySegment), and
then combine the resulting hit-list with searches in other fields,
to allow combinatorial searches (combining QuerySegments using
Boolean operators).
A set of results
can be exported to QB from other searches on FlyBase, through the
'Hit list refinement' button at the top right of a hit-list, and
then modified to refine the search by adding additional query
segments.
There are three options
on the QB start page:
- Select a pre-constructed QueryTemplate
- Import a previously saved query
- Build a new query
|
Select a pre-constructed QueryTemplate
|
|
The first option on the QB start page allows one to choose a query
from a list of pre-constructed query templates. The available
templates are organized by data type/output. To see the list of
templates related to a given class of data, choose the data class of
interest from the pull down menu at the left. A list of
pre-constructed query templates will appear at the right and a data
class-specific list of “keywords” will appear at the left. The
list of templates can be further refined by selecting one or more of
the keywords. Only the templates containing the chosen keywords will
remain. To return to the complete set of templates for a given data
class, just deselect the chosen keywords.
When you
find a template that matches or is similar to your query of interest,
click on the button to the left of the template. This will bring you
to a Query Builder Page with the specified query set up and ready to
run. To modify the parameters to exactly match your own query
specifications, use the green “Edit” tabs present in each segment
of the query. Modify the search terms as desired, click “Finish
Editing”, and then select “Run Query”.
|
Importing Saved Queries
|
|
Any QuerySchema (a
collection of QuerySegments combined using Boolean operators) can be
saved for running again at a later date using the 'Save Query' option
on the results page. The QuerySchema is saved as a small text file.
|
How to Build a new Query
|
|
To query FlyBase using
QB, you must build one or more segments.
To start building a
query, click the yellow box titled 'Query is empty... Click here to
start building'.
Note that building a
segment using the Controlled Vocabulary (CV) hierarchy as your
DataSet is slightly different from building a segment with any other
data class.
|
|
Building a segment using a text-string
|
|
STEP 1: Select a data
class to search from the DataSet menu.
There are 16 options to
choose from. Choosing from any of the top 13 DataSets changes the
window display to show all the fields found in the report for that
DataSet. The first of the three remaining options is to query FlyBase
using the controlled vocabularies (CVs) we use to add structured
content to some fields. See below for information on using CVs to
search FlyBase.
STEP 2: Select a field
to search, or use "Any field" to search full records.
STEP 3: Enter a text
string to search for. The search algorithm will identify data fields
that contain the text string you have entered. You may opt for case
sensitivity if desired. Autocomplete will list the field entries
corresponding to the text you have typed.
STEP 4: Click the
"Finish editing" button.
STEP 5: (optional): To
add additional segments, click the "+" button. Additional
segments can be joined to existing segments using standard Boolean
operators.
|
|
Building a segment using a Controlled Vocabulary term
|
|
STEP 1: Select "CV
Hierarchy (GO/etc.)" from the DataSet drop-down menu.
STEP 2: Clicking this
option changes the window display to show the top-level terms from
various CVs used in FlyBase. You can either browse through the CVs
from these top-level terms or you can search directly for terms
matching what you are looking for, using the search box above the
terms. By default, your search will be performed for CV terms from
the whole subtree of the term you've chosen. If you wish to search
only for the exact CV term you have chosen, select "This CV term
only" from the drop down menu. (Hint: you'll retrieve more
results by searching the whole subtree)
STEP 3: Once you've
chosen your term, the window returns to the QB start page, now with
the first QuerySegment composed with your chosen CV term.
STEP 4: Click the "Done" button.
STEP 5: (optional): To
add additional segments, click the "+" button. Additional
segments can be joined to existing segments using standard Boolean
operators.
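The difference between searching the whole subtree and choosing "This CV term only" can be sketched as follows. This is a minimal illustration, not QB's implementation: the tiny hierarchy, the record IDs, and the function names `subtree` and `search` are all hypothetical.

```python
# Hypothetical miniature CV: each term maps to its direct children.
CHILDREN = {
    "appendage": ["leg", "wing"],
    "leg": ["prothoracic leg", "metathoracic leg"],
    "wing": [],
    "prothoracic leg": [],
    "metathoracic leg": [],
}

# Hypothetical records, each annotated with a single CV term.
ANNOTATIONS = {
    "record-1": "leg",
    "record-2": "prothoracic leg",
    "record-3": "wing",
}


def subtree(term):
    """Return the term plus every term below it in the hierarchy."""
    found = {term}
    for child in CHILDREN.get(term, []):
        found |= subtree(child)
    return found


def search(term, whole_subtree=True):
    """Return sorted record IDs annotated with the term (or its subtree)."""
    wanted = subtree(term) if whole_subtree else {term}
    return sorted(r for r, t in ANNOTATIONS.items() if t in wanted)
```

In this sketch, `search("leg")` finds the records annotated with "leg" or any term beneath it, while `search("leg", whole_subtree=False)` finds only the record annotated with "leg" itself, which is why the subtree search retrieves more results.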
|
|
Prepare, Check, and Run Query
|
|
STEP 1: Check Boolean
operators (if the query consists of more than one segment). Default
is "AND". Change to "OR" or "BUT NOT"
if desired.
STEP 2: Check that
QuerySegments are correct. Segments can be modified by clicking on
them, or deleted by clicking the "X" in the top right-hand
corner of the segment boxes.
STEP 3: Select output
options. Default is to show related genes, to provide
cross-references to other datasets, and to search D. melanogaster
data only. Change if desired.
STEP 4: Click "Run
query" button.
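One way to picture how the Boolean operators combine segment hit-lists is as set operations applied in order. The sketch below is an assumption about the behaviour, based on this page's statement that "haltere AND wing OR leg" is read as (haltere AND wing) OR (leg). The function name `combine` and the gene IDs are hypothetical.

```python
def combine(segments):
    """Fold segment hit-lists together from left to right.

    `segments` is a list whose first item is a set of hits and whose
    remaining items are (operator, set_of_hits) pairs.
    """
    result = set(segments[0])
    for operator, hits in segments[1:]:
        if operator == "AND":
            result &= hits      # keep hits found in both
        elif operator == "OR":
            result |= hits      # keep hits found in either
        elif operator == "BUT NOT":
            result -= hits      # drop hits found in the new segment
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# Hypothetical hit-lists for three segments.
haltere = {"g1", "g2"}
wing = {"g2", "g3"}
leg = {"g4"}

hits = combine([haltere, ("AND", wing), ("OR", leg)])
```

Here `hits` contains "g2" (in both haltere and wing) and "g4" (from leg), matching the (haltere AND wing) OR (leg) reading.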
|
Searching Expression Data
|
- Step 1:
- Select the "Build a new query" option.
- Step 2:
- Select the "Expression Patterns" dataset from the DataSet menu.
- Step 3:
- Build your query using CV terms in the Stage, Tissue, and Subcellular Location text fields.
- The auto-complete feature will help you choose valid CV terms to build an expression statement.
-
-
Hints and Tips:
The input fields in this form use a sophisticated auto-complete feature.
When you begin typing in (or even just click inside) a field, a list of suggested
CV terms will appear.
For the first field you fill in, all appropriate CV terms for that category are available.
Each filled search field further constrains the auto-complete function for the remaining fields.
For example, if you have entered "gastrula stage" in the Developmental Stage field,
the auto-complete function for the Body Part/Tissue search field will include the CV term "parasegment 10",
but will exclude the CV term "leg".
Likewise, if you have entered the CV term "prothoracic leg" in the Body Part/Tissue search field,
the auto-complete function for the Developmental Stage search field
will include "adult stage" but exclude "embryonic stage 4".
If you select only terms suggested by the auto-complete feature,
your expression statement query should always match some results.
Below each search field is a Qualifier field, in which you can enter a qualifier,
such as "early" for Developmental Stage, or "apical" for Subcellular Location.
Each of the qualifier search fields also has an auto-complete function,
and will only offer qualifiers that have been used in curation
with the term entered in search field above it.
Because of this hierarchical auto-completion,
it is possible to select a subset of terms that exclude all possibilities in the other field or fields.
In this situation, the auto-complete will tell you that there are 'no matching variants'.
This is especially true if you select qualifiers for one or more terms.
If you run such a query, no hits will be returned.
Also, the auto-complete cannot take into account that an expression statement
may only exist in, e.g., the "Insertions" dataset,
when you are currently searching the "Genes" dataset.
In these cases, your search will return no direct hits,
but the "Crossreferences" buttons above the results in Step 4
will indicate that there are hits in one or both of the other datasets.
To avoid running queries which produce no hits, it is highly recommended that you use terms suggested by the auto-complete feature.
Use of partial contexts and/or wildcards will still allow the auto-complete and search features to function,
but may result in over- or under-prediction (inclusion of non-relevant hits, or exclusion of relevant ones)
by the complex search/retrieval algorithm.
- Step 4:
- Click on the green 'Finish editing' button.
-
You can edit your query before running it by clicking the green 'Edit' button,
which will take you back to step 3.
You can also add new clauses to your search by clicking on the yellow plus sign button.
The logic here is similar to what is used for other QueryBuilder datasets.
-
-
Hints and Tips:
Recombinant constructs and transgene insertions can be searched
by changing the output option in Step 4 from "Genes"
to "Insertions" or "Recombinant Constructs".
- Step 5:
- Run your query. (Click on the green 'Run query' button.)
-
-
Hints and Tips:
Please note that you can view results from any of the three datasets
("Genes", "Insertions" and "Recombinant Constructs") in Step 5,
even though you have selected one dataset for your output option.
Crossreferences in other datasets are indicated above the output.
Clicking one of these links will switch your view to the results from the indicated data set.
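The way each filled field constrains the auto-complete suggestions for the remaining fields can be sketched as a filter over curated expression statements. This is an illustrative model only: the `STATEMENTS` list, the field names, and the `suggest` function are hypothetical, and the real feature also handles qualifiers and subcellular location.

```python
# Hypothetical curated expression statements: (stage, tissue) pairs.
STATEMENTS = [
    ("gastrula stage", "parasegment 10"),
    ("adult stage", "prothoracic leg"),
    ("adult stage", "leg"),
]

# Position of each field within a statement tuple.
FIELD_INDEX = {"stage": 0, "tissue": 1}


def suggest(field, **filled):
    """Suggest values for `field`, constrained by already-filled fields."""
    matches = [
        s for s in STATEMENTS
        if all(s[FIELD_INDEX[name]] == value for name, value in filled.items())
    ]
    return sorted({s[FIELD_INDEX[field]] for s in matches})
```

With nothing filled in, `suggest("tissue")` offers every tissue term. After "gastrula stage" is entered, `suggest("tissue", stage="gastrula stage")` offers "parasegment 10" but not "leg". An empty list corresponds to the 'no matching variants' message.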
|
Features
|
- Calculations
- Calculations can be incorporated into searches of fields that contain numbers.
- The options are greater than (>), less than (<), plus or minus (+/-) and range (-).
- Any value, no value
- Search for the presence or absence of information in a field, rather than a specific value.
- The options are IS NULL and IS NOT NULL (this query is case sensitive).
- Logical operators
- Combine multiple query legs with logical operators.
- The options are AND, OR, and BUT NOT.
- Phrases
- Multiple words are treated as a phrase.
- Only records that include the search words in the order you specify will be matched.
- Batch queries
- Upload a list of FlyBase IDs to search for all related records.
- Standard Batch download is also available for query results.
- Hierarchical CV queries
- Full support for GO and Anatomy/Development term relationships.
- Searches of CV fields within standard data classes (e.g., Genes) find only records that contain the individual term you specify. The GO/Anatomy CV database associates each term in these CVs with all of the terms below it in the hierarchy, allowing a single search to find records that contain a term or any child of that term.
- Field type tags
- Five field type tags help organize and identify search options.
-
- CV - Controlled Vocabulary, terms are consistent across records
- Flag - Flags records with the presence of links of a specified type (any search of a flag field will be performed as "IS NOT NULL", ignoring user-supplied context)
- Map - Genetic, cytogenetic, or genomic map data
- Symbol - Symbols are the only, or predominant, datatype
- Text - Data is free text, usage may not be consistent from record to record
- Field content dictionaries
- Preview the information in a field, or select dictionary entries to use in a search.
- The field dictionary lists up to the 100 most commonly used symbols, terms, numbers or words from the data in the selected field.
- Alternative results
- Related records in other FlyBase data classes are a click away via the green buttons.
- QB creates a set of cross-references for the records that match your search criteria. An itemized results list (of Genes records, for example) is displayed for the data class that is selected when a search is run. A series of green buttons at the top of the results page provide links to related records in other data classes (Insertions, for example). With QB you do not need to open each report and click through layers of links to find related information. This feature can also be used to find information that may be difficult to search for directly because of unfamiliar nomenclature (such as Insertion Symbols). Only References are excluded from automatic generation of alternative results (because of the large size of this dataset).
- Linkouts
- Related information from other databases is a click away via the yellow buttons.
- If the records identified by your search include links to external databases, these links are available from the yellow button or buttons in the Linkout section of the results page.
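The plus or minus (+/-) calculation listed under Calculations can be illustrated with a sequence range. The sketch below assumes that "+/- N" widens a range by N bases on each side. That reading, the accepted syntax, and the function name `parse_range` are assumptions, not a description of QB's parser.

```python
import re

# ARM:START..END with an optional "+/- PAD" suffix; commas are optional.
RANGE_PATTERN = re.compile(
    r"^\s*(\w+):([\d,]+)\.\.([\d,]+)(?:\s*\+/-\s*([\d,]+))?\s*$"
)


def to_int(text):
    """Convert a number that may contain thousands separators."""
    return int(text.replace(",", ""))


def parse_range(text):
    """Parse a sequence range and return (arm, start, end).

    If a "+/- PAD" suffix is present, the range is assumed to be
    widened by PAD on both sides.
    """
    m = RANGE_PATTERN.match(text)
    if m is None:
        raise ValueError(f"unrecognized range: {text!r}")
    arm, start, end = m.group(1), to_int(m.group(2)), to_int(m.group(3))
    pad = to_int(m.group(4)) if m.group(4) else 0
    return arm, start - pad, end + pad
```

Under these assumptions, `parse_range("3L:5,787,637..5,819,561 +/- 5000")` yields the range 3L:5782637..5824561, and the same input written without commas gives the same result.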
|
Further Information and Examples
|
- Asterisk is wild. An asterisk (*) on either end of your search string, or embedded in the middle of the string, is interpreted as "any character".
-
- Stocks|Symbol mam*
- Alleles|CV:Phenotype Class *maternal*
- Insertions|Symbol *ptc*
- Wild cards are not automatically added to QB searches. If a query is unproductive, try it again with * on one or both ends.
- Search Flag fields with * or any string of letters.
-
- Genes|Flag:InteractiveFly default
- Polypeptides|Flag:Antibody URL (DSHB Hybridoma) *
- Case-insensitive searches are standard. There are two exceptions:
-
- A case-sensitive Symbol search is available for most data classes.
- The reserved phrases IS NULL and IS NOT NULL are case sensitive.
- Multiple words are treated as a phrase.
-
- Genes|Text:Other information tissue culture cells
- Cytological location searches are redirected to the GBrowse dataset, which uses estimated sequence ranges of cytological locations.
- Join query segments with AND or OR.
- When a query has two or more segments, QB evaluates them in order, so earlier segments are grouped before later ones.
-
- haltere AND wing OR leg is interpreted as (haltere AND wing) OR (leg)
- Calculation query examples:
-
- GBrowse Data|Exact Number of exons > 2
- Polypeptides|Protein size (kD) < 50
- Annotations|Map:Sequence range 3L:5,787,637..5,819,561 +/- 5000 (commas are optional)
- Insertions|Map:Cytogenetic location 67B-D
- References record sets are created only when the References dataset is searched.
-
- References|Author Wakimoto (creates a References dataset)
- Alleles|Text: Discoverer Wakimoto (does not create a References dataset)
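The asterisk wildcard and the case-insensitive default described above can be approximated with a regular expression. This sketch assumes that a search string must match the whole field value unless asterisks are added, and that each asterisk stands for any run of characters (including none). The function name `wildcard_match` and the sample symbols are hypothetical.

```python
import re


def wildcard_match(pattern, value, case_sensitive=False):
    """Return True if `value` matches `pattern`, where * is a wildcard.

    Everything except * is matched literally, and the pattern must
    cover the whole value (no wildcards are added automatically).
    """
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    regex = ".*".join(literal_parts)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(regex, value, flags) is not None
```

For example, `wildcard_match("*ptc*", "P{ptc-GAL4}")` succeeds, whereas `wildcard_match("ptc", "P{ptc-GAL4}")` does not. This mirrors the advice to retry an unproductive query with * on one or both ends.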
|
Notes, Known Problems and Features yet to come
|
- To find out more about the controlled ontology databases:
- GO - Gene Ontology:
- http://www.geneontology.org
- To search for GO terms and their definitions, we recommend:
- http://www.ebi.ac.uk/ego
- To find out more about our Anatomy and Developmental terms, go to Termlink:
- http://www.flybase.org/cgi-bin/fbcvq.html?start
- Cross-references to stocks and images are generated, but cross-references from these data types are blocked. This is because these records may include tangentially related objects, such as the set of genes that are mutant in a multiply marked mapping stock.
- People data are not included in QB.
- All of the menus and dictionary files are produced automatically. Dictionary files remain on the server for 2 hours. If an index dictionary for a given field isn't already present on the server, it will take a bit of time to generate it.
- If you encounter any problems with QueryBuilder, or would like help with your queries, please use the contact FlyBase form to write to us.
|
|