
About BioExtract Server

1.1. Introduction

The BioExtract Server was developed within the Laurence H. Baker Center for Bioinformatics and Biological Statistics at Iowa State University, in collaboration with VisualMetrics Corporation. It is a distributed database service designed to consolidate and serve curated data subsets from publicly accessible, heterogeneous biomolecular databases, and it offers a central distribution point for uniformly formatted data from these sources. Working through a Web browser, researchers can: specify data sources; select cleaning and analytic tools; flexibly query the sources with a full range of relational operators; choose download formats for the resulting extracts; save workflows; and name and keep query results persistent for reuse.

1.2. Database System

The BioExtract Server lets researchers select the data sources to be queried from a list. The sources available through the BioExtract Server are either original data sources or previously defined datasets. A previously defined dataset is a group of records resulting from a previously executed query. All functionality that can be applied to data sources (e.g., query, export, analyze) may also be applied to these datasets. Because query results can be saved as datasets, researchers may share and subsequently search persistent subsets of data. The BioExtract Server’s data sources are distributed and implemented as relational databases, proprietary fieldstream data warehouses, or data sources hosted by Web servers. Each implementation type offers researchers particular advantages, so it is important that the BioExtract Server include all of them. Some of the advantages of the fieldstream database system include:

1.3. BioExtract Server Architecture

The BioExtract Server has been implemented using a multitiered J2EE architecture. Sun’s Java 2 Enterprise Edition (J2EE) Platform provides the ability to develop, deploy, and execute applications in a distributed environment. This architecture also provides:

The tiers making up the BioExtract Server are the client tier, the middle tier and the backend database server tier. (See Figure 1.)

1.3.1. Client tier. The researcher interacts with the BioExtract Server via a Web browser. The system administrator enters data sources and researcher groups into the system. To access the system, researchers are automatically logged into the server with the “guest” id. See “Why? Benefits of becoming a registered user” to read about the advantages of creating a personal account.

1.3.2. Middle tier. The middle tier is implemented as a J2EE application deployed to an application server. It handles the communication between the backend database servers and client processes. All client requests are processed through the middle tier. The lists of available databases, database groups, data sets, researchers, researcher groups, and researcher workflows are all managed at this level.

1.3.3. Backend database server tier. Through Remote Method Invocation (RMI), the middle tier application accesses data stored in the fieldstream databases and stand-alone analytic tools. These databases and analytic tools may reside on the same machine or may be distributed across an intranet or the Internet.


Figure 1. BioExtract Server Architecture

1.4. BioExtract Server Functionality

The BioExtract Server provides the researcher with the ability to select from a list of distributed databases or data sets, query selected databases or data sets, apply cleaning and analytic tools to query results, view results, export or save results, and save researcher workflows.

1.4.1. Pick Sources. After logging onto the server, the researcher has access to multiple databases and previously saved data subsets. The list of available databases and data sets varies based on the researcher’s identity. Multiple distributed databases or data sets may be searched using a single query.
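
The source-selection behavior described above can be pictured with a minimal Python sketch. It is an illustration only and not the server's actual code: the `Source` class, the `allowed_groups` attribute, and the functions `available_sources` and `search` are hypothetical names, and the records are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A queryable source: a database or a saved data set (hypothetical model)."""
    name: str
    records: list
    allowed_groups: set = field(default_factory=set)


def available_sources(sources, researcher_groups):
    """Return the sources visible to a researcher in the given groups."""
    groups = set(researcher_groups)
    return [s for s in sources if s.allowed_groups & groups]


def search(selected, predicate):
    """Apply one query predicate to every selected source and merge the hits."""
    hits = []
    for source in selected:
        for record in source.records:
            if predicate(record):
                hits.append((source.name, record))
    return hits


# Made-up sources: two databases and one previously saved data set.
sources = [
    Source("db_a", [{"organism": "Zea mays", "length": 1200}], {"guest", "lab"}),
    Source("db_b", [{"organism": "Zea mays", "length": 300}], {"lab"}),
    Source("saved_set", [{"organism": "Oryza sativa", "length": 900}], {"guest"}),
]

# The list of sources depends on who is logged in.
guest_sources = available_sources(sources, ["guest"])

# A single query runs against every selected source.
maize = search(guest_sources, lambda r: r["organism"] == "Zea mays")
```

In this sketch each hit is tagged with the name of the source it came from, which is one simple way to keep merged results traceable to their origin.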

1.4.2. Query. Queries within the BioExtract Server take the form FIELD → OPERATOR → CONDITION. Fields represent the features, qualifiers, annotations, and other text fields within the records. The available operators depend on whether the selected field is numeric or text. For numeric fields, the operators include the relational operators. For text, the operator is always EQ (=), but the condition that can be specified supports GREP options (wildcards, missing characters, etc.). By saving the results of a query to a data set, the researcher is able to subsequently query that data set. The set of query fields available for searching is the union of the available search fields for each selected data source.
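
As a rough illustration of this query model, the Python sketch below evaluates one field/operator/condition triple against a record and computes the union of search fields. It rests on several assumptions: only EQ is named in the text, so the other operator names (NE, LT, LE, GT, GE) are invented here; Python regular expressions stand in for the GREP options; and `matches` and `searchable_fields` are hypothetical helpers.

```python
import operator
import re

# Operator names other than EQ are assumptions made for this sketch.
NUMERIC_OPERATORS = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}


def matches(record, field_name, op, condition):
    """Evaluate one FIELD/OPERATOR/CONDITION triple against a record."""
    if field_name not in record:
        return False  # a source lacking the field cannot match
    value = record[field_name]
    if isinstance(value, (int, float)):
        return NUMERIC_OPERATORS[op](value, condition)
    if op != "EQ":
        raise ValueError("text fields support only EQ")
    # Regular expressions stand in for the GREP-style condition.
    return re.search(condition, value) is not None


def searchable_fields(fields_by_source):
    """Union of the search fields offered by each selected source."""
    union = set()
    for fields in fields_by_source.values():
        union |= set(fields)
    return sorted(union)


records = [
    {"organism": "Zea mays", "length": 1200},
    {"organism": "Oryza sativa", "length": 900},
]

# Numeric field with a relational operator.
long_records = [r for r in records if matches(r, "length", "GT", 1000)]

# Text field: EQ with a pattern in which "." matches any one character.
maize = [r for r in records if matches(r, "organism", "EQ", r"^Z.a mays$")]

fields = searchable_fields({"db_a": ["organism", "length"], "db_b": ["organism", "gene"]})
```

Because the field list is a union, a query may name a field that some selected sources do not have; this sketch treats such records as non-matching, which is a design assumption rather than documented behavior.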

1.4.3. Analyze Data. The researcher is provided with a list of analytic tools that may be applied to a result set. Based on the researcher’s tool selection, the data may be run through a number of algorithms that automatically identify, correct, and annotate many of the most common problems found in sequence misalignments or putative sequence identifications. Depending on the analytic tool selected, the input to the tool may be entered directly, may be based on the current query result set, or may be the output from a previously executed tool.
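
The three input routes for a tool can be sketched in Python as follows. The two tools, `remove_duplicates` and `flag_short`, are hypothetical stand-ins invented for this example and are not tools known to be offered by the server; the sequences are made-up data.

```python
def remove_duplicates(sequences):
    """Hypothetical cleaning tool: drop repeats, keeping first occurrences."""
    return list(dict.fromkeys(sequences))


def flag_short(sequences, min_length=5):
    """Hypothetical annotation tool: pair each sequence with a 'too short' flag."""
    return [(seq, len(seq) < min_length) for seq in sequences]


query_result = ["ATGCCG", "ATG", "ATGCCG"]

cleaned = remove_duplicates(query_result)  # input: the current query result set
annotated = flag_short(cleaned)            # input: output of the previous tool
direct = flag_short(["GGCCTTAA"])          # input: entered directly
```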

1.4.4. View Results. After the researcher has executed a query, the results may be viewed in the detail screen of the BioExtract Server or by linking to the original data source.

1.4.5. Export. The results of a query may be exported locally by the researcher. Presently, the exports are ASCII with the format specified by the researcher based on a list of available formats.
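
A format-driven ASCII export might look like the Python sketch below. The text does not list the available formats, so the two shown here (FASTA-style and tab-delimited) are assumptions, as are the record keys `id` and `sequence` and the names `EXPORTERS` and `export`.

```python
def to_fasta(records):
    """Render records as FASTA-style text (assumed format)."""
    return "".join(f">{r['id']}\n{r['sequence']}\n" for r in records)


def to_tab(records):
    """Render records as tab-delimited text with a header row (assumed format)."""
    rows = "".join(f"{r['id']}\t{r['sequence']}\n" for r in records)
    return "id\tsequence\n" + rows


# The researcher picks one entry from the list of available formats.
EXPORTERS = {"fasta": to_fasta, "tab": to_tab}


def export(records, fmt):
    """Return the export text for the chosen format, checking that it is ASCII."""
    text = EXPORTERS[fmt](records)
    text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII content
    return text


result = [{"id": "seq1", "sequence": "ATGC"}]
fasta_text = export(result, "fasta")
tab_text = export(result, "tab")
```

Keeping the formats in a lookup table means a new format can be added without changing `export` itself.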

1.4.6. Workflows. As the researcher works with the BioExtract Server, “steps” are saved in the form of a workflow. Examples of steps might include executing a query, saving a result set, or running an analytic tool. The researcher has the option of saving workflows, modifying workflows, executing workflows, or executing a single step contained within a workflow.
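
One way to picture a workflow is as an ordered list of recorded steps that can be replayed in full or one at a time. The Python sketch below assumes that each step takes data in and passes data on; the `Workflow` class and its methods are hypothetical and do not describe the server's internal representation.

```python
class Workflow:
    """An ordered list of named steps (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.steps = []  # list of (label, function) pairs

    def add_step(self, label, func):
        """Record a step as the researcher works."""
        self.steps.append((label, func))

    def replace_step(self, index, label, func):
        """Modify the workflow by swapping out one step."""
        self.steps[index] = (label, func)

    def run(self, data):
        """Execute every step in order, feeding each output to the next step."""
        for _, func in self.steps:
            data = func(data)
        return data

    def run_step(self, index, data):
        """Execute a single step on its own."""
        _, func = self.steps[index]
        return func(data)


wf = Workflow("long sequences")
wf.add_step("query", lambda seqs: [s for s in seqs if len(s) > 3])
wf.add_step("analyze", lambda seqs: [s.lower() for s in seqs])

full_run = wf.run(["ATGC", "AT", "GGCCA"])
single_step = wf.run_step(1, ["TT"])
```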
