INDRA DB¶
The INDRA (Integrated Network and Dynamical Reasoning Assembler) Database is a framework for creating, maintaining, and accessing a database of content, readings, and statements. This implementation is currently designed to work primarily with Amazon Web Services RDS running Postgres 9+. Used as a backend to INDRA, the INDRA Database provides a systematic way of scaling the knowledge acquired from other databases, reading, and manual input, and puts that knowledge at your fingertips through a direct Python client and a REST API.
Knowledge sources¶
The INDRA Database currently integrates and distills knowledge from several different sources, including both biology-focused natural language processing systems and pre-existing databases.
Daily Readers¶
We have read all available content, and every day we run the following readers:
we read all new content with the following readers:
we read a limited subset of new content with the following readers:
on the latest content drawn from:
PubMed - ~19 million abstracts and ~29 million titles
PubMed Central - ~2.7 million fulltext
Elsevier - ~0.7 million fulltext (requires special access)
Other Readers¶
We also include more or less static content extracted from the following readers:
Other Databases¶
We include the information from these pre-existing databases:
These databases are retrieved primarily using the tools in indra.sources. The statements extracted from all of these sources are stored and updated in the database.
Knowledge Assembly¶
The INDRA Database uses the powerful internal assembly tools available in INDRA, re-implemented for large-scale incremental assembly. The resulting corpus of cleaned and de-duplicated statements, each with fully maintained provenance, is the primary product of the database.
For more details on the internal assembly process of INDRA, see the INDRA documentation.
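The idea of incremental de-duplication with preserved provenance can be illustrated with a minimal, standard-library-only sketch. It is not the INDRA or INDRA DB implementation: the `RawStatement` fields, the `content_key` hash, the `assemble` function, and the reader and document identifiers are all hypothetical names chosen for illustration. The sketch assumes that two statements count as duplicates when their type, subject, and object match.

```python
import hashlib
from typing import NamedTuple


class RawStatement(NamedTuple):
    """One extraction plus where it came from (hypothetical model)."""
    stmt_type: str   # kind of mechanism, e.g. "Phosphorylation"
    subject: str     # acting entity
    obj: str         # affected entity
    source: str      # reader or database that produced the statement
    content_id: str  # identifier of the content it was extracted from


def content_key(raw):
    """Hash only the meaning of a statement, ignoring its provenance."""
    basis = "|".join((raw.stmt_type, raw.subject, raw.obj))
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()


def assemble(raw_statements, corpus=None):
    """Merge raw statements into an existing corpus, keyed by content hash.

    Passing a previously built corpus makes the merge incremental:
    new batches only add entries or extend evidence lists.
    """
    corpus = {} if corpus is None else corpus
    for raw in raw_statements:
        entry = corpus.setdefault(
            content_key(raw),
            {"statement": (raw.stmt_type, raw.subject, raw.obj), "evidence": []},
        )
        provenance = (raw.source, raw.content_id)
        if provenance not in entry["evidence"]:
            entry["evidence"].append(provenance)
    return corpus


# First batch: the same statement found by two (made-up) readers.
corpus = assemble([
    RawStatement("Phosphorylation", "MAP2K1", "MAPK1", "reader_a", "doc-1"),
    RawStatement("Phosphorylation", "MAP2K1", "MAPK1", "reader_b", "doc-2"),
])

# A later batch is folded into the existing corpus instead of rebuilding it.
corpus = assemble([
    RawStatement("Phosphorylation", "MAP2K1", "MAPK1", "reader_a", "doc-3"),
    RawStatement("Activation", "MAPK1", "ELK1", "reader_a", "doc-3"),
], corpus)

for entry in corpus.values():
    print(entry["statement"], "supported by", len(entry["evidence"]), "evidence items")
```

Keying on a hash of the statement's content rather than on its source is what lets each unique statement accumulate evidence from many readers and documents while keeping every piece of provenance attached.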
Access¶
The content in the database can be accessed directly by those who created it using the indra_db.client submodule. This repo also implements a REST API, which can be used by those without direct access to the database. For access to our REST API, please contact the authors.
The INDRA Database works only with Python 3.6+, though some parts are still compatible with 3.5.
First, install INDRA, then simply clone this repo, and make sure that it is visible in your PYTHONPATH.
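One quick way to confirm that the setup worked is to check whether Python can locate both packages on its import path. The sketch below uses only the standard library; the helper name `is_importable` is made up for this example, and the top-level package names `indra` and `indra_db` are assumed from the module names used elsewhere on this page.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised in rare cases, e.g. a module that is loaded but has no spec.
        return False


for name in ("indra", "indra_db"):
    status = "found" if is_importable(name) else "NOT found (check PYTHONPATH)"
    print(f"{name}: {status}")
```

If `indra_db` is reported as not found, the directory containing the cloned repo is most likely missing from PYTHONPATH.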
The development of INDRA DB is funded under the DARPA Communicating with Computers program (ARO grant W911NF-15-1-0544).
Further INDRA Database documentation¶
- License and funding
- INDRA Database modules
- The Client
- Pipeline Management CLI
- Pipeline CLI Implementations
- Database Integrated Reading Tools
- Database Integrated Preassembly Tools
- Database Schemas
- Utilities
  - Database Session Constructors (indra_db.util.constructors)
  - Scripts to Get Content (indra_db.util.content_scripts)
  - Distilling Raw Statements (indra_db.util.distill_statements)
  - Script to Create a SIF Dump (indra_db.util.dump_sif)
  - General Helper Functions (indra_db.util.helpers)
  - Routines for Inserting Statements and Content (indra_db.util.insert)
- Some Miscellaneous Modules