Database Schemas

This section defines the schemas for the principal and readonly databases, as well as some useful mixin classes.

Principal Database Schema (indra_db.schemas.principal_schema)

The Principal Schema

The Principal database is the core representation of our data, the ultimate authority on what we know. It is heavily optimized for the input and maintenance of our data.

class indra_db.schemas.principal_schema.PrincipalSchema(Base)[source]

The Principal schema class organizes the table constructors.

The tables can be divided into various groups, with a clear order of creation for many of them.

Core Tables

First are the core tables representing our knowledge:

Statement Attribute Tables

Then there are the tables that represent attributes of statements. The set of tables is identical for the raw statements:

and the preassembled statements:

Curation Table

This table is where we record the curations submitted by ourselves and our users, which we use to improve our results.

Ancillary Tables

We also have several tables that we use to keep track of processing metadata, and some artifacts useful in that processing.

text_ref()[source]

Represent a piece of text, as per its identifiers.

Each piece of text may be made available in different forms through different services, most commonly as an abstract through PubMed and as full text through PubMed Central. These forms all come from the same paper, which has several different identifiers, such as PMIDs, PMCIDs, and DOIs.

We do our best to merge the different identifiers and for the most part each paper has exactly one text ref. Where that is not the case it is mostly impossible to automatically reconcile the different identifiers (this often has to do with inconsistent versioning of a paper and mixups over what is IDed).

Size: medium

Basic Columns

These are the core columns representing the different IDs we use to represent a paper.

  • id integer PRIMARY KEY: The primary key of the TextRef entry. Elsewhere this is often referred to as a “text ref ID” or “trid” for short.

  • pmid varchar(20): The identifier from pubmed.

  • pmcid varchar(20): The identifier from PubMed Central (e.g. “PMC12345”)

  • doi varchar(100): The ideally universal identifier.

  • pii varchar(250): The identifier used by Springer.

  • url varchar UNIQUE: For sources found exclusively online (e.g. wikipedia) use their URL.

  • manuscript_id varchar(100) UNIQUE: The ID assigned to PMC author manuscripts.

Metadata Columns

In addition we also track some basic metadata about the entry and updates to the data in the table.

  • create_date timestamp without time zone: The date the record was added.

  • last_updated timestamp without time zone: The most recent time the record was edited.

  • pub_year integer: The year the article was published, based on the first report we find (in order of PubMed, PMC, then PMC Manuscripts).

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • pmid-doi: UNIQUE(pmid, doi)

  • pmid-pmcid: UNIQUE(pmid, pmcid)

  • pmcid-doi: UNIQUE(pmcid, doi)
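As a minimal sketch of how these pairwise constraints reject duplicates, the example below recreates a cut-down text_ref table in an in-memory SQLite database. SQLite is only a stand-in for Postgres here, the table omits most columns, and the try_insert helper and all inserted values are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE text_ref (
        id INTEGER PRIMARY KEY,
        pmid VARCHAR(20),
        pmcid VARCHAR(20),
        doi VARCHAR(100),
        UNIQUE (pmid, doi),
        UNIQUE (pmid, pmcid),
        UNIQUE (pmcid, doi)
    )
    """
)


def try_insert(pmid, pmcid, doi):
    """Insert a row; return True on success, False on a uniqueness conflict."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO text_ref (pmid, pmcid, doi) VALUES (?, ?, ?)",
                (pmid, pmcid, doi),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# All identifiers below are made up for illustration.
first = try_insert("12345", "PMC67890", "10.1000/example")
# Same pmid and doi as the first row, so UNIQUE(pmid, doi) is violated.
duplicate = try_insert("12345", None, "10.1000/example")
# A row that shares no identifier pair with the first row is accepted.
different = try_insert("54321", None, "10.1000/other")
```

Here the database itself, rather than application code, is what refuses the second insert.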

Lookup Columns

Some columns are hard to look up when they are in their native string format, so they are processed and broken down into integer parts, as far as possible.

  • pmid_num integer: the int-ified pmid, faster for lookup.

  • pmcid_num integer: the int-portion of the PMCID, so “PMC12345” would here be 12345.

  • pmcid_version integer: although rarely used, occasionally a PMC ID will have a version, indicated by a dot, e.g. PMC12345.3, in which case the “3” would be stored in this column.

  • doi_ns integer: The DOI system works by assigning organizations (such as a journal) namespace IDs, and that organization is then responsible for maintaining a unique ID system internally. These namespaces are always numbers, and are stored here as such.

  • doi_id varchar: The custom ID given by the publishing organization.
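The breakdown described above can be sketched roughly as follows. The function names and the exact parsing rules are assumptions made for illustration, not the library's own processing code, and the sample identifiers in the tests are made up.

```python
import re

# Assumed shape of a PMC ID: "PMC", digits, and an optional ".version".
_PMCID_PATTERN = re.compile(r"PMC([0-9]+)(?:\.([0-9]+))?")


def split_pmid(pmid):
    """Return the integer form of a PMID string, or None if it is not numeric."""
    pmid = pmid.strip()
    return int(pmid) if re.fullmatch(r"[0-9]+", pmid) else None


def split_pmcid(pmcid):
    """Return (pmcid_num, pmcid_version) for a string such as "PMC12345.3"."""
    match = _PMCID_PATTERN.fullmatch(pmcid.strip())
    if match is None:
        return None, None
    num, version = match.groups()
    return int(num), (int(version) if version is not None else None)


def split_doi(doi):
    """Return (doi_ns, doi_id) for a DOI of the form "10.<namespace>/<id>"."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10."):
        return None, None
    namespace = prefix[len("10."):]
    if not re.fullmatch(r"[0-9]+", namespace):
        return None, None
    return int(namespace), suffix
```

Returning None for anything that does not parse keeps the original string columns as the fallback for lookups.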

mesh_ref_annotations()[source]

Represent the MeSH annotations of papers provided by PubMed.

Each abstract/entry in PubMed is accompanied by human-curated MeSH IDs indicating the topics of the paper. Each paper will have many IDs in general, so a separate table is used, linked to the text_ref table by an un-constrained PMID. This makes insertion of the data easier because the custom TRIDs need not be retrieved to dump the mesh refs.

Size: large

Columns

  • id integer PRIMARY KEY: The primary database-assigned ID of the row.

  • pmid_num integer NOT NULL: The int-ified pmid that is used to link entries in this table with those in the text_ref table.

  • mesh_num integer NOT NULL: The intified MeSH ID (with the prefix removed). The is_concept column indicates whether the prefix was D (False) or C (True).

  • qual_num integer: The qualifier number that is sometimes included with the annotation (Prefix Q).

  • major_topic boolean DEFAULT false: The major topic flag indicates whether the ID describes a primary purpose of the paper.

  • is_concept boolean DEFAULT false: Indicate whether the prefix was C (true) or D (false).
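To illustrate how a prefixed MeSH ID maps onto the mesh_num, is_concept, and qual_num columns, here is a small sketch. The helper names and the sample IDs in the tests are hypothetical, and the real loading code may differ.

```python
import re


def split_mesh_id(mesh_id):
    """Return (mesh_num, is_concept) for a MeSH ID with a D or C prefix."""
    match = re.fullmatch(r"([CD])([0-9]+)", mesh_id.strip())
    if match is None:
        raise ValueError(f"Unrecognized MeSH ID: {mesh_id!r}")
    prefix, digits = match.groups()
    # is_concept is True for the C prefix and False for the D prefix.
    return int(digits), prefix == "C"


def split_qualifier(qualifier_id):
    """Return qual_num for a Q-prefixed qualifier, or None if there is none."""
    if qualifier_id is None:
        return None
    match = re.fullmatch(r"Q([0-9]+)", qualifier_id.strip())
    if match is None:
        raise ValueError(f"Unrecognized MeSH qualifier: {qualifier_id!r}")
    return int(match.group(1))
```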

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • mesh-uniqueness: UNIQUE(pmid_num, mesh_num, qual_num, is_concept)

mti_ref_annotaions_test()[source]

Represent the MeSH annotations of abstracts as inferred by MTI.

MTI is a machine learned model that attempts to predict MeSH annotations on new un-annotated abstracts after training on the existing annotations.

Size: medium

Columns

  • id integer PRIMARY KEY: The primary database-assigned ID of the row.

  • pmid_num integer NOT NULL: The int-ified pmid that is used to link entries in this table with those in the text_ref table.

  • mesh_num integer NOT NULL: The intified MeSH ID (with the prefix removed). The is_concept column indicates whether the prefix was D (False) or C (True).

  • qual_num integer: The qualifier number that is sometimes included with the annotation (Prefix Q).

  • major_topic boolean DEFAULT false: The major topic flag indicates whether the ID describes a primary purpose of the paper.

  • is_concept boolean DEFAULT false: Indicate whether the prefix was C (true) or D (false).

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • mesh-uniqueness: UNIQUE(pmid_num, mesh_num, qual_num, is_concept)

text_content()[source]

Represent the content of a text retrieved from a particular source.

For each paper as a logical entity, there are many places where you can acquire the actual article or parts of it. For example you can get an abstract from PubMed for most content, and for a minority subset you can get full text from PubMed Central, either their Open-Access corpus or their author’s Manuscripts.

Both the text itself and the metadata for the source of the text are represented in this table.

Size: large

Basic Columns

  • id integer PRIMARY KEY: The auto-generated primary key of the table. These are elsewhere called Text Content IDs, or TCIDs.

  • text_ref_id integer NOT NULL: A foreign-key constrained reference to the appropriate entry in the text_ref table.

  • source varchar(250) NOT NULL: The name of the source, e.g. “pubmed” or “pmc_oa”. The list of content names can be found in the class attributes in content managers.

  • format varchar(250) NOT NULL: The file format of the content, e.g. “XML” or “TEXT”.

  • text_type varchar(250) NOT NULL: The type of the text, e.g. “abstract” or “fulltext”.

  • preprint boolean: Indicate whether the content is from a preprint.

  • license [varchar]: Record the license that applies to the content.

  • content bytea: The raw compressed bytes of the content.

Metadata Columns

  • insert_data timestamp without time zone: The date the record was added.

  • last_updated timestamp without time zone: The most recent time the record was edited.

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • content-uniqueness: UNIQUE(text_ref_id, source, format, text_type)

reading()[source]

Represent a reading of a piece of text.

We have multiple readers and of course many thousands of pieces of text content. Each entry in this table applies to a given reader applied to a given piece of content.

As such, the primary ID is a hash constructed from the text content ID prepended with integers that are assigned to each reader-reader version pair. The function generate_reading_id implements the particular process used. The reader numbers are assigned in the readers global, and the reader version number is the index of the version listed for the given reader in the reader_versions dictionary in the same module.
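The idea of prepending reader and version numbers to a text content ID can be sketched as below. This is not the generate_reading_id implementation: the digit widths, the READERS and READER_VERSIONS values, and the function name are all assumptions chosen for illustration.

```python
# Hypothetical stand-ins for the `readers` and `reader_versions` globals.
READERS = {"REACH": 1, "SPARSER": 2}
READER_VERSIONS = {
    "REACH": ["1.0", "1.1"],
    "SPARSER": ["a", "b", "c"],
}

# Assumed digit budget: the low 10 decimal digits hold the text content ID,
# the next 2 hold the version index, and the reader number sits above those.
TCID_DIGITS = 10
VERSION_DIGITS = 2


def sketch_reading_id(reader, reader_version, tcid):
    """Combine a reader number, version index, and TCID into one integer."""
    reader_num = READERS[reader]
    # The version number is the index of the version in the reader's list.
    version_num = READER_VERSIONS[reader].index(reader_version)
    if not 0 <= tcid < 10 ** TCID_DIGITS:
        raise ValueError("tcid does not fit in the assumed digit budget")
    if version_num >= 10 ** VERSION_DIGITS:
        raise ValueError("version index does not fit in the assumed digit budget")
    prefix = reader_num * 10 ** VERSION_DIGITS + version_num
    return prefix * 10 ** TCID_DIGITS + tcid
```

Because each field occupies its own range of digits, the same content read by two readers, or by two versions of one reader, always gets distinct IDs.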

Size: very large

Basic Columns

  • id bigint PRIMARY KEY: A hash ID constructed from a reader number, reader version number, and the text content ID of the content that was read.

  • text_content_id integer NOT NULL: A foreign-key constrained reference to the appropriate entry in the text_content table.

  • batch_id integer NOT NULL: A simple random integer (not unique) that is assigned each batch of inserted readings. It is used in the moments after the insert to easily retrieve the content that was just added, potentially plus some extra.

  • reader varchar(20) NOT NULL: The name of the reader, e.g. “REACH” or “SPARSER”.

  • reader_version varchar(20) NOT NULL: The version of the reader, which may be any arbitrary string in principle. This allows each reader to define its own versioning scheme.

  • format varchar(20) NOT NULL: The file format of the reading result, e.g. “XML” or “JSON”.

  • bytes bytea: The raw compressed bytes of the reading result.

Metadata Columns

  • create_date timestamp without time zone: The date the record was added.

  • last_updated timestamp without time zone: The most recent time the record was edited.

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • reading-uniqeness: UNIQUE(text_content_id, reader, reader_version)

db_info()[source]

Represent the provenance and metadata for an external knowledge base.

INDRA DB takes content not just from our own readings but also merges that with many pre-existing knowledge bases, many of them human curated. These knowledge bases are defined and managed by classes contained in knowledgebase_manager.

No real data is contained in this table, simply records of which knowledge bases have been added or updated, and when.

Size: very small

Basic Columns

  • id integer PRIMARY KEY: A database-assigned integer unique ID for each database entry. These are elsewhere referred to as db_info_ids or dbids.

  • db_name varchar NOT NULL: A short lowercase string that is used internally to identify the knowledge base, e.g. “pc” for Pathway Commons.

  • db_full_name varchar NOT NULL: The full name of the knowledge base, neatly formatted, e.g. “Pathway Commons”.

  • source_api varchar NOT NULL: The indra source API that was used to extract Statements from the knowledge base, e.g. “biopax”.

Metadata Columns

  • create_date timestamp without time zone: The date the record was added.

  • last_updated timestamp without time zone: The most recent time the record was edited.

raw_statements()[source]

Represent Statements exactly as extracted by their source apis.

INDRA defines several source APIs for different file types from which we can extract INDRA Statements. The goal of these APIs is primarily to accurately convey the contents of the files, and minimal fixes are made at this stage (e.g. grounding is saved for preassembly).

Thus this table contains statements that are considered “messy” in two key ways:

  • they have a lot of repetition of information, and

  • they have whatever grounding the original source gave them.

However, these Statements also have the Evidence object JSON contained in their json column, and this Evidence information is NOT copied into the pa_statements table, which allows for flexible incremental updates. A “lateral join” on this table can be used to get the first N evidence associated with each PA Statement.
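One way such a lateral join might look is sketched below as a Python helper that builds the query text. It assumes PostgreSQL syntax and the raw_unique_links linking table with the columns documented on this page. The helper name, the table aliases, and the choice to order by raw statement ID are assumptions, and the query has not been run against a live database.

```python
def first_n_evidence_query(n):
    """Build a query for the first n raw statements of each PA Statement."""
    # Only a validated integer is interpolated into the query text.
    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
        raise ValueError("n must be a positive integer")
    return f"""
        SELECT pa.mk_hash, ev.id AS raw_stmt_id, ev.json AS raw_json
        FROM pa_statements AS pa
        CROSS JOIN LATERAL (
            SELECT rs.id, rs.json
            FROM raw_unique_links AS rul
            JOIN raw_statements AS rs ON rs.id = rul.raw_stmt_id
            WHERE rul.pa_stmt_mk_hash = pa.mk_hash
            ORDER BY rs.id
            LIMIT {n}
        ) AS ev
    """
```

The subquery is evaluated once per row of pa_statements, which is what lets the LIMIT apply to each PA Statement separately rather than to the result as a whole.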

Size: very large

Basic Columns

  • id integer PRIMARY KEY: A database-assigned integer unique ID for each database entry. These are elsewhere referred to as “Statement ID”s, or “sid”s.

  • uuid varchar UNIQUE NOT NULL: A UUID generated when a Statement object is first created. This can be used for tracking particular objects through the code.

  • batch_id integer NOT NULL: A simple random integer (not unique) that is assigned each batch of inserted Statements. It is used in the moments after the insert to easily retrieve the content that was just added, potentially plus some extra.

  • mk_hash bigint NOT NULL: A hash of the matches_key of a Statement. This should be unique for any statement containing the same information.

  • text_hash bigint: A hash of the evidence text, used to detect exact duplicate Statements (same information from the same exact source, right down to the text) that sometimes occur due to bugs.

  • source_hash bigint NOT NULL: A hash of the source information.

  • db_info_id integer: A foreign key into the db_info table, for those statements that come from knowledge bases.

  • reading_id bigint: A foreign key into the reading table, for those statements that come from a reading.

  • type varchar(100) NOT NULL: The type of the Statement, e.g. “Phosphorylation”.

  • indra_version varchar(100) NOT NULL: The version of INDRA that was used to generate this Statement, specifically as returned by indra.util.get_version.get_version().

  • json bytea NOT NULL: The bytes of the Statement JSON (including exactly one Evidence JSON)

Metadata Columns

  • create_date timestamp without time zone: The date the Statement was added.

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • reading_raw_statement_uniqueness: UNIQUE(mk_hash, text_hash, reading_id)

  • db_info_raw_statement_uniqueness: UNIQUE(mk_hash, source_hash, db_info_id)

raw_activity()[source]

Represent the activity of a raw statement (an ActiveForm).

raw_agents()[source]

Represent an identifier for an agent of a raw statement.

raw_mods()[source]

Represent a modification of an agent of a raw statement.

raw_muts()[source]

Represent a mutation of an agent of a raw statement.

raw_unique_links()[source]

Represent links between raw statements and preassembled statements.

Each preassembled statement is constructed from multiple raw statements, in general. This maps each pa_statement to the raw statements that were merged to form it. It is through this table that evidence can be gathered for pa_statements.

The astute reader may note that the raw_statements-to-pa_statements relationship is many-to-one, which can be represented simply using a foreign key in the “many” table, in this case raw_statements. This is not done because the pa_statement does not, in general, exist when the raw_statement is added to the database.

Constructed as it is, these links can be copied in bulk during preassembly, as opposed to having to modify as many as a million entries with a newly created foreign-key map.

Size: large

Basic Columns

  • id integer PRIMARY KEY: A database-assigned integer unique ID for each database entry.

  • raw_stmt_id integer NOT NULL REFERENCES raw_statements(id): The Raw Statement ID foreign key to the raw_statements table.

  • pa_stmt_mk_hash bigint NOT NULL REFERENCES pa_statements(mk_hash): The PA Statement matches-key hash foreign key to the pa_statements table.

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • stmt-link-uniqueness: UNIQUE(raw_stmt_id, pa_stmt_mk_hash)

pa_statements()[source]

Represent preassembled statements.

Preassembled Statements are generated from Raw Statements using INDRA’s preassembly tools. Specifically:

  • agents are grounded,

  • agent groundings are disambiguated (using adeft),

  • sites are fixed (using protmapper),

  • and finally, repeated information is consolidated, for example Phosphorylation(MEK(), ERK()) is represented only once in this corpus, with links to the many instances where that information was extracted, which are stored in the raw_statements table.

Each entry is linked back to the (in general multiple) raw statements it was derived from in the raw_unique_links table.
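The consolidation step can be pictured with a small sketch that groups raw statement IDs under a shared matches-key hash. The input pairs are made up, and the sketch assumes the hashes were computed after grounding and site fixing. It illustrates the data shape only, not INDRA's preassembly code.

```python
from collections import defaultdict

# Hypothetical (raw statement ID, mk_hash) pairs after grounding and fixes.
raw_rows = [(1, 111), (2, 222), (3, 111), (4, 111)]


def build_links(rows):
    """Group raw statement IDs under the matches-key hash they share."""
    links = defaultdict(list)
    for sid, mk_hash in rows:
        links[mk_hash].append(sid)
    return dict(links)


# One entry per preassembled statement, each listing its raw statements.
links = build_links(raw_rows)
```

Each key corresponds to one pa_statements row, and each (raw statement ID, hash) pair corresponds to one row of the linking table.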

Size: medium large

Basic Columns

  • mk_hash bigint PRIMARY KEY: a hash of the statement matches key, which is unique for the _knowledge_ of the Statement.

  • matches_key varchar NOT NULL: The matches-key that was hashed.

  • uuid varchar UNIQUE NOT NULL: A UUID generated when a Statement object is first created. This can be used for tracking particular objects through the code. The UUID is distinct from any of the raw statement UUIDs that compose this Statement.

  • type varchar(100) NOT NULL: The type of the Statement, e.g. “Phosphorylation”.

  • indra_version varchar(100) NOT NULL: The version of INDRA that was used to generate this Statement, specifically as returned by indra.util.get_version.get_version().

  • json bytea NOT NULL: The bytes of the Statement JSON (including exactly one Evidence JSON)

Metadata Columns

  • create_date timestamp without time zone: The date the Statement was added.

pa_support_links()[source]

Represent the links of support calculated during preassembly.

In INDRA, we look for cases where more specific Statements may lend support to more general Statements, and potentially vice versa, to better gauge whether an extraction is reliable.

Size: large

Basic Columns

  • id integer PRIMARY KEY: A database-assigned integer unique ID for each database entry.

  • supporting_mk_hash bigint NOT NULL REFERENCES pa_statements(mk_hash): A foreign key to the PA Statement that is giving the support (that is, the more specific Statement).

  • supported_mk_hash bigint NOT NULL REFERENCES pa_statements(mk_hash): A foreign key to the PA Statement that is given the support (that is, the more generic Statement).

Constraints

Postgres is extremely efficient at detecting conflicts, and we use this to help ensure our entries do not have any duplicates.

  • pa_support_links_link_uniqueness: UNIQUE(supporting_mk_hash, supported_mk_hash)

pa_activity()[source]

Represent the activity of a preassembled Statement.

pa_agents()[source]

Represent an identifier for an agent of a preassembled statement.

pa_mods()[source]

Represent a modification of an agent of a preassembled statement.

pa_muts()[source]

Represent a mutation of an agent of a preassembled statement.

curations()[source]

Represent the curations of our content.

At various points in our APIs and UIs it is possible to curate the content we have extracted, recording whether it is an accurate extraction from the source text, and if not the reason why.

Size: small

Basic Columns

  • id integer PRIMARY KEY: A database-assigned integer unique ID for each database entry.

  • pa_hash bigint REFERENCES pa_statements(mk_hash): A reference into the pa_statements table to the pa statement whose evidence was curated.

  • source_hash bigint: A hash that represents the source of this Statement (e.g. reader and piece of content).

  • tag varchar: A text code indicating the type of error curated. The domain of these strings is regulated in code elsewhere.

  • text varchar: A free-form text description by the curator of what they think went wrong (or right).

  • curator varchar NOT NULL: The identity of the curator. This has elsewhere been standardized to be their email.

  • auth_id varchar: [deprecated]

  • source varchar: A string indicating where this curation originated, e.g. “DB REST API” for the INDRA Database REST service.

  • ip inet: The IP address from which the curation was submitted.

  • date timestamp without time zone: The date the curation was added.

  • pa_json jsonb: the preassembled Statement JSON that was curated.

  • ev_json jsonb: the Evidence JSON that was curated (including the text).

source_file()[source]

Record the pubmed source file that was processed.

updates()[source]

Record when text ref and content updates were performed.

reading_updates()[source]

Record runs of the readers on the content we have found.

xdd_updates()[source]

Record the times we process dumps from xDD.

rejected_statements()[source]

Represent raw statements that were rejected.

discarded_statements()[source]

Record the reasons for which some statements were discarded.

preassembly_updates()[source]

Record updates of the preassembled corpus.

Readonly Database Schema (indra_db.schemas.readonly_schema)

Defines the get_schema function for the readonly database, which is used by external services to access the Statement knowledge we acquire.

class indra_db.schemas.readonly_schema.ReadonlySchema(Base)[source]

Schema for the Readonly database.

We use a readonly database to allow fast and efficient load of data, and to add a layer of separation between the processes of updating the content of the database and accessing the content of the database. However, it is not practical to have the views created through sqlalchemy: instead they are generated and updated manually (or by other non-sqlalchemy scripts).

Before building these tables, the belief table must already have been loaded into the readonly database.

The following views must be built in this specific order (temp):

Note that the order of views below is determined not by the above order but by constraints imposed by use-case.

Meta Tables

Any table that has “meta” in the name is intended as a primary lookup table. This means it will have both the data indicated in the name of the table, such as (agent) “text”, (agent) “name”, or “source”, and a collection of columns with metadata essential for sorting and grouping of hashes:

  • Sorting:

    • belief

    • ev_count

    • agent_count

  • Grouping:

    • type_num

    • activity

    • is_active
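As a rough illustration of why these columns travel with every meta table, the sketch below sorts and groups a few rows entirely from the metadata, without touching any other table. The rows and both helper functions are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, carrying only a few of the shared metadata columns.
rows = [
    {"mk_hash": 1, "type_num": 5, "belief": 0.9, "ev_count": 12},
    {"mk_hash": 2, "type_num": 5, "belief": 0.6, "ev_count": 40},
    {"mk_hash": 3, "type_num": 7, "belief": 0.8, "ev_count": 3},
]


def top_hashes(rows, sort_key, limit):
    """Return the hashes of the top rows by a sorting column, best first."""
    ranked = sorted(rows, key=itemgetter(sort_key), reverse=True)
    return [row["mk_hash"] for row in ranked[:limit]]


def hashes_by_type(rows):
    """Group hashes by the type_num grouping column."""
    ordered = sorted(rows, key=itemgetter("type_num"))
    return {
        type_num: [row["mk_hash"] for row in group]
        for type_num, group in groupby(ordered, key=itemgetter("type_num"))
    }
```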

Temporary Tables

There are some intermediate results that it is worthwhile to calculate and store for future table construction. Sometimes these were once permanent tables but are no longer used for their own sake, and it was simpler to delete them after their derivatives were completed. In other cases the temporary tables are more principled: created because many future tables draw on them and using a “with” clause for each one would be impractical.

Whatever the reason, deleting the temporary tables greatly reduces the size of the readonly database. Such tables are marked with “(temp)” at the beginning of their doc string.

belief()[source]

The belief of preassembled statements, keyed by hash.

Columns

  • mk_hash bigint

  • belief real

Indices

  • mk_hash

evidence_counts()[source]

The evidence counts of pa statements, keyed by hash.

Columns

  • mk_hash bigint

  • ev_count integer

Indices

  • mk_hash

The source metadata for readings, keyed by reading ID.

Columns

  • trid integer

  • pmid varchar(20)

  • pmid_num integer

  • pmcid varchar(20)

  • pmcid_num integer

  • pmcid_version integer

  • doi varchar(100)

  • doi_ns integer

  • doi_id varchar

  • pii varchar(250)

  • url varchar(250)

  • manuscript_id varchar(100)

  • tcid integer

  • source varchar(250)

  • rid integer

  • reader varchar(20)

Indices

  • rid

  • pmid

  • pmid_num

  • pmcid

  • pmcid_num

  • doi

  • doi_ns

  • doi_id

  • manuscript_id

  • tcid

  • trid

Join of PA JSONs and Raw JSONs for faster lookup.

Columns

  • id integer

  • raw_json bytea

  • reading_id bigint

  • db_info_id integer

  • mk_hash bigint

  • pa_json bytea

  • type_num smallint

  • src varchar

Indices

  • mk_hash

  • reading_id

  • db_info_id

  • src

pa_agent_counts()[source]

The number of agents for each Statement, keyed by hash.

Columns

  • mk_hash bigint

  • agent_count integer

Indices

  • mk_hash

raw_stmt_src()[source]

The source (e.g. reach, pc) of each raw statement, keyed by SID.

Columns

  • sid integer

  • src varchar

Indices

  • sid

  • src

pa_stmt_src()[source]

(temp) The number of evidence from each source for a PA Statement.

This table is constructed by forming a column for every source short name present in the raw_stmt_src.
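The pivot described here can be sketched in plain Python as follows. The input pairs and the function name are hypothetical, and filling missing sources with 0 is an assumption, since the real table may store them differently.

```python
from collections import Counter, defaultdict

# Hypothetical (mk_hash, src) pairs, one per piece of evidence.
pairs = [(111, "reach"), (111, "reach"), (111, "pc"), (222, "sparser")]


def pivot_source_counts(pairs):
    """Return one row per hash with one count column per source name."""
    sources = sorted({src for _, src in pairs})
    counts = defaultdict(Counter)
    for mk_hash, src in pairs:
        counts[mk_hash][src] += 1
    # A Counter returns 0 for sources a hash never appeared with.
    return [
        {"mk_hash": mk_hash, **{src: counts[mk_hash][src] for src in sources}}
        for mk_hash in sorted(counts)
    ]


pivoted = pivot_source_counts(pairs)
```

The set of columns depends on which sources appear in the input, which is why the column list for this table cannot be fixed in advance.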

Columns

  • mk_hash bigint

  • …one column for each source… integer

Indices

  • mk_hash

(temp) A quick-lookup from mk_hash to basic text ref data.

Columns

  • mk_hash bigint

  • trid integer

  • pmid_num varchar

  • pmcid_num varchar

  • source varchar

  • reader varchar

Indices

  • mk_hash

  • trid

  • pmid_num

mesh_terms()[source]

(temp) All mesh annotations with D prefix, keyed by PMID int.

Columns

  • mesh_num integer

  • pmid_num integer

Indices

  • pmid_num

mesh_concepts()[source]

(temp) All mesh annotations with C prefix, keyed by PMID int.

Columns

  • mesh_num integer

  • pmid_num integer

Indices

  • pmid_num

hash_pmid_counts()[source]

(temp) The number of pmids for each PA Statement, keyed by hash.

Columns

  • mk_hash bigint

  • pmid_count integer

Indices

  • mk_hash

mesh_term_ref_counts()[source]

The D-type mesh IDs with pmid and ref counts, keyed by hash and mesh.

Columns

  • mk_hash bigint

  • mesh_num integer

  • ref_count integer

  • pmid_count integer

Indices

  • mesh_num

  • mk_hash

mesh_concept_ref_counts()[source]

The C-type mesh IDs with pmid and ref counts, keyed by hash and mesh.

Columns

  • mk_hash bigint

  • mesh_num integer

  • ref_count integer

  • pmid_count integer

Indices

  • mesh_num

  • mk_hash

raw_stmt_mesh_terms()[source]

The D-type mesh number raw statement ID mapping.

Columns

  • sid integer

  • mesh_num integer

Indices

  • sid

  • mesh_num

raw_stmt_mesh_concepts()[source]

The C-type mesh number raw statement ID mapping.

Columns

  • sid integer

  • mesh_num integer

Indices

  • sid

  • mesh_num

pa_meta()[source]

(temp) The metadata most valuable for querying PA Statements.

This table is used to generate the more scope-limited name_meta, text_meta, and other_meta. The reason is that NAME and TEXT (in particular) agent groundings are vastly overrepresented.
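The split into the three scope-limited tables amounts to a partition on db_name, sketched below. The function name and the sample rows in the usage are hypothetical, and the rows carry only a few of the real columns.

```python
def split_pa_meta(rows):
    """Partition pa_meta-like rows into name, text, and other groups."""
    name_meta, text_meta, other_meta = [], [], []
    for row in rows:
        if row["db_name"] in ("NAME", "TEXT"):
            # name_meta and text_meta are documented without a db_name
            # column, since it would hold the same value in every row.
            slim = {key: value for key, value in row.items() if key != "db_name"}
            target = name_meta if row["db_name"] == "NAME" else text_meta
            target.append(slim)
        else:
            other_meta.append(dict(row))
    return name_meta, text_meta, other_meta


# Made-up rows for illustration only.
sample_rows = [
    {"db_name": "NAME", "db_id": "MEK", "mk_hash": 111},
    {"db_name": "TEXT", "db_id": "mek1", "mk_hash": 111},
    {"db_name": "HGNC", "db_id": "1234", "mk_hash": 111},
]
names, texts, others = split_pa_meta(sample_rows)
```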

Columns

  • ag_id integer

  • ag_num integer

  • db_name varchar

  • db_id varchar

  • role_num smallint

  • type_num smallint

  • mk_hash bigint

  • ev_count integer

  • belief real

  • activity varchar

  • is_active boolean

  • agent_count integer

  • is_complex_dup boolean

Indices

  • db_name

  • mk_hash

source_meta()[source]

All the source-related metadata condensed using JSONB, keyed by hash.

Columns

  • mk_hash bigint

  • ev_count integer

  • belief real

  • num_srcs integer

  • src_json json

  • only_src varchar

  • has_rd boolean

  • has_db boolean

  • type_num smallint

  • activity varchar

  • is_active boolean

  • agent_count integer

Indices

  • mk_hash

  • only_src

  • activity

  • type_num

  • num_srcs

text_meta()[source]

The metadata most valuable for querying PA Statements by agent TEXT.

This table is generated from pa_meta because TEXT is extremely overrepresented among agent groundings. Moving these rows and the NAME rows out of the “OTHER” set narrows searches of that set considerably, and for the larger sets of NAME and TEXT it removes an index search.

Columns

  • ag_id integer

  • ag_num integer

  • db_id varchar

  • role_num smallint

  • type_num smallint

  • mk_hash bigint

  • ev_count integer

  • belief real

  • activity varchar

  • is_active boolean

  • agent_count integer

  • is_complex_dup boolean

Indices

  • mk_hash

  • db_id

  • type_num

  • activity

name_meta()[source]

The metadata most valuable for querying PA Statements by agent NAME.

This table is generated from pa_meta because NAME is overrepresented among agent groundings. Moving these rows and the TEXT rows out of the “OTHER” set narrows searches of that set considerably, and for the larger sets of NAME and TEXT it removes an index search.

Columns

  • ag_id integer

  • ag_num integer

  • db_id varchar

  • role_num smallint

  • type_num smallint

  • mk_hash bigint

  • ev_count integer

  • belief real

  • activity varchar

  • is_active boolean

  • agent_count integer

  • is_complex_dup boolean

Indices

  • mk_hash

  • db_id

  • type_num

  • activity

other_meta()[source]

The metadata most valuable for querying PA Statements.

This table is a copy of pa_meta with the rows for NAME and TEXT agent groundings removed, leaving only the rows for all other groundings.

Columns

  • ag_id integer

  • ag_num integer

  • db_name varchar

  • db_id varchar

  • role_num smallint

  • type_num smallint

  • mk_hash bigint

  • ev_count integer

  • belief real

  • activity varchar

  • is_active boolean

  • agent_count integer

  • is_complex_dup boolean

Indices

  • mk_hash

  • db_name

  • db_id

  • type_num

  • activity

mesh_term_meta()[source]

A lookup for hashes by D-type mesh IDs.

Columns

  • mk_hash bigint

  • mesh_num integer

  • tr_count integer

  • ev_count integer

  • belief real

  • type_num smallint

  • activity varchar

  • is_active boolean

  • agent_count integer

Indices

  • mk_hash

  • type_num

  • activity

mesh_concept_meta()[source]

A lookup for hashes by C-type mesh IDs.

Columns

  • mk_hash bigint

  • mesh_num integer

  • tr_count integer

  • ev_count integer

  • belief real

  • type_num smallint

  • activity varchar

  • is_active boolean

  • agent_count integer

Indices

  • mk_hash

  • type_num

  • activity

agent_interactions()[source]

Agent and type data in simple JSONs for rapid lookup, keyed by hash.

This table is used for retrieving interactions, agent pairs, and relations (any kind of return that is more generic than full Statements).

Columns

  • mk_hash bigint

  • ev_count integer

  • belief real

  • type_num smallint

  • activity varchar

  • is_active boolean

  • agent_count integer

  • agent_json jsonb

  • src_json jsonb

  • is_complex_dup boolean

Indices

  • mk_hash

  • agent_json

  • type_num

Class Mix-ins (indra_db.schemas.mixins)

This defines class mixins that are used to add general features to SQLAlchemy table objects via multiple inheritance.

exception indra_db.schemas.mixins.DbIndexError[source]

class indra_db.schemas.mixins.IndraDBTableMetaClass(*args, **kwargs)[source]

This serves as a meta class for all tables, allowing str to be useful.

In particular, this makes it so that the string gives a representation of the SQL table, including columns.
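The general technique can be sketched without SQLAlchemy: defining __str__ on a metaclass changes what str() returns for the class itself. TableMeta, the _columns attribute, and the output format below are all hypothetical and are not the behavior of IndraDBTableMetaClass.

```python
class TableMeta(type):
    """Hypothetical metaclass that gives table classes a readable str()."""

    def __str__(cls):
        # Describe the class by its table name and declared columns.
        columns = getattr(cls, "_columns", {})
        body = ", ".join(
            f"{name} {sql_type}" for name, sql_type in columns.items()
        )
        return f"{cls.__tablename__}({body})"


class TextRef(metaclass=TableMeta):
    """Toy table class; a real one would declare SQLAlchemy columns."""

    __tablename__ = "text_ref"
    _columns = {"id": "integer", "pmid": "varchar(20)"}


description = str(TextRef)
```

The method has to live on the metaclass because str() applied to a class looks up __str__ on the type of that class, not on the class itself.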

class indra_db.schemas.mixins.IndraDBRefTable[source]

Define an API and methods for a table of text references.

classmethod pmid_in(pmid_list, filter_ids=False)[source]

Get sqlalchemy clauses for entries IN a list of pmids.

classmethod pmid_notin(pmid_list, filter_ids=False)[source]

Get sqlalchemy clauses for entries NOT IN a list of pmids.

classmethod pmcid_in(pmcid_list, filter_ids=False)[source]

Get the sqlalchemy clauses for entries IN a list of pmcids.

classmethod pmcid_notin(pmcid_list, filter_ids=False)[source]

Get the sqlalchemy clause for entries NOT IN a list of pmcids.

classmethod doi_in(doi_list, filter_ids=False)[source]

Get clause for looking up entities IN a list of dois.

classmethod doi_notin(doi_list, filter_ids=False)[source]

Get clause for looking up entities NOT IN a list of dois.

classmethod has_ref(id_type, id_list, filter_ids=False)[source]

Get clause for entries IN the given ID list.

classmethod not_has_ref(id_type, id_list, filter_ids=False)[source]

Get clause for entries NOT IN the given ID list.

get_ref_dict()[source]

Return the refs as a dictionary keyed by type.

class indra_db.schemas.mixins.Schema(Base)[source]

General class for schemas

Indexes (indra_db.schemas.indexes)

This defines the classes needed to create and maintain indices in the database. The other part of this infrastructure is included in the IndraDBTable class mixin definition.