The Readonly Client

Here are our primary tools for retrieving Statements, in particular Pre-Assembled (PA) Statements, from the readonly database. This is some of the most heavily optimized access code in the repo, and it is the backbone of most outward-facing applications.

The readonly database, as the name suggests, is designed to accept only read requests, and it is updated via a dump only once a week. This lets users of our database query it even while we perform daily updates on the principal database, without the two workloads interfering with each other.

Construct composable queries (indra_db.client.readonly.query)

This is a sophisticated system of classes that can be used to form queries for preassembled statements from the readonly database.

class indra_db.client.readonly.query.Query(empty=False, full=False)

The core class for all queries; not functional on its own.

copy()

Get a copy of this query.

invert()

A useful way to get the inversion of a query while respecting the order of operations.

When chaining operations, ~q is evaluated after all . terms. This method lets you cleanly bypass that issue, writing:

HasReadings().invert().get_statements(ro)

rather than

(~HasReadings()).get_statements(ro)

which is harder to read.
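The precedence issue is plain Python: unary ~ binds more loosely than attribute access and calls, so ~q.get_statements() parses as ~(q.get_statements()). The sketch below shows this with a hypothetical ToyQuery class (not part of indra_db) whose invert() simply returns ~self.

```python
class ToyQuery:
    """Hypothetical stand-in for a query object (not part of indra_db)."""

    def __init__(self, name, negated=False):
        self.name = name
        self.negated = negated

    def __invert__(self):
        # ~q builds a new query with the negation flag flipped
        return ToyQuery(self.name, not self.negated)

    def invert(self):
        # Method form of ~q, which chains left to right with other calls
        return ~self

    def describe(self):
        return ("NOT " if self.negated else "") + self.name


q = ToyQuery("HasReadings")
print(q.invert().describe())  # NOT HasReadings
print((~q).describe())        # NOT HasReadings

try:
    ~q.describe()  # parsed as ~(q.describe()): ~ is applied to a str
except TypeError as err:
    print("TypeError:", err)
```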

set_print_only(print_only)

Choose to only print the SQL and not execute it.

This is very useful for debugging the SQL queries that are generated.

get_statements(ro=None, limit=None, offset=None, sort_by='ev_count', ev_limit=None, evidence_filter=None) → Optional[StatementQueryResult]

Get the statements that satisfy this query.

Parameters
  • ro (DatabaseManager) – A database manager handle that has valid Readonly tables built.

  • limit (int) – Control the maximum number of results returned. As a rule, unless you are quite sure the query will result in a small number of matches, you should limit the query.

  • offset (int) – Get results starting from the value of offset. This along with limit allows you to page through results.

  • sort_by (str) – Options are currently ‘ev_count’ or ‘belief’. Results will return in order of the given parameter.

  • ev_limit (int) – Limit the number of evidence returned for each statement.

  • evidence_filter (None or EvidenceFilter) – If None, no filtering will be applied. Otherwise, an EvidenceFilter instance must be provided.

Returns

result – An object holding the JSON result from the database, as well as the metadata for the query.

Return type

StatementQueryResult
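Because limit and offset together let you page through results, a loop like the one below can walk a large result set in fixed-size chunks. This is a minimal sketch: iter_pages and fake_fetch are hypothetical helpers, and fetch stands in for a call such as query.get_statements(ro, limit=..., offset=...). Pulling a list of results out of the returned StatementQueryResult is left out, because that depends on the result class.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch: Callable[[int, int], List], page_size: int = 100) -> Iterator[List]:
    """Yield pages from fetch(limit, offset) until a short or empty page arrives."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(page_size, offset)
        if page:
            yield page
        if len(page) < page_size:
            break  # a short page means there is nothing further to fetch
        offset += page_size


# Stand-in for the database: 250 fake results.
fake_results = list(range(250))


def fake_fetch(limit, offset):
    return fake_results[offset:offset + limit]


for page in iter_pages(fake_fetch, page_size=100):
    print(len(page))  # 100, 100, then 50
```

The loop stops on the first short (or empty) page, so a result set whose size is an exact multiple of page_size costs one extra, empty request.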

get_hashes(ro=None, limit=None, offset=None, sort_by='ev_count', with_src_counts=True) → Optional[QueryResult]

Get the hashes of statements that satisfy this query.

Parameters
  • ro (DatabaseManager) – A database manager handle that has valid Readonly tables built.

  • limit (int) – Control the maximum number of results returned. As a rule, unless you are quite sure the query will result in a small number of matches, you should limit the query.

  • offset (int) – Get results starting from the value of offset. This along with limit allows you to page through results.

  • sort_by (str) – ‘ev_count’ or ‘belief’: select the parameter by which results are sorted.

  • with_src_counts (bool) – Choose whether source counts are included with the result or not. The default is True (included), but the query may be marginally faster with source counts excluded (False).

Returns

result – An object holding the results of the query, as well as the metadata for the query definition.

Return type

QueryResult

get_interactions(ro=None, limit=None, offset=None, sort_by='ev_count') → Optional[QueryResult]

Get the simple interaction information from the Statements metadata.

Each entry in the result corresponds to a single preassembled Statement, distinguished by its hash.

Parameters
  • ro (DatabaseManager) – A database manager handle that has valid Readonly tables built.

  • limit (int) – Control the maximum number of results returned. As a rule, unless you are quite sure the query will result in a small number of matches, you should limit the query.

  • offset (int) – Get results starting from the value of offset. This along with limit allows you to page through results.

  • sort_by (str) – Options are currently ‘ev_count’ or ‘belief’. Results will return in order of the given parameter.

get_relations(ro=None, limit=None, offset=None, sort_by='ev_count', with_hashes=False) → Optional[QueryResult]

Get the agent and type information from the Statements metadata.

Each entry in the result corresponds to a relation, meaning an interaction type, and the names of the agents involved.

Parameters
  • ro (DatabaseManager) – A database manager handle that has valid Readonly tables built.

  • limit (int) – Control the maximum number of results returned. As a rule, unless you are quite sure the query will result in a small number of matches, you should limit the query.

  • offset (int) – Get results starting from the value of offset. This along with limit allows you to page through results.

  • sort_by (str) – Options are currently ‘ev_count’ or ‘belief’. Results will return in order of the given parameter.

  • with_hashes (bool) – Default is False. If True, retrieve all the hashes that fit within each relational grouping.
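To make the difference between a "relation" and an individual Statement concrete, here is a toy grouping in plain Python. The record layout (hash, type, agent names) and the return shapes are assumptions for illustration only; they are not the format of the QueryResult that get_relations returns.

```python
from collections import defaultdict

# Hypothetical, simplified records: (statement hash, statement type, agent names).
relation_records = [
    (101, "Phosphorylation", ("MEK", "ERK")),
    (102, "Phosphorylation", ("MEK", "ERK")),
    (103, "Activation", ("MEK", "ERK")),
]


def group_relations(records, with_hashes=False):
    """Group Statement records by (type, agent names), one entry per relation."""
    groups = defaultdict(list)
    for stmt_hash, stmt_type, agent_names in records:
        groups[(stmt_type, agent_names)].append(stmt_hash)
    if with_hashes:
        # Each relation keeps the hashes of every Statement that falls under it
        return dict(groups)
    return list(groups)


print(group_relations(relation_records))
print(group_relations(relation_records, with_hashes=True))
```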

get_agents(ro=None, limit=None, offset=None, sort_by='ev_count', with_hashes=False, complexes_covered=None) → Optional[QueryResult]

Get the agent pairs from the Statements metadata.

Each entry is simply a pair (or more) of Agents involved in an interaction.

Parameters
  • ro (Optional[DatabaseManager]) – A database manager handle that has valid Readonly tables built.

  • limit (Optional[int]) – Control the maximum number of results returned. As a rule, unless you are quite sure the query will result in a small number of matches, you should limit the query.

  • offset (Optional[int]) – Get results starting from the value of offset. This along with limit allows you to page through results.

  • sort_by (str) – Options are currently ‘ev_count’ or ‘belief’. Results will return in order of the given parameter.

  • with_hashes (bool) – Default is False. If True, retrieve all the hashes that fit within each agent pair grouping.

  • complexes_covered (Optional[set]) – The set of hashes for complexes that you have already seen and would like skipped.

to_json() → dict

Get the JSON representation of this query.

classmethod from_simple_json(json_dict)

Generate a proper query from a simplified JSON.

list_component_queries() → list

Get a list of the query elements included, in no particular order.

build_hash_query(ro, type_queries=None)

[Internal] Build the query for hashes.

is_inverse_of(other)

Check if a query is the exact opposite of another.

class indra_db.client.readonly.query.Intersection(query_list)

The Intersection of multiple queries.

Barring special handling, this is what results from q1 & q2.

NOTE: the inverse of an Intersection is a Union (De Morgan’s law).
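The De Morgan relationship can be checked with plain Python sets, treating each query as the set of Statement hashes it matches and ~ as the complement within a hypothetical universe of hashes. This models only the set logic, not how indra_db builds its SQL.

```python
# Hypothetical universe of Statement hashes and the hashes matched by two queries.
universe = set(range(10))
q1 = {1, 2, 3, 4}
q2 = {3, 4, 5, 6}


def invert(matches):
    """Complement within the universe, playing the role of ~q."""
    return universe - matches


# The inverse of an Intersection is the Union of the inverses...
assert invert(q1 & q2) == invert(q1) | invert(q2)
# ...and the inverse of a Union is the Intersection of the inverses.
assert invert(q1 | q2) == invert(q1) & invert(q2)
print("De Morgan's laws hold for this example")
```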

ev_filter()

Get an evidence filter composed of the “and” of sub-query filters.

is_inverse_of(other)

Check if this query is the inverse of another.

class indra_db.client.readonly.query.Union(query_list)

The union of multiple queries.

Barring special handling, this is generally the result of q1 | q2.

NOTE: the inverse of a Union is an Intersection (De Morgan’s law).

ev_filter()

Get an evidence filter composed of the “or” of sub-query filters.

is_inverse_of(other)

Check if this query is the inverse of another.

class indra_db.client.readonly.query.MergeQuery(query_list, *args, **kwargs)

This is the parent of the two merge classes: Intersection and Union.

This class of queries is extremely special, in that the “table” is actually constructed on the fly. This presents various subtle challenges. Moreover, an intersection or union is an expensive operation, so I go to great lengths to minimize its use, which makes the __init__ methods quite hefty. It is also in Intersections and Unions that full and empty states are most likely to occur, in some wonderfully subtle and hard-to-find ways.

class indra_db.client.readonly.query.HasAgent(agent_id=None, namespace='NAME', role=None, agent_num=None)

Get Statements that have a particular agent in a particular role.

NOTE: At this time, two agent queries do NOT necessarily imply that the two agents are different. E.g. `HasAgent("MEK") & HasAgent("MEK")` will get any Statements that have an agent named MEK, not Statements with two agents called MEK. This may change in the future; in the meantime, you can get around this fairly well by specifying the roles:

>>> HasAgent("MEK", role="SUBJECT") & HasAgent("MEK", role="OBJECT")

For a more complicated case, consider a query for Statements where one agent is MEK and the other has namespace FPLX. Naturally, any agent labeled MEK will also have namespace FPLX (MEK is a FamPlex identifier), and in general you will not want to constrain which role is MEK and which is the “other” agent. To accomplish this, you need to use `|`:

>>> (
...   HasAgent("MEK", role="SUBJECT")
...   & HasAgent(namespace="FPLX", role="OBJECT")
... ) | (
...   HasAgent("MEK", role="OBJECT")
...   & HasAgent(namespace="FPLX", role="SUBJECT")
... )

Parameters
  • agent_id (Optional[str]) – The ID string naming the agent, for example ‘ERK’ (FPLX or NAME) or ‘plx’ (TEXT), and so on. If None, the query must then be constrained by the namespace. (Default is None)

  • namespace (Optional[str]) – By default, this is NAME, indicating the canonical name of the agent. Other options for namespace include FPLX (FamPlex), CHEBI, CHEMBL, HGNC, UP (UniProt), TEXT (for raw text mentions), and many more. If you use the namespace AUTO, Gilda will be used to try to guess the proper namespace and agent ID. If agent_id is None, namespace must be specified and must not be NAME, TEXT, or AUTO.

  • role (Optional[str]) – Options are “SUBJECT”, “OBJECT”, or “OTHER”. (Default is None)

  • agent_num (Optional[int]) – The regularized position of the agent in the Statement’s list of agents. (Default is None)
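Why the role-free version matches too much can be seen with a toy model that treats each query as the set of hashes it matches. The records, ROLE_INDEX, and has_agent below are hypothetical simplifications (two agents per Statement, names only), not indra_db's tables or API.

```python
# Hypothetical records: Statement hash -> (subject name, object name).
agent_records = {
    1: ("MEK", "ERK"),
    2: ("MEK", "MEK"),
    3: ("RAF", "MEK"),
}
ROLE_INDEX = {"SUBJECT": 0, "OBJECT": 1}


def has_agent(name, role=None):
    """Return hashes whose agent in `role` (or in any role, if None) is `name`."""
    hits = set()
    for stmt_hash, agents in agent_records.items():
        if role is None:
            matched = name in agents
        else:
            matched = agents[ROLE_INDEX[role]] == name
        if matched:
            hits.add(stmt_hash)
    return hits


# Intersecting two identical role-free queries just gives the query back.
print(has_agent("MEK") & has_agent("MEK"))  # {1, 2, 3}
# Pinning roles requires MEK as both subject and object.
print(has_agent("MEK", "SUBJECT") & has_agent("MEK", "OBJECT"))  # {2}
```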

class indra_db.client.readonly.query.FromMeshIds(mesh_ids: list)

Find Statements whose text sources were given one of a list of MeSH IDs.

This object can be constructed from a list of mixed “D” and “C” type MeSH IDs, but for querying purposes those IDs are separated by type into two separate query objects, and a Union of the two is returned.

Parameters

mesh_ids (list) – A list of canonical MeSH IDs, each of the “C” or “D” variety, e.g. “D000135”.

mesh_ids

The immutable tuple of MeSH IDs, in their original string form.

Type

tuple

_mesh_type

“C” or “D” indicating which types of IDs are held in this object.

Type

str

_mesh_nums

The mesh IDs converted to integers, stripped of their prefix.

Type

list[int]
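The split by ID type can be sketched as follows. split_mesh_ids is a hypothetical helper, not indra_db's implementation; it only mirrors the documented idea that each group holds one prefix (like _mesh_type) and integer IDs with the prefix stripped (like _mesh_nums).

```python
from collections import defaultdict


def split_mesh_ids(mesh_ids):
    """Group MeSH IDs by prefix and strip it, e.g. 'D000135' -> 135 under 'D'."""
    groups = defaultdict(list)
    for mesh_id in mesh_ids:
        prefix, num = mesh_id[:1], mesh_id[1:]
        if prefix not in ("C", "D") or not num.isdigit():
            raise ValueError(f"Unsupported MeSH ID: {mesh_id!r}")
        groups[prefix].append(int(num))  # int() drops the leading zeros
    return dict(groups)


print(split_mesh_ids(["D000135", "C000657245", "D005796"]))
# {'D': [135, 5796], 'C': [657245]}
```

Each resulting group would then back one query object, and the two would be combined with a Union.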

ev_filter()

Get an evidence filter to enforce MeSH constraints at the evidence level.

class indra_db.client.readonly.query.HasHash(stmt_hashes)

Find Statements from a list of hashes.

Parameters

stmt_hashes (list or set or tuple) – A collection of integers, where each integer is a shallow matches key hash of a Statement (frequently simply called “mk_hash” or “hash”)

class indra_db.client.readonly.query.HasSources(sources)

Find Statements that include a set of sources.

For example, find Statements that have support from both medscan and reach.

Parameters

sources (list or set or tuple) – A collection of strings, each string the canonical name for a source. The result will include statements that have evidence from ALL sources that you include.

class indra_db.client.readonly.query.HasOnlySource(only_source)

Find Statements that come exclusively from a particular source.

For example, find statements that come only from sparser.

Parameters

only_source (str) – The only source that spawned the statement, e.g. signor, or reach.

class indra_db.client.readonly.query.HasReadings

Find Statements that have readings.

class indra_db.client.readonly.query.HasDatabases

Find Statements that have databases.

class indra_db.client.readonly.query.SourceQuery(empty=False, full=False)

The core of all queries that use SourceMeta.

class indra_db.client.readonly.query.SourceIntersection(source_queries)

A special type of intersection between children of SourceQuery.

All SourceQuery queries use the same table, so when doing an intersection it doesn’t make sense to perform an actual intersection operation; instead, the filters of each query are simply applied together to build a normal multi-conditioned query.
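The difference can be sketched as two ways of rendering SQL for the same conditions. The table name source_meta and the condition strings are hypothetical placeholders, and the output is only illustrative; indra_db's actual query construction is not shown here.

```python
def as_intersect(table, conditions):
    """One sub-select per query, combined with a set operation (the costly route)."""
    return " INTERSECT ".join(
        f"SELECT mk_hash FROM {table} WHERE {cond}" for cond in conditions
    )


def as_single_select(table, conditions):
    """All conditions applied to the one shared table in a single query."""
    where = " AND ".join(f"({cond})" for cond in conditions)
    return f"SELECT mk_hash FROM {table} WHERE {where}"


# Hypothetical table and column names, for illustration only.
conds = ["reach > 0", "sparser > 0"]
print(as_intersect("source_meta", conds))
print(as_single_select("source_meta", conds))
```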

is_inverse_of(other)

Check if this query is the inverse of another.

class indra_db.client.readonly.query.HasType(stmt_types, include_subclasses=False)

Find Statements that are one of a collection of types.

For example, you can find Statements that are Phosphorylations or Activations, or you could find all subclasses of RegulateActivity.

NOTE: when used in an Intersection with other queries, type is handled specially, with each sub query having a type constraint added to it.

Parameters
  • stmt_types (set or list or tuple) – A collection of strings, where each string is the class name of a type of Statement. Spelling and capitalization must match exactly.

  • include_subclasses (bool) – (optional) Default is False. If True, each Statement type given in the list will be expanded to include all of its subclasses.

item_type

alias of str
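The include_subclasses expansion can be sketched with Python's own class introspection. The class hierarchy below is a toy loosely modelled on INDRA Statement types and is defined locally (nothing is imported from INDRA); expand_types is a hypothetical helper, not the HasType implementation.

```python
# Toy class hierarchy, loosely modelled on INDRA Statement types.
class Statement: pass
class RegulateActivity(Statement): pass
class Activation(RegulateActivity): pass
class Inhibition(RegulateActivity): pass
class Modification(Statement): pass
class Phosphorylation(Modification): pass


def all_subclasses(cls):
    """Recursively collect every subclass of cls."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


def expand_types(type_names, include_subclasses=False):
    """Map class names to a set of names, optionally adding all subclasses."""
    registry = {c.__name__: c for c in [Statement, *all_subclasses(Statement)]}
    names = set()
    for name in type_names:
        cls = registry[name]  # KeyError on misspelled or mis-capitalized names
        names.add(cls.__name__)
        if include_subclasses:
            names |= {sub.__name__ for sub in all_subclasses(cls)}
    return names


print(sorted(expand_types(["RegulateActivity"], include_subclasses=True)))
# ['Activation', 'Inhibition', 'RegulateActivity']
```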

class indra_db.client.readonly.query.IntrusiveQuery(value_list)

This is the parent of all queries that draw on info in all meta tables.

Thus, when using these queries in an Intersection, they are applied to each sub query separately.
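In an Intersection, then, an intrusive constraint is copied into every sub-query instead of being intersected as a separate table (compare the notes on HasType, HasNumAgents, and HasNumEvidence). A minimal sketch, assuming each sub-query is represented as a list of SQL-like condition strings; push_down and that representation are hypothetical, not indra_db's.

```python
def push_down(sub_queries, constraint):
    """Return copies of the sub-queries, each with the shared constraint added."""
    return [conditions + [constraint] for conditions in sub_queries]


# Hypothetical sub-queries of an Intersection, each a list of conditions.
sub_queries = [["agent = 'MEK'"], ["source = 'reach'"]]
for conditions in push_down(sub_queries, "agent_count = 2"):
    print(" AND ".join(conditions))
```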

class indra_db.client.readonly.query.HasNumAgents(agent_nums)

Find Statements with any one of a listed number of agents.

For example, HasNumAgents([1,3,4]) will return Statements with either 1, 3, or 4 agents (the latter two mostly being complexes).

NOTE: when used in an Intersection with other queries, the agent numbers are handled specially, with each sub-query having an agent_count constraint applied to it.

Parameters

agent_nums (tuple) – A list of integers, each indicating a number of agents.

item_type

alias of int

class indra_db.client.readonly.query.HasNumEvidence(evidence_nums)

Find Statements whose evidence count is one of a given set of numbers.

For example, HasNumEvidence([2,3,4]) will return Statements that have either 2, 3, or 4 pieces of evidence.

NOTE: when used in an Intersection with other queries, the evidence count is handled specially, with each sub-query having an ev_count constraint added to it.

Parameters

evidence_nums (tuple) – A list of numbers greater than 0, each indicating a number of evidence.

item_type

alias of int

class indra_db.client.readonly.query.FromPapers(paper_list)

Find Statements that have evidence from particular papers.

Parameters

paper_list (list[(<id_type>, <paper_id>)]) – A list of tuples, where each tuple gives an id type (e.g. ‘pmid’) and an id value for a particular paper.

class indra_db.client.readonly.query.EvidenceFilter(filters=None, joiner='and')

Object for handling filtering of evidence.

We need to be able to perform logical operations between evidence filters to handle important cases:

  • HasSources(['reach']) & FromMeshIds(['D0001']): we might reasonably want to filter evidence for the second sub-query but not the first.

  • HasOnlySource('reach') & FromMeshIds(['D00001']): here we would likely want to filter the evidence for both sub-queries.

  • HasOnlySource('reach') | FromMeshIds(['D000001']): it is not clear what this means (or what purpose it serves), nor what we would do for evidence filtering when the original Statements are OR-ed together.

  • HasDatabases() & FromMeshIds(['D000001']): here you COULDN’T perform an “and” on the evidence, because the two sources are mutually exclusive (only readings are connected to MeSH annotations). However, it could make sense to do an “or” between the evidence, so that each piece of evidence is either from a database or from a MeSH-annotated document.

Both “filter all the evidence” and “filter none of the evidence” should definitely be options, although “filter all” may run into problems with the HasDatabases & FromMeshIds scenario. I think no evidence filter should be the default, and if you attempt a bogus “filter all evidence” (as in that scenario), you get an error.
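The HasDatabases & FromMeshIds case can be illustrated with a toy filter that combines per-evidence predicates with an "and" or "or" joiner. ToyEvidenceFilter, the evidence dicts, and the predicates are all hypothetical stand-ins; only the filters/joiner idea is taken from the EvidenceFilter signature above.

```python
from typing import Callable, Dict, List, Optional

Evidence = Dict[str, object]
Predicate = Callable[[Evidence], bool]


class ToyEvidenceFilter:
    """Hypothetical stand-in: combine evidence predicates with 'and' or 'or'."""

    def __init__(self, filters: Optional[List[Predicate]] = None, joiner: str = "and"):
        if joiner not in ("and", "or"):
            raise ValueError("joiner must be 'and' or 'or'")
        self.filters = list(filters or [])
        self.joiner = joiner

    def keep(self, ev: Evidence) -> bool:
        if not self.filters:
            return True  # no filters: keep every piece of evidence
        results = (f(ev) for f in self.filters)
        return all(results) if self.joiner == "and" else any(results)

    def apply(self, evidence: List[Evidence]) -> List[Evidence]:
        return [ev for ev in evidence if self.keep(ev)]


def is_signor(ev: Evidence) -> bool:
    # Stands in for "evidence comes from a database"
    return ev["source"] == "signor"


def has_mesh_d000001(ev: Evidence) -> bool:
    # Stands in for "evidence comes from a document annotated with D000001"
    return "D000001" in ev.get("mesh", [])


evidence = [
    {"source": "signor"},
    {"source": "reach", "mesh": ["D000001"]},
    {"source": "reach", "mesh": []},
]
both = ToyEvidenceFilter([is_signor, has_mesh_d000001], joiner="and")
either = ToyEvidenceFilter([is_signor, has_mesh_d000001], joiner="or")
print(len(both.apply(evidence)))    # 0: nothing is both from a database and MeSH-annotated
print(len(either.apply(evidence)))  # 2
```

The "and" version empties the evidence entirely, which is the bogus "filter all evidence" situation described above; the "or" version keeps evidence that satisfies either constraint.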

class indra_db.client.readonly.query.FromAgentJson(agent_json, stmt_type=None, hashes=None)

A very special type of query that is used for digging into results.

class indra_db.client.readonly.query.HasEvidenceBound(evidence_bounds: Iterable[Union[str, Bound]])

Find Statements that fit given evidence bounds.

A list of bounds will be combined using the logic of “or”, so [“<1”, “>3”] will return Statements whose evidence count is either less than 1 OR greater than 3.

Parameters

evidence_bounds – An iterable containing bounds for the evidence support of Statements to be returned, such as Bound(“< 10”) or simply “< 10” (the string will be parsed into a Bound object, if possible).
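The "or" combination of bounds can be sketched in plain Python. parse_bound and any_bound are hypothetical helpers, and the set of supported operators is an assumption; the real Bound class may parse strings differently.

```python
import operator
import re

# Assumed operator set; longer operators come first in the regex alternation.
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
_BOUND_RE = re.compile(r"\s*(<=|>=|==|!=|<|>)\s*(\d+)\s*")


def parse_bound(text):
    """Parse a string like '< 10' into a predicate on an evidence count."""
    m = _BOUND_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"Cannot parse bound: {text!r}")
    op, num = _OPS[m.group(1)], int(m.group(2))
    return lambda ev_count: op(ev_count, num)


def any_bound(bounds):
    """Combine several bounds with 'or' logic."""
    preds = [parse_bound(b) for b in bounds]
    return lambda ev_count: any(p(ev_count) for p in preds)


check = any_bound(["<1", ">3"])
print([n for n in range(6) if check(n)])  # [0, 4, 5]
```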