The Principal Database Client

This is the set of client tools for accessing the principal database, which stores the knowledge closest to ground truth.

Access Readings and Text Content (indra_db.client.principal.content)

This module defines a simple API that gives external users access to the content we store on the database.

indra_db.client.principal.content.get_content_by_refs(db, pmid_list=None, trid_list=None, sources=None, formats=None, content_type='abstract', unzip=True)

Return content from the database given a list of PMIDs or text ref ids.

Note that exactly one of pmid_list or trid_list must be set.

Parameters

  • db (DatabaseManager) – Reference to the DB to query.

  • pmid_list (list[str] or None) – A list of PMIDs. Default is None, in which case trid_list must be given.

  • trid_list (list[int] or None) – A list of text ref ids. Default is None, in which case pmid_list must be given.

  • sources (list[str] or None) – A list of sources to include (e.g. ‘pmc_oa’ or ‘pubmed’). Default is None, indicating that all sources will be included.

  • formats (list[str] or None) – A list of the formats to include (‘xml’, ‘text’). Default is None, indicating that all formats will be included.

  • content_type (str) – The type of content to load (‘abstract’ or ‘fulltext’). A given ref may have neither, one, or both types of content. Default: ‘abstract’

  • unzip (Optional[bool]) – If True, the compressed output is decompressed into clear text. Default: True


Returns

content_dict – A dictionary whose keys are text ref ids, with each value being the corresponding content.

Return type

dict

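The argument rules above (exactly one ID list, and content_type limited to ‘abstract’ or ‘fulltext’) can be checked before a query is sent. The sketch below is a hypothetical wrapper; `fetch_content` is not part of indra_db. It takes the real get_content_by_refs as a parameter so it can be exercised without a live database, and it assumes `db` is a DatabaseManager you already hold.

```python
from typing import Any, Callable, Dict, List, Optional


def fetch_content(
    get_content_by_refs: Callable[..., Dict[int, Any]],
    db: Any,
    pmids: Optional[List[str]] = None,
    trids: Optional[List[int]] = None,
    content_type: str = "abstract",
) -> Dict[int, Any]:
    """Hypothetical helper: validate arguments, then delegate the query.

    Enforces the documented rules before calling get_content_by_refs:
    exactly one ID list, and content_type is 'abstract' or 'fulltext'.
    """
    # Exactly one of the two ID lists must be provided.
    if (pmids is None) == (trids is None):
        raise ValueError("Pass exactly one of pmids or trids.")
    if content_type not in ("abstract", "fulltext"):
        raise ValueError("content_type must be 'abstract' or 'fulltext'.")
    # The returned dict is keyed by text ref id; unzip=True requests
    # decompressed, clear-text content.
    return get_content_by_refs(
        db,
        pmid_list=pmids,
        trid_list=trids,
        content_type=content_type,
        unzip=True,
    )
```

With indra_db installed, a call might look like `fetch_content(get_content_by_refs, db, pmids=[...])`, with get_content_by_refs imported from indra_db.client.principal.content. Failing fast on bad arguments avoids a round trip to the database.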
indra_db.client.principal.content.get_reader_output(db, ref_id, ref_type='tcid', reader=None, reader_version=None)

Return the reader output for a given piece of text content.

Parameters

  • db (DatabaseManager) – Reference to the DB to query.

  • ref_id (int or str) – The ID of the text whose reader output should be returned, interpreted according to ref_type.

  • ref_type (Optional[str]) – The type of ID given in ref_id. Options include ‘tcid’ (the database’s internal unique text content ID), ‘pmid’, ‘pmcid’, ‘doi’, ‘pii’, and ‘manuscript_id’. Default: ‘tcid’

  • reader (Optional[str]) – The name of the reader whose output is of interest.

  • reader_version (Optional[str]) – The specific version of the reader.


Returns

reading_results – A dict of reader outputs that match the query criteria, indexed first by text content id, then by reader.

Return type

dict

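Because reading_results is nested (text content id first, then reader), a small generator makes it easy to walk. This is a sketch that assumes the value stored under each reader name is that reader's output; `iter_reader_outputs` is a hypothetical name, not part of indra_db.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_reader_outputs(
    reading_results: Dict[Any, Dict[str, Any]],
    reader: Optional[str] = None,
) -> Iterator[Tuple[Any, str, Any]]:
    """Hypothetical helper: flatten {tcid: {reader: output}} into tuples.

    Yields (text content id, reader name, output). If reader is given,
    only that reader's outputs are yielded.
    """
    for tcid, by_reader in reading_results.items():
        for reader_name, output in by_reader.items():
            if reader is None or reader_name == reader:
                yield tcid, reader_name, output
```

For example, `for tcid, name, output in iter_reader_outputs(reading_results, reader='reach'): ...` visits only one reader's results across all returned text content.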
Submit and Retrieve Curations (indra_db.client.principal.curation)

On our services, users can curate the results we present, indicating whether they are correct and, if not, how they are incorrect. The API for submitting and retrieving those curations is defined here.

indra_db.client.principal.curation.get_curations(db=None, **params)

Get all curations for a certain level given certain criteria.


Return a dict of curated groundings from a given database.


Parameters

db (Optional[DatabaseManager]) – A database manager object used to access the database. If not given, the database configured as primary is used.

Returns

A dict whose keys are raw text strings and whose values are dicts of DB name space to DB ID mappings corresponding to the curated grounding.

Return type

dict

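One way to use the curated-grounding dict is to let it take precedence over groundings obtained elsewhere. The sketch below assumes you already hold that dict as `curated` and that curations should override existing groundings for the exact same raw text; `apply_curated_groundings` is a hypothetical helper, not part of indra_db.

```python
from typing import Dict

# Hypothetical alias: DB name space -> DB ID, as described above.
Grounding = Dict[str, str]


def apply_curated_groundings(
    groundings: Dict[str, Grounding],
    curated: Dict[str, Grounding],
) -> Dict[str, Grounding]:
    """Hypothetical helper: let curated groundings take precedence.

    groundings maps raw text to an existing grounding; curated is the
    raw-text-keyed dict described above. Matching uses the exact raw
    text string. Neither input dict is modified.
    """
    updated: Dict[str, Grounding] = {}
    for text, grounding in groundings.items():
        chosen = curated.get(text, grounding)
        # Copy the inner dict so edits to the result cannot leak back
        # into either input mapping.
        updated[text] = dict(chosen)
    return updated
```

Matching on the exact string is deliberate here: the dict is keyed by raw text, so no case folding or normalization is assumed.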
indra_db.client.principal.curation.submit_curation(hash_val, tag, curator, ip, text=None, ev_hash=None, source='direct_client', pa_json=None, ev_json=None, db=None)

Submit a curation for a given preassembled or raw extraction.

Parameters

  • hash_val (int) – The hash corresponding to the statement.

  • tag (str) – A very short phrase categorizing the error or type of curation.

  • curator (str) – The name or identifier for the curator.

  • ip (str) – The IP address of the user’s computer.

  • text (Optional[str]) – A brief description of the problem.

  • ev_hash (Optional[int]) – A hash of the sentence and other evidence information, elsewhere referred to as source_hash.

  • source (str) – The name of the access point through which the curation was performed. The default is ‘direct_client’, meaning this function was used directly. Any higher-level application should identify itself here.

  • pa_json (Optional[dict]) – The JSON of a preassembled or raw statement that was curated. If None, we will try to get the pa_json from the database.

  • ev_json (Optional[dict]) – The JSON of the evidence that was curated. This cannot be retrieved from the database if not given.

  • db (Optional[DatabaseManager]) – A database manager object used to access the database.

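Since higher-level applications are asked to identify themselves through `source`, and since ev_json cannot be recovered from the database later, it can help to assemble the arguments in one place. This is a minimal sketch; `build_curation_kwargs` and `app_name` are hypothetical names, and the helper only builds a dict that you would then pass to submit_curation.

```python
from typing import Any, Dict, Optional


def build_curation_kwargs(
    hash_val: int,
    tag: str,
    curator: str,
    ip: str,
    app_name: str,
    text: Optional[str] = None,
    ev_hash: Optional[int] = None,
    ev_json: Optional[dict] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for submit_curation.

    source is set to app_name, because higher-level applications should
    identify themselves rather than use the 'direct_client' default.
    """
    if not tag.strip():
        raise ValueError("tag should be a short, non-empty phrase")
    kwargs: Dict[str, Any] = {
        "hash_val": hash_val,
        "tag": tag,
        "curator": curator,
        "ip": ip,
        "source": app_name,
    }
    # Include optional fields only when given, so submit_curation's own
    # defaults apply otherwise. ev_json is worth passing when available,
    # because it cannot be retrieved from the database afterwards.
    optional = {"text": text, "ev_hash": ev_hash, "ev_json": ev_json}
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs
```

The result would be used as `submit_curation(**build_curation_kwargs(...))`. pa_json is left out on purpose, since the function will try to fetch it from the database when it is not given.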
Get Raw Statements (indra_db.client.principal.raw_statements)

Get raw (uncleaned and unmerged) Statements, selected either by agent and type or by the paper(s) they came from.

indra_db.client.principal.raw_statements.get_raw_stmt_jsons(clauses=None, db=None, max_stmts=None, offset=None)

Get raw Statement JSONs from the principal database, given arbitrary clauses.

indra_db.client.principal.raw_statements.get_raw_stmt_jsons_from_agents(agents=None, stmt_type=None, db=None, max_stmts=None, offset=None)

Get raw Statement JSONs from a list of agent refs and a Statement type.

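All three raw-statement functions accept max_stmts and offset, which suggests they can be paged. The sketch below rests on assumptions not stated above: that each call returns a sized collection of at most max_stmts statements, that offset skips that many statements, and that results come back in a stable order between calls. `iter_pages` is a hypothetical helper.

```python
from typing import Any, Callable, Iterator


def iter_pages(fetch: Callable[..., Any], page_size: int = 1000) -> Iterator[Any]:
    """Hypothetical pager over the max_stmts/offset parameters.

    Calls fetch(max_stmts=page_size, offset=...) repeatedly, yielding each
    non-empty page, and stops after the first page shorter than page_size.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(max_stmts=page_size, offset=offset)
        if page:
            yield page
        # A short (or empty) page means there is nothing further to read.
        if len(page) < page_size:
            break
        offset += page_size
```

The other arguments can be bound with functools.partial, for example `iter_pages(functools.partial(get_raw_stmt_jsons_from_agents, agents=agents, db=db))`, where `agents` stands for agent refs in whatever form the function expects.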
indra_db.client.principal.raw_statements.get_raw_stmt_jsons_from_papers(id_list, id_type='pmid', db=None, max_stmts=None, offset=None)

Get raw Statement JSONs for a given list of papers.

Parameters

  • id_list (list) – A list of ints or strs that are ids of papers of type id_type.

  • id_type (str) – The type of the ids in id_list, e.g. ‘pmid’, ‘pmcid’, or ‘trid’. Default: ‘pmid’

  • db (Optional[DatabaseManager]) – Optionally specify a database manager that connects to something other than the primary database, for example a local database instance.


Returns

result_dict – A dictionary keyed by id (of id_type) with a list of raw statement json objects as each value. Ids for which no statements are found will not be included in the dict.

Return type

dict
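Because papers with no statements are simply absent from result_dict, finding them means comparing against the requested id_list. This is a minimal sketch; `summarize_paper_results` is a hypothetical helper, and comparing keys as strings is a defensive assumption in case ints and strs end up mixed between id_list and the returned keys.

```python
from typing import Any, Dict, List, Tuple


def summarize_paper_results(
    id_list: List[Any],
    result_dict: Dict[Any, List[dict]],
) -> Tuple[Dict[str, int], List[Any]]:
    """Hypothetical helper: count statements per paper and list misses.

    Returns (counts keyed by str(id), ids from id_list with no statements).
    Keys are compared as strings, which is an assumption, not documented
    behavior.
    """
    by_key = {str(k): v for k, v in result_dict.items()}
    counts = {str(i): len(by_key[str(i)]) for i in id_list if str(i) in by_key}
    missing = [i for i in id_list if str(i) not in by_key]
    return counts, missing
```

Typical use would be `counts, missing = summarize_paper_results(pmids, get_raw_stmt_jsons_from_papers(pmids, id_type='pmid', db=db))`.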