NOTE NLP table #85

@clairblacketer

Addition of NOTE NLP table and new fields in NOTE table


Proposal

Relevant table: NOTE

NOTE table additions

| Field | Required | Type | Description |
|---|---|---|---|
| note_id | Yes | integer | A unique identifier for each note. |
| person_id | Yes | integer | A foreign key identifier to the Person about whom the Note was recorded. The demographic details of that Person are stored in the PERSON table. |
| note_date | Yes | date | The date the note was recorded. |
| note_datetime | No | datetime | The date and time the note was recorded. |
| note_type_concept_id | Yes | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the type, origin or provenance of the Note. |
| note_class_concept_id | Yes | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the HL7 LOINC Document Type Vocabulary classification of the note. |
| note_title | No | varchar(250) | The title of the Note as it appears in the source. |
| note_text | No | RDBMS dependent text | The content of the Note. |
| encoding_concept_id | Yes | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the note character encoding type. |
| language_concept_id | Yes | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the language of the note. |
| provider_id | No | integer | A foreign key to the Provider in the PROVIDER table who took the Note. |
| note_source_value | No | varchar(50) | The source value associated with the origin of the note. |
| visit_occurrence_id | No | integer | A foreign key to the Visit in the VISIT_OCCURRENCE table when the Note was taken. |

New Fields

  • note_class_concept_id: a foreign key to the CONCEPT table to describe a standardized combination of five LOINC axes (role, domain, setting, type of service, and document kind). See Section 3 for description of mapping of clinical documents to Clinical Document Ontology (CDO) and standard terminology.
  • note_title: This field represents the title of a note.
  • encoding_concept_id: a foreign key to the predefined Concept in the Standardized Vocabularies reflecting the note character encoding type. Create the concepts in the CONCEPT table for note encoding type.
  • language_concept_id: a foreign key that refers to an identifier in the CONCEPT table for the note language. Use SNOMED qualifier concepts for all major languages.

Field Changes

The note_text type depends on the RDBMS because not all engines support CLOB; in MS SQL Server, for example, it will be VARCHAR(MAX).
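As a rough sketch of how the proposed NOTE table and the RDBMS-dependent note_text type could be expressed, the snippet below builds the DDL in Python and creates the table in an in-memory SQLite database. Only the SQL Server entry in the type map comes from this proposal; the other entries, the `note_ddl` helper and the use of SQLite are assumptions for illustration, and the date/datetime type names would also need adjusting per engine.

```python
import sqlite3

# note_text column type per RDBMS. Only the SQL Server entry is stated in the
# proposal; the remaining entries are assumptions added for illustration.
NOTE_TEXT_TYPE = {
    "sqlserver": "VARCHAR(MAX)",
    "oracle": "CLOB",
    "postgresql": "TEXT",
    "sqlite": "TEXT",
}


def note_ddl(dialect: str) -> str:
    """Return a CREATE TABLE statement for NOTE (hypothetical helper).

    Only note_text varies here; other type names are kept generic.
    """
    text_type = NOTE_TEXT_TYPE[dialect]
    return f"""
    CREATE TABLE note (
        note_id               INTEGER NOT NULL PRIMARY KEY,
        person_id             INTEGER NOT NULL,
        note_date             DATE NOT NULL,
        note_datetime         DATETIME,
        note_type_concept_id  INTEGER NOT NULL,
        note_class_concept_id INTEGER NOT NULL,
        note_title            VARCHAR(250),
        note_text             {text_type},
        encoding_concept_id   INTEGER NOT NULL,
        language_concept_id   INTEGER NOT NULL,
        provider_id           INTEGER,
        note_source_value     VARCHAR(50),
        visit_occurrence_id   INTEGER
    )"""


conn = sqlite3.connect(":memory:")
conn.execute(note_ddl("sqlite"))

# List the columns SQLite actually created.
note_columns = [row[1] for row in conn.execute("PRAGMA table_info(note)")]
print(note_columns)
```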

Outstanding issues

note_id - convert to BIGINT due to the large table size.
Changing identifier fields from INT to BIGINT should be a larger group discussion and decision, as it would significantly affect all existing implementations. We should consider whether to change all the identifier fields or only a subset. CONDITION_OCCURRENCE and PROCEDURE_OCCURRENCE are expected to be even larger tables.

NOTE_NLP table

This table will encode all output of NLP on clinical notes. Each row represents a single extracted term from a note.

| Field | Required | Type | Description |
|---|---|---|---|
| note_nlp_id | Yes | Big Integer | A unique identifier for each term extracted from a note. |
| note_id | Yes | integer | A foreign key to the note in the NOTE table from which the term was extracted. |
| section_concept_id | No | integer | A foreign key to the predefined Concept in the Standardized Vocabularies representing the section of the extracted term. |
| snippet | No | varchar(250) | A small window of text surrounding the term. |
| offset | No | varchar(50) | Character offset of the extracted term in the input note. |
| lexical_variant | Yes | varchar(250) | Raw text extracted from the NLP tool. |
| note_nlp_concept_id | No | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the normalized concept for the extracted term. Domain of the term is represented as part of the Concept table. |
| note_nlp_source_concept_id | No | integer | A foreign key to a Concept that refers to the code in the source vocabulary used by the NLP system. |
| nlp_system | No | varchar(250) | Name and version of the NLP system that extracted the term. Useful for data provenance. |
| nlp_date | Yes | date | The date of the note processing. Useful for data provenance. |
| nlp_datetime | No | datetime | The date and time of the note processing. Useful for data provenance. |
| term_exists | No | varchar(1) | A summary modifier that signifies presence or absence of the term for a given patient. Useful for quick querying (see Term_exists below). |
| term_temporal | No | varchar(50) | An optional time modifier associated with the extracted term (for now, “past” or “present” only; to be standardized later). |
| term_modifiers | No | varchar(2000) | A compact description of all the modifiers of the specific term extracted by the NLP system (e.g. “son has rash” → “negated=no,subject=family,certainty=undef,conditional=false,general=false”). |
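To make the row structure concrete, here is a minimal sketch that creates NOTE_NLP in an in-memory SQLite database and stores the “son has rash” example as one row. The identifier values, the 'Y'/'N' encoding of term_exists and the choice of SQLite are assumptions; note that offset is a keyword in several SQL dialects, so it is quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('''
    CREATE TABLE note_nlp (
        note_nlp_id                INTEGER NOT NULL PRIMARY KEY,
        note_id                    INTEGER NOT NULL,
        section_concept_id         INTEGER,
        snippet                    VARCHAR(250),
        "offset"                   VARCHAR(50),
        lexical_variant            VARCHAR(250) NOT NULL,
        note_nlp_concept_id        INTEGER,
        note_nlp_source_concept_id INTEGER,
        nlp_system                 VARCHAR(250),
        nlp_date                   DATE NOT NULL,
        nlp_datetime               DATETIME,
        term_exists                VARCHAR(1),
        term_temporal              VARCHAR(50),
        term_modifiers             VARCHAR(2000)
    )''')

# Illustrative note content; here the whole note is just the snippet.
note_text = "son has rash"
lexical_variant = "rash"

conn.execute(
    '''INSERT INTO note_nlp
       (note_nlp_id, note_id, snippet, "offset", lexical_variant,
        nlp_date, term_exists, term_modifiers)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?)''',
    (
        1,                                      # made-up identifier
        1,                                      # made-up note_id
        note_text,
        str(note_text.find(lexical_variant)),   # character offset in the note
        lexical_variant,
        "2016-03-30",
        "N",                                    # subject is family, not the patient
        "negated=no,subject=family,certainty=undef,conditional=false,general=false",
    ),
)

# Quick query on the summary flag: terms affirmed for the patient.
affirmed = [row[0] for row in conn.execute(
    "SELECT lexical_variant FROM note_nlp WHERE term_exists = 'Y'")]
print(affirmed)
```

Because the only stored term refers to a family member, the query on term_exists returns no rows.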

Term_exists
Term_exists is defined as a flag that indicates if the patient actually has or had the condition. Any of the following modifiers would make Term_exists false:

  • Negation = true
  • Subject = [anything other than the patient]
  • Conditional = true
  • Rule_out = true
  • Uncertain = very low certainty or lower

A complete lack of modifiers would make Term_exists true.

Any modifiers that are present must have these values for Term_exists to be true:

  • Negation = false
  • Subject = patient
  • Conditional = false
  • Rule_out = false
  • Uncertain = true or high or moderate or even low (could argue about low)
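The rules above can be sketched as a small function. This is only an illustration under stated assumptions: the modifier key names, the use of a dict, and the certainty label "very low" as the rejection threshold are hypothetical, since the proposal does not yet standardize modifier names or values.

```python
# Certainty values that make the term not count as present. The label is an
# assumption; the proposal leaves the exact vocabulary open.
REJECTED_CERTAINTY = {"very low"}


def term_exists(modifiers: dict) -> bool:
    """Apply the Term_exists rules to a dict of modifier name -> value.

    A missing modifier never contradicts the term, so an empty dict
    (a complete lack of modifiers) yields True.
    """
    if modifiers.get("negation", False):
        return False
    if modifiers.get("subject", "patient") != "patient":
        return False
    if modifiers.get("conditional", False):
        return False
    if modifiers.get("rule_out", False):
        return False
    if modifiers.get("certainty") in REJECTED_CERTAINTY:
        return False
    return True


print(term_exists({}))                       # True: no modifiers at all
print(term_exists({"subject": "family"}))    # False: not about the patient
print(term_exists({"certainty": "low"}))     # True: low certainty still counts
```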

Term_temporal
Term_temporal indicates whether a condition is “present” or only in the “past”.

The following would be past:

  • History = true
  • Concept_date = anything before the time of the report
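A minimal sketch of this rule, assuming the NLP system exposes a history flag and an optional date attached to the concept (the function and parameter names are hypothetical):

```python
from datetime import date
from typing import Optional


def term_temporal(history: bool, concept_date: Optional[date], report_date: date) -> str:
    """Return "past" or "present" following the two rules listed above."""
    if history:
        return "past"
    if concept_date is not None and concept_date < report_date:
        return "past"
    return "present"


report = date(2016, 3, 30)
print(term_temporal(False, None, report))              # present
print(term_temporal(False, date(2015, 1, 1), report))  # past
print(term_temporal(True, None, report))               # past
```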

Term_modifiers
Term_modifiers will concatenate all modifiers for different types of entities (conditions, drugs, labs etc) into one string. Lab values will be saved as one of the modifiers. A list of allowable modifiers (e.g., signature for medications) and their possible values will be standardized later.
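One possible way to build and read back such a string is sketched below, using the name=value, comma-separated layout from the “son has rash” example. The helper names are hypothetical, and this simple format assumes that neither names nor values contain a comma or an equals sign.

```python
def format_modifiers(modifiers: dict) -> str:
    """Concatenate modifiers into one "name=value,name=value" string."""
    return ",".join(f"{name}={value}" for name, value in modifiers.items())


def parse_modifiers(text: str) -> dict:
    """Split a term_modifiers string back into a dict of strings."""
    if not text:
        return {}
    return dict(pair.split("=", 1) for pair in text.split(","))


modifiers = {
    "negated": "no",
    "subject": "family",
    "certainty": "undef",
    "conditional": "false",
    "general": "false",
}
encoded = format_modifiers(modifiers)
print(encoded)
print(parse_modifiers(encoded)["subject"])
```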

Mapping of clinical documents to Clinical Document Ontology (CDO) and standard terminology

HL7/LOINC CDO is a standard for consistent naming of documents to support a range of use cases: retrieval, organization, display, and exchange. It guides the creation of LOINC codes for clinical notes. CDO annotates each document with 5 dimensions:

  • Kind of Document: Characterizes the general structure of the document at a macro level (e.g. Anesthesia Consent)
  • Type of Service: Characterizes the kind of service or activity (e.g. evaluations, consultations, and summaries). The notion of time sequence, e.g., at the beginning (admission) at the end (discharge) is subsumed in this axis. Example: Discharge Teaching.
  • Setting: Setting is an extension of CMS’s definitions (e.g. Inpatient, Outpatient)
  • Subject Matter Domain (SMD): Characterizes the subject matter domain of a note (e.g. Anesthesiology)
  • Role: Characterizes the training or professional level of the author of the document, but does not break down to specialty or subspecialty (e.g. Physician)

Each combination of these 5 dimensions should roll up to a unique LOINC code. For example, Dentistry Hygienist Outpatient Progress note (LOINC code 34127-1) has the following dimensions: Kind of Document = Note, Type of Service = Progress, Setting = Outpatient, Subject Matter Domain = Dentistry, and Role = Hygienist.

  • According to CDO requirements, only 2 of the 5 dimensions are required to properly annotate a document: Kind of Document and any one of the other 4 dimensions.
  • However, not all the permutations of the CDO dimensions will necessarily yield an existing LOINC code [2]. The HL7/LOINC workforce is committed to establishing new LOINC codes for each newly encountered combination of CDO dimensions [3].

Mapping clinical notes to a standard terminology based on the note title can be automated when the mapping is driven by an ontology (here, CDO). Mapping to individual LOINC codes, which may or may not exist for a particular note type, cannot be fully automated. To support mapping of clinical notes to CDO in OMOP CDM, we propose the following approach:

1. Add all LOINC concepts representing 5 CDO dimensions to the Concept table. For example:

| Field | Record 1 | Record 2 |
|---|---|---|
| concept_id | 55443322132 | 55443322175 |
| concept_name | Administrative note | Against medical advice note |
| concept_code | LP173418-7 | LP173388-2 |
| vocabulary_id | LOINC | LOINC |

2. Represent CDO hierarchy in the Concept_Relationship table using the “Subsumes” – “Is a” relationship pair. For example:

| Field | Record 1 | Record 2 |
|---|---|---|
| concept_id_1 | 55443322132 | 55443322175 |
| concept_id_2 | 55443322175 | 55443322132 |
| relationship_id | Subsumes | Is a |

3. Add LOINC document codes to the Concept table (e.g. Dentistry Hygienist Outpatient Progress note, LOINC code 34127-1). For example:

| Field | Record 1 | Record 2 |
|---|---|---|
| concept_id | 193240 | 193241 |
| concept_name | Dentistry Hygienist Outpatient Progress note | Consult note |
| concept_code | 34127-1 | 11488-4 |
| vocabulary_id | LOINC | LOINC |

4. Represent dimensions of each document concept in Concept_Relationship table by its relationships to the respective concepts from CDO. Use the “Member Of” – “Has Member” (new) relationship pair. Using example from the Dentistry Hygienist Outpatient Progress note (LOINC code 34127-1):

| concept_id_1 | concept_id_2 | relationship_id |
|---|---|---|
| 193240 | 55443322132 | Member Of |
| 55443322132 | 193240 | Has Member |
| 193240 | 55443322175 | Member Of |
| 55443322175 | 193240 | Has Member |
| 193240 | 55443322166 | Member Of |
| 55443322166 | 193240 | Has Member |
| 193240 | 55443322107 | Member Of |
| 55443322107 | 193240 | Has Member |
| 193240 | 55443322146 | Member Of |
| 55443322146 | 193240 | Has Member |

Where the concept IDs represent the following concepts:

| concept_id | Description |
|---|---|
| 193240 | Corresponds to LOINC 34127-1, Dentistry Hygienist Outpatient Progress note |
| 55443322132 | Corresponds to LOINC LP173418-7, Kind of Document = Note |
| 55443322175 | Corresponds to LOINC LP173213-2, Type of Service = Progress |
| 55443322166 | Corresponds to LOINC LP173051-6, Setting = Outpatient |
| 55443322107 | Corresponds to LOINC LP172934-4, Subject Matter Domain = Dentistry |
| 55443322146 | Corresponds to LOINC LP173071-4, Role = Hygienist |

Most of the codes will not have all 5 dimensions. Therefore, they may be represented by 2-5 relationship pairs.

5. If LOINC does not have a code corresponding to a permutation of the 5 CDO dimensions encountered in the source, this code will be generated as an OMOP vocabulary code. Its relationships to the CDO dimensions will be represented exactly as those of existing LOINC concepts (as described above). If and when a proper LOINC code for this permutation is released, the old code should be deprecated. The transition between the old and new codes should be represented by “Concept replaces” – “Concept replaced by” pairs.

6. Mapping from the source data will be performed to the 2-5 CDO dimensions.

The query below finds the LOINC code for the Dentistry Hygienist Outpatient Progress note (see example above), which has all 5 dimensions:

SELECT concept_id_2 FROM Concept_Relationship WHERE relationship_id = 'Has Member' AND concept_id_1 IN (55443322132, 55443322175, 55443322166, 55443322107, 55443322146) GROUP BY concept_id_2 HAVING COUNT(*) = 5

If fewer than 5 dimensions are available, list only the available dimensions and set the HAVING COUNT(*) value to n, the number of dimensions available, to get the record at the intersection of these dimensions:

SELECT concept_id_2 FROM Concept_Relationship WHERE relationship_id = 'Has Member' AND concept_id_1 IN (55443322132, 55443322175, 55443322146) GROUP BY concept_id_2 HAVING COUNT(*) = 3
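The dimension lookup can be tried end to end with a small in-memory SQLite database. The sketch below loads the “Member Of” / “Has Member” pairs from the example above and wraps the intersection query in a helper. The `find_documents` function, the use of SQLite, and the single relationship pair added for concept 193241 are assumptions made purely for illustration.

```python
import sqlite3

# Dimension concept_ids of the Dentistry Hygienist Outpatient Progress note.
DIMENSIONS = [55443322132, 55443322175, 55443322166, 55443322107, 55443322146]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship ("
    "concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
)

rows = []
for dim in DIMENSIONS:
    rows.append((193240, dim, "Member Of"))
    rows.append((dim, 193240, "Has Member"))
# Hypothetical extra document that shares only the Kind of Document dimension.
rows.append((193241, 55443322132, "Member Of"))
rows.append((55443322132, 193241, "Has Member"))
conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", rows)


def find_documents(dimension_ids):
    """Return document concept_ids that are members of every given dimension."""
    placeholders = ",".join("?" for _ in dimension_ids)
    sql = f"""
        SELECT concept_id_2
        FROM concept_relationship
        WHERE relationship_id = 'Has Member'
          AND concept_id_1 IN ({placeholders})
        GROUP BY concept_id_2
        HAVING COUNT(*) = ?
        ORDER BY concept_id_2
    """
    params = list(dimension_ids) + [len(dimension_ids)]
    return [row[0] for row in conn.execute(sql, params)]


print(find_documents(DIMENSIONS))     # [193240]
print(find_documents([55443322132]))  # [193240, 193241]
```

With all five dimensions only the dentistry note matches, while a single broad dimension matches every document that carries it, which is why the count in the HAVING clause has to equal the number of dimensions supplied.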

To identify the appropriate dimension while mapping source documents, use the following concept classes:

  • Note Provider Role
  • Note Domain
  • Note Setting
  • Note Service Type
  • Note Kind

The proposed approach will ensure that any combination of the 5 CDO dimensions encountered in the source data has a corresponding concept in the vocabulary. It will also support an approach consistent with the OMOP CDM/Vocabulary conventions:

  • One required _type_concept_id field will be populated in a corresponding domain table, NOTE.
  • Vocabulary-related attributes are stored in a vocabulary data model in a uniform way
  • Usage of a standard vocabulary, LOINC, is ensured where possible
  • Introduction of new OMOP concepts when a standard does not provide adequate coverage of the source data

A similar mapping approach can be applied to labs.

Use Cases

Example 1 - Left ventricular ejection fraction

Left ventricular ejection fraction is an important indicator of heart health. It is measured during echocardiogram procedures but also during a range of other procedures. The value is frequently reported in clinical reports and has to be extracted using natural language processing.

| Name | Value |
|---|---|
| Note_NLP_id | 123456 |
| note_id | 123446425 |
| section_concept_id | <foreign key to "Echocardiogram Report"> |
| snippet | ejection fraction was estimated at 60% |
| lexical_variant | ejection fraction |
| Note_NLP_concept_id | <foreign key to "Left Ventricular Ejection Fraction" concept> |
| NLP_system | EchoExtractor_EF(v.2016) |
| NLP_date | 3/30/16 |
| Term_exists | TRUE |
| Value_as_concept_id | null |
| Value_as_number | 60.0 |
| Unit_concept_id | <foreign key to "percent"> |
| Term_temporal | present |
| Term_modifiers | null |
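To show how a row like this might be produced, the toy extractor below pulls the ejection fraction value out of a sentence with a regular expression. It is a deliberately simple stand-in and not the EchoExtractor system named in the example; the pattern, the function name and the returned keys are assumptions, and a real NLP pipeline would handle far more phrasing variants.

```python
import re

# "ejection fraction", up to 40 non-digit characters, then a percentage.
EF_PATTERN = re.compile(
    r"(ejection fraction)\D{0,40}?(\d{1,3}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_ef(note_text: str):
    """Return NOTE_NLP-style fields for the first EF mention, or None."""
    match = EF_PATTERN.search(note_text)
    if match is None:
        return None
    return {
        "snippet": note_text[match.start():match.end()],
        "offset": str(match.start(1)),        # character offset in the note
        "lexical_variant": match.group(1),
        "value_as_number": float(match.group(2)),
    }


row = extract_ef("The ejection fraction was estimated at 60%.")
print(row)
```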

Example 2 - eMERGE Phenotypes

Existence of specific report or specific note section

  1. Presence of a Pathology Report [Appendicitis].
  2. Must contain at least two Past Medical History sections and Medication lists (could substitute two non-acute clinic visits or requirement for annual physical) [Hypothyroidism].
  3. At least 1 abdominal CT or colonoscopy [Diverticulosis].
  4. Patients have to have had a colonoscopy [colonPolyp].
  5. Must have at least a problem list and/or note containing non-empty (can say “none”) medication list and past medical history before or immediately after the time of the ECG [QRS].

Term/Concept mentioning in notes or specific sections

  1. Positive result of inflammation and non-inflammation concept (CUI) in post-surgical biopsy report [Appendicitis].
  2. Reported History of Appendicitis [Appendicitis].
  3. Individual’s patient chart includes one or more mentions of ADHD or hyperkinesia [ADHD].
  4. SSTI cases must have the following or similar keywords in the text results of a bacterial culture lab test, such as skin, wound, boil, abscess, but also recognizing that anatomic sites (e.g. foot/hand/leg/buttock, etc.) [caMRSA].
  5. At least one diagnosis code for C. diff and at least one affirmative mention of C. diff infection (unqualified by negation, uncertainty, or historical reference) in progress notes [CDiff].
  6. Retrieve DSM-IV Symptom criteria (Social Interaction/Communication/Behavior, Interests and Activities) terms from notes to confirm Autism [Autism].
  7. Patient has colonoscopy without positive mention of diverticulosis as control [Diverticulosis].
  8. Positive mention of HF in the problem list through either NLP or structured problem list [HeartFailure].
  9. Cases are those that have polyps in any of their colonoscopy or associated pathology reports [colonPolyp].
  10. Notes contain no evidence of heart disease concepts (NLP for notes, Problem Lists at or near ECG time, ignoring Family Medical History and Allergy sections (using section tagger), ICD9 and CPT codes at or near ECG time describing heart disease) before ECG time or within one month following [QRS].
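Several of these criteria reduce to filtering NOTE_NLP on term_exists and term_temporal. As a hedged sketch, the NLP half of the C. diff criterion (an affirmative mention that is not negated, uncertain or historical) could be queried as below. The concept_id 1001, the 'Y'/'N' flag values, the trimmed-down tables and the sample rows are all hypothetical, and the diagnosis-code half of the criterion is not covered.

```python
import sqlite3

CDIFF_CONCEPT_ID = 1001  # hypothetical placeholder, not a real concept_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (note_id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL)")
conn.execute("""
    CREATE TABLE note_nlp (
        note_nlp_id         INTEGER PRIMARY KEY,
        note_id             INTEGER NOT NULL,
        note_nlp_concept_id INTEGER,
        term_exists         VARCHAR(1),
        term_temporal       VARCHAR(50)
    )""")

conn.executemany("INSERT INTO note VALUES (?, ?)", [(10, 1), (11, 2), (12, 3)])
conn.executemany("INSERT INTO note_nlp VALUES (?, ?, ?, ?, ?)", [
    (1, 10, CDIFF_CONCEPT_ID, "Y", "present"),  # affirmative, current mention
    (2, 11, CDIFF_CONCEPT_ID, "N", "present"),  # negated or uncertain mention
    (3, 12, CDIFF_CONCEPT_ID, "Y", "past"),     # historical mention
])

# Persons with at least one affirmative, current mention of the concept.
persons = [row[0] for row in conn.execute("""
    SELECT DISTINCT n.person_id
    FROM note_nlp nn
    JOIN note n ON n.note_id = nn.note_id
    WHERE nn.note_nlp_concept_id = ?
      AND nn.term_exists = 'Y'
      AND nn.term_temporal = 'present'
    ORDER BY n.person_id""", (CDIFF_CONCEPT_ID,))]
print(persons)  # [1]
```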

Related terms mentioning in the same line or adjacent lines

  1. Potential cases were identified if they contained at least one term from List 1 (terms identifying an ace-inhibitor, see below) AND List 2 (terms identifying cough, see below) on the same line (e.g., sentence) within the “Allergy section”, “Medication section” or within the entire “Patient summary section” of the EMR [ACEIcough].
  2. At least one non-negated “Disorder related terms” mention and “Anatomical site related terms” mention either in the SAME or adjacent sentences in a ‘section of interest’ [VTE].

Numeric values with/without temporal constraints

  1. Exclude all patients with an Ejection Fraction (EF or LVEF) <35% within 1 year before or after meeting the CASE 1 definition [ResHTN].
  2. Have evidence from a carotid imaging study of >50% carotid artery stenosis (at least unilaterally) [CAAD].
  3. Classify the type of HF using the numeric EF results (use the lowest EF recorded in the time window) [HeartFailure].
  4. In defining “Normal” ECG, QRSd between 65-120ms, ECG designated as “NORMAL”, Heart Rate between 50-100, ECG Impression must not contain evidence of heart disease concepts [QRS].
