Entity disambiguation: beyond recognition (Phase 1)
Named entity recognition (NER) uses natural language processing techniques to identify mentions of people, organisations, places, and relevant concepts in unstructured text. Once a mention is detected, disambiguation determines which specific entity it refers to: it distinguishes between homonyms, links the mention to a unique identifier, and connects it to its regulatory and ontological context.
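The split between recognition and disambiguation can be illustrated with a minimal Python sketch. It assumes a hypothetical gazetteer (`GAZETTEER`) and an illustrative ID scheme (`court:es-ts`, `court:mx-scjn`, `org:es-A39000013`); plain string lookup stands in for a real NER model. The output of `candidates` is deliberately still ambiguous, because choosing among candidates is the job of disambiguation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    surface: str  # text as it appears in the document
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention

# Hypothetical gazetteer mapping a surface form to candidate entity IDs.
# Homonyms share one surface form, so a mention can have several candidates.
GAZETTEER = {
    "Supreme Court": ["court:es-ts", "court:mx-scjn"],
    "Banco Santander": ["org:es-A39000013"],
}

def recognise(text: str) -> list[Mention]:
    """Stage 1 (a stand-in for NER): locate every occurrence of a known surface form."""
    mentions = []
    for surface in GAZETTEER:
        start = text.find(surface)
        while start != -1:
            mentions.append(Mention(surface, start, start + len(surface)))
            start = text.find(surface, start + 1)
    return sorted(mentions, key=lambda m: m.start)

def candidates(mention: Mention) -> list[str]:
    """Input to stage 2: the identifiers the mention could denote, not yet resolved."""
    return GAZETTEER.get(mention.surface, [])

if __name__ == "__main__":
    text = "The Supreme Court ruled against Banco Santander."
    for m in recognise(text):
        print(m.surface, m.start, candidates(m))
```

For the sample sentence, "Supreme Court" comes back with two candidates and "Banco Santander" with one; recognition alone cannot tell which court is meant.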
Argos resolves ambiguity through four integrated mechanisms that transform textual mentions into unambiguous references to structured entities:
1. Contextual analysis
The system analyses the syntactic and semantic structures surrounding each mention to infer which entity it refers to. Detecting a term is not enough: the system must determine which specific referent the term points to in its context of use.
Contextual signals include syntactic position, adjacent terms, thematic markers, and cross-references within the document. The system evaluates these signals in combination to resolve cases where the same term designates different entities depending on the discursive context.
For example, when "Supreme Court" appears, Argos uses the procedural context surrounding the mention to determine whether it refers to the Spanish, the Mexican, or another supreme court.
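A deliberately crude sketch of context-based ranking follows. It only counts co-occurring cue words in a character window around the mention; the cue lists in `CUES`, the window width, and the function names are hypothetical, and the syntactic, thematic, and cross-reference signals described above are not modelled.

```python
import re
from collections import Counter

# Hypothetical cue profiles: words that tend to occur near each candidate referent.
# A production system would learn such signals; these lists are purely illustrative.
CUES = {
    "court:es-ts": {"spain", "spanish", "casación", "audiencia"},
    "court:mx-scjn": {"mexico", "mexican", "amparo", "scjn"},
}

def tokens(text: str) -> list[str]:
    # \w matches Unicode word characters for str patterns, so "casación" stays one token.
    return re.findall(r"\w+", text.lower())

def context_window(text: str, start: int, end: int, width: int = 200) -> list[str]:
    """Tokens within `width` characters on either side of the mention, excluding the mention."""
    left = text[max(0, start - width):start]
    right = text[end:end + width]
    return tokens(left + " " + right)

def rank_by_context(text: str, start: int, end: int,
                    candidates: list[str]) -> list[tuple[str, int]]:
    """Rank candidates by how many of their cue words occur in the mention's window."""
    window = Counter(context_window(text, start, end))
    scores = {c: sum(window[w] for w in CUES.get(c, set())) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `text = "The Mexican Supreme Court granted the amparo."`, the mention spans offsets 12 to 25, and `rank_by_context` puts `court:mx-scjn` first with a score of 2 (cue words "mexican" and "amparo"), against 0 for `court:es-ts`.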
2. Knowledge graph integration
Entities are matched not against text strings but against structured knowledge representations. Each disambiguated mention is linked to canonical resources, which keeps references semantically consistent across document collections.
For example, "Banco Santander" is linked to tax ID A39000013 and to its supervision by the CNMV, and its regulatory obligations are mapped to specific frameworks; this distinguishes the Spanish multinational from unrelated homonyms.
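The difference between matching strings and linking to a graph node can be sketched with an in-memory triple store. Node IDs, predicate names, and the homonym node are hypothetical; only the tax ID and the CNMV link come from the example above. In this sketch, the `constraints` argument stands in for evidence gathered from the document's context.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object). Node IDs and predicate
# names are illustrative; only the tax ID and CNMV link come from the text.
TRIPLES = [
    ("org:es-A39000013", "label", "Banco Santander"),
    ("org:es-A39000013", "taxId", "A39000013"),
    ("org:es-A39000013", "supervisedBy", "org:CNMV"),
    ("org:homonym-001", "label", "Banco Santander"),  # hypothetical unrelated homonym
    ("org:homonym-001", "country", "XX"),
]

class KnowledgeGraph:
    def __init__(self, triples):
        # node -> predicate -> set of objects
        self.edges = defaultdict(lambda: defaultdict(set))
        # label -> set of nodes carrying that label
        self.by_label = defaultdict(set)
        for s, p, o in triples:
            self.edges[s][p].add(o)
            if p == "label":
                self.by_label[o].add(s)

    def candidates(self, surface: str) -> list[str]:
        """String matching only proposes candidates; it never decides the link."""
        return sorted(self.by_label.get(surface, set()))

    def link(self, surface: str, constraints: dict[str, str]) -> list[str]:
        """Keep candidates whose graph properties satisfy every (predicate, object) constraint."""
        return [
            node for node in self.candidates(surface)
            if all(obj in self.edges[node].get(pred, set())
                   for pred, obj in constraints.items())
        ]

    def properties(self, node: str) -> dict[str, list[str]]:
        """All outgoing properties of a node, with objects sorted for stable output."""
        return {pred: sorted(objs) for pred, objs in self.edges.get(node, {}).items()}
```

`candidates("Banco Santander")` returns both nodes that carry the label, whereas `link("Banco Santander", {"supervisedBy": "org:CNMV"})` keeps only `org:es-A39000013`. From that point on, queries use the node ID rather than the string.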
3. Confidence scoring
Each entity resolution carries a probabilistic confidence score derived from contextual signals, ontological coherence, and corpus validation, so the certainty of every disambiguation is quantified.
Ambiguous cases are flagged for review. For example, if a document mentions "Act 3/2014" without further context and acts from several jurisdictions share that number, the system assigns a low score and marks the case for validation.
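One way to turn several signals into a confidence score and a review flag is sketched below, assuming each signal has already been normalised to [0, 1]. The weights, the threshold, the candidate IDs, and the use of a softmax over weighted sums are all assumptions chosen for illustration, not a description of how Argos calibrates its scores.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for the three signal families named in the text.
WEIGHTS = {"context": 2.0, "ontology": 1.0, "corpus": 0.5}
REVIEW_THRESHOLD = 0.7  # hypothetical cut-off below which a resolution is flagged

@dataclass
class Resolution:
    entity: str
    confidence: float
    needs_review: bool

def resolve(signals: dict[str, dict[str, float]]) -> Resolution:
    """signals maps candidate ID -> {signal name: value in [0, 1]}."""
    # Weighted sum of the signals for each candidate.
    raw = {c: sum(WEIGHTS[k] * v for k, v in s.items()) for c, s in signals.items()}
    # Softmax over candidates; subtracting the maximum keeps exp() numerically stable.
    top = max(raw.values())
    exp = {c: math.exp(r - top) for c, r in raw.items()}
    total = sum(exp.values())
    best = max(exp, key=exp.get)
    confidence = exp[best] / total
    return Resolution(best, confidence, confidence < REVIEW_THRESHOLD)
```

Three candidates for "Act 3/2014" with identical signals each receive a probability of 1/3, which falls below the threshold and sets `needs_review`; a candidate with strong contextual support (weighted score 3.25 against 0.75) reaches roughly 0.92 and is accepted.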
4. Temporal awareness
Entities evolve: regulations are amended, organisations are restructured, and frameworks change. Each entity is associated with a temporal validity range, and the system recognises that relationships between entities vary according to the point in time being referenced.
For example, Argos recognises that "Act 3/2014" amends "RDL 1/2007" and adjusts the relationships between the two accordingly, enabling queries that retrieve the regulatory state before or after each amendment.
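Temporal validity can be sketched by attaching a validity interval to every relation and filtering on the date being queried. The `Relation` type, predicate names, version labels, and especially the `CUTOVER` date are placeholders chosen for illustration; a real system would take effective dates from the official gazette and would also handle staggered entry into force, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the relation is still in force

    def holds_on(self, day: date) -> bool:
        """Validity is the half-open interval [valid_from, valid_to)."""
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)

# Illustrative cut-over date, NOT the official entry-into-force date of Act 3/2014.
CUTOVER = date(2014, 4, 1)

RELATIONS = [
    # date.min is a simplification: earlier history of the text is not modelled.
    Relation("RDL 1/2007", "hasVersion", "RDL 1/2007 (text before Act 3/2014)", date.min, CUTOVER),
    Relation("RDL 1/2007", "hasVersion", "RDL 1/2007 as amended by Act 3/2014", CUTOVER),
    Relation("Act 3/2014", "amends", "RDL 1/2007", CUTOVER),
]

def state_on(relations: list[Relation], day: date) -> list[tuple[str, str, str]]:
    """Return the (subject, predicate, object) triples that hold on `day`."""
    return sorted((r.subject, r.predicate, r.obj) for r in relations if r.holds_on(day))
```

Querying `state_on(RELATIONS, date(2014, 3, 31))` returns only the pre-amendment version of RDL 1/2007, while querying on `CUTOVER` returns the amended version together with the `amends` link from Act 3/2014. Half-open intervals ensure that exactly one version holds on the cut-over day.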