Hyperdata and Semantic Search Engines


Hyperdata refers to the means by which a dataset is linked to other datasets housed in other locations or information silos, much as hypertext expresses the relationships between texts scattered across the internet. Hyperdata strategies make it possible to organise data into a “network of data”, also known as a Knowledge Graph: the ensemble of data linked by means of a hyperdata strategy.
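As a minimal sketch of this idea, assuming hyperdata links are modelled as (subject, predicate, object) triples in the RDF style, two datasets from separate silos can be merged into one graph simply by taking their union. Wherever both use the same identifier for an entity, their links connect. The `ex:` identifiers and predicate names are hypothetical, chosen only for illustration; a real knowledge graph would use full IRIs from an agreed vocabulary.

```python
# Hypothetical identifiers: the "ex:" prefix and the predicate names are
# illustrative assumptions, not a real vocabulary.
Triple = tuple[str, str, str]

# Dataset held in one silo (e.g. a museum catalogue).
museum: set[Triple] = {
    ("ex:LasMeninas", "ex:creator", "ex:Velazquez"),
    ("ex:LasMeninas", "ex:heldBy", "ex:PradoMuseum"),
}

# Dataset held in a separate silo (e.g. a biographical database).
biography: set[Triple] = {
    ("ex:Velazquez", "ex:birthPlace", "ex:Seville"),
}


def merge(*datasets: set[Triple]) -> set[Triple]:
    """Union the datasets; triples that share an identifier become connected."""
    graph: set[Triple] = set()
    for dataset in datasets:
        graph |= dataset
    return graph


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return the sorted objects linked to `subject` through `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


graph = merge(museum, biography)
artist = objects(graph, "ex:LasMeninas", "ex:creator")[0]
print(artist, objects(graph, artist, "ex:birthPlace"))
# prints: ex:Velazquez ['ex:Seville']
```

Neither silo alone can answer "where was the painter of Las Meninas born?"; the merged graph can, because both datasets name the painter with the same identifier.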

A hyperdata link always refers to an entity and, in fact, names it. It may, for example, refer to a physical thing such as an artwork (“Las Meninas”); to a person (Velázquez, the painter of that work); or even to an exhibition in which the work has been displayed, the restorations or changes it has undergone over time, or a description of its elements (the people, fauna, flora or places depicted...).

A hypertext link indicates that there is a connection between two documents; a hyperdata link goes further and explicitly marks the semantic relationship, that is, the specific class of the connection. In other words, thanks to hyperdata, systems are able to know and process the relations between the entities that link two documents, which in turn makes those relations easier for people to recognise. Unlike hypertext-based strategies, strategies based on hyperdata do not leave it to human beings to work out which relationships within a set of connected resources are significant. Because hyperdata allow systems to process these relations between entities, people can query and interpret vast quantities of information that the systems have meaningfully linked in a graph.
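To make the contrast concrete, here is a hedged sketch of what each kind of link lets a system do; the page paths, identifiers and predicates are hypothetical. An untyped hypertext link yields only neighbouring documents, whereas a typed hyperdata link lets the system select the very relation it was asked about.

```python
# Hypertext: each link only records that two documents are connected.
# (Page paths are hypothetical.)
hypertext_links = [
    ("page/las-meninas", "page/velazquez"),
    ("page/las-meninas", "page/prado"),
]

# Hyperdata: each link also names the relation that holds between two entities.
# (Identifiers and predicates are hypothetical.)
hyperdata_links = [
    ("ex:LasMeninas", "ex:creator", "ex:Velazquez"),
    ("ex:LasMeninas", "ex:heldBy", "ex:PradoMuseum"),
]


def linked_pages(page: str) -> list[str]:
    """All pages linked from `page`; deciding which is the artist is left to a human."""
    return [dst for src, dst in hypertext_links if src == page]


def related(entity: str, relation: str) -> list[str]:
    """Only the entities connected to `entity` by the requested relation."""
    return [o for s, p, o in hyperdata_links if s == entity and p == relation]


print(linked_pages("page/las-meninas"))        # ['page/velazquez', 'page/prado']
print(related("ex:LasMeninas", "ex:creator"))  # ['ex:Velazquez']
```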

Semantic Search Engines and Knowledge Graphs

A semantic search engine could technically be defined as a search engine that follows hyperdata links. A set of hyperdata links within an ensemble of resources constitutes a knowledge graph; it follows that a semantic search engine is a search engine that makes it possible to navigate a knowledge graph.
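Following hyperdata links amounts to walking the graph from entity to entity. The following sketch, a breadth-first traversal over an in-memory list of triples, is one simple way to do this and is not a description of how any particular semantic search engine is built; the identifiers and predicates are hypothetical.

```python
from collections import deque

# Hypothetical triples; identifiers and predicates are illustrative only.
TRIPLES = [
    ("ex:LasMeninas", "ex:creator", "ex:Velazquez"),
    ("ex:LasMeninas", "ex:depicts", "ex:Dog"),
    ("ex:Velazquez", "ex:birthPlace", "ex:Seville"),
    ("ex:Seville", "ex:country", "ex:Spain"),
]


def traverse(start: str, max_hops: int) -> list[tuple[str, list[str]]]:
    """Breadth-first walk along outgoing links, at most `max_hops` away.

    Returns each reached entity with the list of predicates followed to reach it.
    """
    seen = {start}
    queue = deque([(start, [])])
    reached = []
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                reached.append((o, path + [p]))
                queue.append((o, path + [p]))
    return reached


for entity, path in traverse("ex:LasMeninas", max_hops=2):
    print(entity, "via", " -> ".join(path))
# ex:Velazquez via ex:creator
# ex:Dog via ex:depicts
# ex:Seville via ex:creator -> ex:birthPlace
```

Because each result carries the chain of predicates that led to it, the engine can explain not just that two entities are connected but how.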

The architecture of the Semantic Web and the format of a web document (usually HTML) are like the two sides of a single sheet of paper. Web documents are what search engines such as Google, among others, have traditionally crawled; that is where they search. A semantic search engine based on hyperdata needs RDF files, because RDF is the means by which the Semantic Web represents entities and, consequently, enables their navigation.
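N-Triples is one of the standard RDF serialisations: each line states one triple, with IRIs in angle brackets and a terminating period. The parser that follows is a deliberately simplified sketch that handles only IRIs and plain string literals (no escapes, language tags, datatypes or blank nodes); in practice one would use a full RDF library such as rdflib. The example.org IRIs are placeholders.

```python
import re

# Simplified N-Triples line: <subject> <predicate> (<object> | "literal") .
# Assumption: no escapes, language tags, datatypes, or blank nodes.
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Parse the simplified N-Triples subset above into (s, p, o) tuples."""
    triples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(stripped)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        subj, pred, obj_iri, obj_literal = match.groups()
        obj = obj_iri if obj_iri is not None else obj_literal
        triples.append((subj, pred, obj))
    return triples


DOC = """
<http://example.org/LasMeninas> <http://example.org/creator> <http://example.org/Velazquez> .
<http://example.org/LasMeninas> <http://example.org/title> "Las Meninas" .
"""

for triple in parse_ntriples(DOC):
    print(triple)
```

The result is the same kind of triple list a search engine can index and traverse, which is why machine-readable RDF, rather than the HTML page alone, is what makes entities navigable.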

A Knowledge Graph based on hyperdata makes it possible to perform conversational searches using natural language. For example, a subset of the graph can be selected by querying only the hyperdata that meet a given condition: in the case of the Prado Museum graph, having been painted in a certain year or belonging to a particular school of art. Furthermore, a search engine that uses hyperdata can not only restrict the scope of its search but also calculate the exact number of relationships for a specified set of resources, along with their classes. This type of semantic search is called a faceted search with summarization.

A search of this type also allows queries or restrictions to be combined or iterated, emulating the way people naturally reason. In the previous example, a second layer could be added to the results of our query for artworks in the Prado from a certain period and school: the depiction of a certain theme could be queried (hunting, for example), or a specific object (a shotgun), or perhaps a given animal (let’s say a dog). What would finally be sought in this case are artworks in the Prado that deal with hunting and also show shotguns and dogs, from a certain period and, in this example, Spanish.

Insofar as machines are able to understand the world of entities that people use, they reduce the number of results returned; the answers to our questions thereby become precise and semantically relevant.
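Faceted search with summarization and iterative restriction can be sketched over triples as two operations: `restrict` keeps the resources linked to a given value through a given predicate, and `summarize` counts how many of the current resources carry each value of a predicate. The catalogue is hypothetical and invented for illustration; it does not reflect the Prado Museum’s collection data, and the function names are illustrative, not an existing API.

```python
from collections import Counter

# Hypothetical catalogue, invented for illustration (not real Prado data):
# each resource maps a predicate to the values it is linked to.
CATALOGUE = {
    "ex:work1": {"ex:period": ["17th century"], "ex:school": ["Spanish"],
                 "ex:theme": ["hunting"], "ex:shows": ["dog", "shotgun"]},
    "ex:work2": {"ex:period": ["17th century"], "ex:school": ["Spanish"],
                 "ex:theme": ["hunting"], "ex:shows": ["dog"]},
    "ex:work3": {"ex:period": ["17th century"], "ex:school": ["Flemish"],
                 "ex:theme": ["hunting"], "ex:shows": ["dog", "shotgun"]},
    "ex:work4": {"ex:period": ["18th century"], "ex:school": ["Spanish"],
                 "ex:theme": ["portrait"], "ex:shows": ["dog"]},
}

# Flatten into (subject, predicate, object) triples.
TRIPLES = {
    (s, p, o)
    for s, links in CATALOGUE.items()
    for p, values in links.items()
    for o in values
}


def restrict(items: set[str], predicate: str, value: str) -> set[str]:
    """One refinement step: keep the items linked to `value` via `predicate`."""
    return {s for s in items if (s, predicate, value) in TRIPLES}


def summarize(items: set[str], predicate: str) -> dict[str, int]:
    """Facet summary: how many of `items` are linked to each value of `predicate`."""
    return dict(Counter(o for s, p, o in TRIPLES if s in items and p == predicate))


results = set(CATALOGUE)
results = restrict(results, "ex:period", "17th century")
print(sorted(summarize(results, "ex:school").items()))  # [('Flemish', 1), ('Spanish', 2)]
results = restrict(results, "ex:school", "Spanish")
print(sorted(summarize(results, "ex:shows").items()))   # [('dog', 2), ('shotgun', 1)]

# Second layer: theme, object and animal restrictions applied in turn.
for predicate, value in [("ex:theme", "hunting"), ("ex:shows", "shotgun"), ("ex:shows", "dog")]:
    results = restrict(results, predicate, value)
print(sorted(results))  # ['ex:work1']
```

Each call to `restrict` narrows the previous result set, mirroring the successive layers of the query described above; calling `summarize` at any step gives the facet counts a user would see before choosing the next restriction.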