How should I build a Knowledge Graph for a custom dataset?

I'm new to machine learning and I'm trying to create a small Knowledge Graph for search purposes similar to Google for a class project.

I have been searching on this topic for a few days, and this is what I have found from the web and research papers:

  1. Create RDF triples or use already existing databases like Freebase, Wikidata, etc.
  2. Then train the model using some algorithms like ComplEx, TransE, etc.
  3. And finally use it for the queries.

My problem is that I don't want to use an already existing database. I have a set of documents with me. Is there any good library for making triples from custom data?

Also, after training my model, which database should I use to store it, and how do I query it for answers?



You can also build a Knowledge Graph from existing data sources using RML. RML (RDF Mapping Language) extends R2RML and lets you transform heterogeneous data sources into RDF. I created an example with JSON as the data source, but other formats such as CSV, XML, and relational databases are possible too.

RML (RDF Mapping Language)

Turning your existing data sources into RDF with RML works as follows:

  1. Write RML mapping rules, which instruct an RML processor how the data should be transformed into RDF. The RML documentation contains many examples of how to write mapping rules for your data sources.

RML rules consist of Triples Maps, each of which has the following parts:

Logical Source

rml:logicalSource [
    rml:source "people.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.people.[*]" ; 
] ;

The data source people.json is accessed using the JSONPath expression $.people.[*]. This iterator lets the RML processor visit each entry of the people array in turn.
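To make the iteration concrete, here is a minimal Python sketch of what the iterator does conceptually. It is not how an RML processor is implemented and it does not use a JSONPath library. The function name `iterate_people` is hypothetical, and the data is inlined here instead of being read from people.json.

```python
import json

# Inlined copy of the example data; an RML processor would read people.json.
PEOPLE_JSON = """
{"people": [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Jane", "lastname": "Smith"},
    {"firstname": "Sarah", "lastname": "Bladinck"}
]}
"""


def iterate_people(raw_json):
    """Yield one record per element of the top-level "people" array,
    which is what the iterator $.people.[*] selects."""
    document = json.loads(raw_json)
    for record in document["people"]:
        yield record


records = list(iterate_people(PEOPLE_JSON))
for record in records:
    print(record["firstname"], record["lastname"])
```

Each yielded record is the context in which the subject map and the predicate-object maps are evaluated.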

Subject Maps

rr:subjectMap [
    rr:template "http://ex.com/Person/{firstname}_{lastname}" ;
    rr:class foaf:Person ; 
] ;

Every subject created by this SubjectMap will look like http://ex.com/Person/{firstname}_{lastname} where firstname and lastname are replaced by the corresponding JSON values during the execution of an RML processor. The subject has the class foaf:Person.
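The template expansion can be sketched in plain Python as follows. This only illustrates the idea: `expand_template` is a hypothetical helper, and the percent-encoding via `urllib.parse.quote` is an approximation of the "IRI-safe" encoding that R2RML prescribes for template values, not an exact implementation of it.

```python
import re
from urllib.parse import quote

TEMPLATE = "http://ex.com/Person/{firstname}_{lastname}"


def expand_template(template, record):
    """Replace each {name} placeholder with the encoded value from record."""
    def substitute(match):
        value = str(record[match.group(1)])
        # Leave only unreserved ASCII characters unescaped (an approximation).
        return quote(value, safe="-._~")

    return re.sub(r"\{([^{}]+)\}", substitute, template)


subject = expand_template(TEMPLATE, {"firstname": "John", "lastname": "Doe"})
print(subject)
```

Encoding the values matters as soon as a name contains a space or a slash, because the raw value would otherwise produce an invalid or ambiguous IRI.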

Predicate-Object Maps

rr:predicateObjectMap [
    rr:predicate foaf:givenName ;
    rr:objectMap [
        rml:reference "firstname" ;
    ]
] ;

This map generates triples with the predicate foaf:givenName, whose object is the value of the JSON field firstname for the current record.

  2. Execute your mapping rules with an RML processor, such as the RMLMapper or the RMLStreamer. Other RML processors can be used too, provided they comply with the RML specification.

An RML processor will generate the following triples from the JSON data based on the mapping rules shown previously:

<http://ex.com/Person/John_Doe> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>.
<http://ex.com/Person/John_Doe> <http://xmlns.com/foaf/0.1/givenName> "John".
<http://ex.com/Person/John_Doe> <http://xmlns.com/foaf/0.1/familyName> "Doe".
<http://ex.com/Person/Jane_Smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>.
<http://ex.com/Person/Jane_Smith> <http://xmlns.com/foaf/0.1/givenName> "Jane".
<http://ex.com/Person/Jane_Smith> <http://xmlns.com/foaf/0.1/familyName> "Smith".
<http://ex.com/Person/Sarah_Bladinck> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>.
<http://ex.com/Person/Sarah_Bladinck> <http://xmlns.com/foaf/0.1/givenName> "Sarah".
<http://ex.com/Person/Sarah_Bladinck> <http://xmlns.com/foaf/0.1/familyName> "Bladinck".
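To see how the three kinds of maps combine into this output, the following Python sketch reproduces the same triples by hand. It is a hard-coded stand-in for this one mapping, not an RML processor: the function names are hypothetical, the data is inlined, and the subject template is filled without percent-encoding because the example names contain only plain letters.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Inlined copy of the example data; an RML processor would read people.json.
PEOPLE_JSON = """
{"people": [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Jane", "lastname": "Smith"},
    {"firstname": "Sarah", "lastname": "Bladinck"}
]}
"""


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def people_to_ntriples(raw_json):
    """Hand-written equivalent of the mapping rules: three triples per person."""
    lines = []
    for person in json.loads(raw_json)["people"]:
        first, last = person["firstname"], person["lastname"]
        # Subject map: fill the template (no percent-encoding in this sketch).
        subject = f"<http://ex.com/Person/{first}_{last}>"
        # rr:class foaf:Person becomes an rdf:type triple.
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF}Person>.")
        # One triple per predicate-object map.
        lines.append(f'{subject} <{FOAF}givenName> "{escape_literal(first)}".')
        lines.append(f'{subject} <{FOAF}familyName> "{escape_literal(last)}".')
    return lines


triples = people_to_ntriples(PEOPLE_JSON)
print("\n".join(triples))
```

The advantage of RML over a script like this is that the rules are declarative: changing the data source or adding a property means editing the mapping, not the code.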

Full example

I created a small demo that generates FOAF Persons with a first and last name:

RML Mapping rules

@base <http://example.com> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#PersonMapping>
    a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "people.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.people.[*]" ; 
    ] ;

    rr:subjectMap [
        rr:template "http://ex.com/Person/{firstname}_{lastname}" ;
        rr:class foaf:Person ; 
    ] ;

    rr:predicateObjectMap [
        rr:predicate foaf:givenName ;
        rr:objectMap [ 
            rml:reference "firstname" ; 
        ] 
    ] ;

    rr:predicateObjectMap [
        rr:predicate foaf:familyName ;
        rr:objectMap [ 
            rml:reference "lastname" ; 
        ] 
    ] .

JSON data

{
    "people": [
        {
            "firstname": "John",
            "lastname": "Doe"
        },
        {
            "firstname": "Jane",
            "lastname": "Smith"
        },
        {
            "firstname": "Sarah",
            "lastname": "Bladinck"
        }
    ]
}

Note: I contribute to RML and its technologies.
