How can I store sources, effective dates, and confidence for every property in a knowledge graph?

I want to ensure that every property in a knowledge base comes from at least one source. I would like every edge to be spawned (or at least explained) by some event, like a "claim" or "measurement" or "birth." I'd like to rate, on a scale, the confidence that some property is correct, which could also be inherited from the source's confidence rating. Finally, I want to ensure that effective date(s) are known or …
Category: Data Science

How to find union of nodes in rdflib knowledge graph

A little background on the work: I am working with ontologies, and for my use case I have to apply a random walk on the ontology nodes/entities. To do this, I have written a function that, given a node, outputs all of its immediate neighbour nodes. Actual problem: Recently I came across an ontology which, along with normal nodes in the graph, also has "union of other nodes" as an entity. But while going over the triple's …
Category: Data Science

How to unify weights in my dataset

I have a symptom-disease network that consists of four attributes: symptom, disease, co-occurrence and TF-IDF. I'm treating the TF-IDF attribute as the weight of my network edges, and the symptom and disease attributes as the nodes of my network. But I think there is a problem that should be addressed. Here are some records of my dataset:
MeSH Symptom Term | MeSH Disease Term | PubMed occurrence | TFIDF score
"Aging, Premature" | Scoliosis | 1 | 3.46455149
"Aging, Premature" | HIV Infections | 3 | 10.39365447
Weight Loss | Scoliosis | 1 | …
Category: Data Science
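
The network described in the excerpt above (symptoms and diseases as nodes, TF-IDF as the edge weight) could be represented roughly as follows. This is only a minimal sketch: the function name `build_weighted_graph` and the tuple-based node labels are assumptions made for illustration, and only the two complete records quoted in the excerpt are used (the third record is truncated and is left out).

```python
from collections import defaultdict

# The two complete records quoted in the excerpt:
# (symptom, disease, PubMed occurrence, TF-IDF score)
records = [
    ("Aging, Premature", "Scoliosis", 1, 3.46455149),
    ("Aging, Premature", "HIV Infections", 3, 10.39365447),
]


def build_weighted_graph(rows):
    """Return an undirected adjacency map: node -> {neighbour: TF-IDF weight}.

    Nodes are (kind, name) tuples so that a symptom and a disease sharing
    the same MeSH term would still be distinct nodes (an assumption).
    """
    graph = defaultdict(dict)
    for symptom, disease, _occurrence, tfidf in rows:
        s = ("symptom", symptom)
        d = ("disease", disease)
        graph[s][d] = tfidf  # edge symptom -> disease
        graph[d][s] = tfidf  # mirrored edge, since the graph is undirected
    return dict(graph)


graph = build_weighted_graph(records)
```

Keeping the node kind in the key makes the bipartite structure explicit, which may help when deciding how weights from different symptoms should be compared.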

What is production selection probability in ACT-R?

I have a problem with the declarative and production parts of instance-based learning based on ACT-R. I have a dataset. Each record is an instance with some features and a label. I want to give a payoff for the final decision. For example, if the label is 1 and the decision is 0, I give a payoff of -5. To the best of my knowledge, this payoff is for the production. Each record in my dataset is a chunk, so what are the productions in my problem? How can …
Category: Data Science

How should I build a Knowledge Graph for a custom dataset?

I'm new to machine learning and I'm trying to create a small Knowledge Graph for search purposes, similar to Google, for a class project. Okay, so I have been searching on this topic for a few days and this is what I have found from the web and research papers. Create RDF triples or use already existing databases like Freebase, Wikidata, etc. Then train the model using some algorithms like ComplEx, TransE, etc. And finally use it for the queries. My …
Category: Data Science

First steps on a new cleaned dataset

What is the very first thing you do when you get your hands on a new data set (assuming it is cleaned and well structured)? Please share sample code snippets, as I am sure this would be extremely helpful for both beginners and experienced users.
Category: Data Science

An algorithm for Automatic Tag Clustering

Our website dinf is somewhat like StackExchange: people submit small definitions of concepts. We would like to automatically assign those concepts to 'Topics'. The problem is that dinf by default limits any definition to a maximum of 500 characters. Which algorithm / module can we use to assign those concepts, assuming that all topics are known in advance?
Category: Data Science

What are the use cases for Apache Spark vs Hadoop

With Hadoop 2.0 and YARN, Hadoop is supposedly no longer tied only to map-reduce solutions. With that advancement, what are the use cases for Apache Spark vs Hadoop, considering both sit atop HDFS? I've read through the introduction documentation for Spark, but I'm curious whether anyone has encountered a problem that was more efficient and easier to solve with Spark compared to Hadoop.
Category: Data Science

Options to find the most similar question in a dataset of question-answer pairs?

I am building a chatbot that will only handle FAQs, but these FAQs are very specific to an organisation, so I cannot use any existing off-the-shelf solutions, or connect to question-answering APIs. I have a dataset which consists of questions, intents, and answers. Let's say there are 100 intents, which basically group questions into general categories (e.g. fee_payment). Each intent has 50 different specific answers (e.g. 'Fees are paid through the online portal' or 'Fees are due on the 1st …
Category: Data Science

Are ontologies and the Semantic Web dead?

Is the Semantic Web dead? Are ontologies dead? I am developing a work plan for my thesis about "A knowledge base through a set ontology for interest groups around wetlands". I have been researching and developing ontologies for it, but I am still unclear about many things. What is the modeling language for ontologies? Which methodology for ontologies is better, OTK or METHONTOLOGY? Is there any program that does what Cratilo does? Cratilo is software for analyzing textual corpora …
Category: Data Science

Unsupervised learning in medical systems and intelligent systems?

I have a dataset which belongs to a hospital. It contains data about patients and healthy people. The problem is separating the healthy ones from the patients. I added some new features to the dataset to solve this problem. When I reduce the dimensions of the data, including the new features, and visualize it, the patients and healthy individuals are distinguishable (visually separable). Now if one asks what is the relation between the used approach (feature extraction, visualization, using the human ability, unsupervised methods) …
Category: Data Science

What knowledge framework supports this type of query?

I have knowledge that is organized as in the example below, where items or nodes can belong to multiple hierarchies and can have arbitrary numbers of children and parents: sports golf baseball tennis equipment golf clubs club1 baseball bats bat1 tennis racket racket1 I need to be able to run queries like "Show me all sports that use clubs or rackets, and their associated equipment." The queries do not need to be posed in natural language, but the results must …
Category: Data Science

Ontologies with user interface

I am working on the development of an ontology for my dissertation project. I have read plenty of resources and tutorials on how to develop ontologies, why they are useful and how to use them. I've been trying to find examples of ontologies that include user interface elements such as textboxes, buttons etc. to study them for the development of my ontology. I checked databases such as Ontolingua for similar projects but I didn't find anything to relate to. I would …
Category: Data Science

Algorithm/Analysis that utilises incremental information?

I am looking for techniques which utilise information in an incremental manner. Example: A day with inclement climate is likely to be followed by another day with inclement climate. Or when an entire dataset sorted by date is available, an algorithm able to identify that a person is likely to call in sick when the previous day had bad weather. Is there any effective analysis which utilises this prior information where not all events are completely independent?
Category: Data Science

Debugging Neural Network for (Natural Language) Tagging

I've been coding a Neural Network for recognizing and tagging parts of speech in English (written in Java). The code itself has no 'errors' or apparent flaws. Nevertheless, it is not learning: more training does not change its ability to predict the testing data. The following is information about what I've done; please ask me to update this post if I left something important out. I wrote the neural network and tested it on several different …
Category: Data Science
