Origin of the Boolean Model of Information Retrieval

A simple question, but I can't find the answer: who "invented" Boolean retrieval? I assume the concept grew over time, but is there a paper or publication that mentions or defines the Boolean Model as a whole for the first time? Wikipedia cites the book by Lancaster and Fayen (1973), but I couldn't find a definition in there, either.
Category: Data Science

How would you explain Data Science to someone in simple layman terms?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains. But I can't explain the concept of Data Science in simple terms to a layman or a grandma. Inspired by this post and the quote below: You do not really understand something …
Category: Data Science

Definitions of adjectives for clustering

For a school project, I need to explain which Scikit-Learn clustering algorithm to use based on the input data. The documentation is very well done, especially thanks to a comparative table of algorithms, but I have trouble understanding some of the adjectives. I would very much appreciate definitions for the following terms: flat / non-flat geometry; even / non-even cluster size; inductive / transductive. In addition, does "Not scalable" mean that the algorithm is not efficient when …
Category: Data Science

What is the name of this technique involving tracking cumulative errors with a forgiveness parameter?

I'm looking for the name of a technique I've seen used before, most commonly in time-series anomaly detection. It involves keeping a running total of consecutive "error" amounts, generally the difference from a prediction or baseline, and then reacting when the cumulative amount exceeds a specific tolerance level. The technique needs a "forgiveness" amount that reduces the cumulative error each iteration, to avoid a lot of small errors from eventually stacking up and flipping the …
Category: Data Science
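
The running-total idea described in this question can be sketched in a few lines. This is only an illustration under assumptions: the function name `cumulative_error_alarms` and the parameters `forgiveness` and `tolerance` are made up here, and resetting the total after an alarm is one possible choice. The update rule resembles a one-sided CUSUM (cumulative sum) control chart, which may or may not be the technique the asker has in mind.

```python
def cumulative_error_alarms(errors, forgiveness, tolerance):
    """Return the indices at which the running error total exceeds `tolerance`.

    Hypothetical helper: each step adds the new error, subtracts a fixed
    `forgiveness` amount, and never lets the total drop below zero.
    """
    total = 0.0
    alarms = []
    for i, err in enumerate(errors):
        # Small errors are "forgiven": they shrink the total instead of growing it.
        total = max(0.0, total + err - forgiveness)
        if total > tolerance:
            alarms.append(i)
            total = 0.0  # assumption: start over after reacting
    return alarms


# Many small errors never trigger; two large consecutive ones do.
small = cumulative_error_alarms([0.1] * 10, forgiveness=0.2, tolerance=1.0)
burst = cumulative_error_alarms([0.1, 0.1, 1.0, 1.0, 0.1], forgiveness=0.2, tolerance=1.0)
print(small, burst)
```

With a forgiveness of 0.2, the run of 0.1 errors keeps the total at zero, while the two errors of 1.0 push it past the tolerance on the second one.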

What's The Difference Between The Terms Predictor And Feature

For the term 'predictor', I found the following definition: Predictor Variable: One or more variables that are used to determine or predict the target variable. Whereas Wikipedia contains the following definition of the word 'feature': Feature is an individual measurable property or characteristic of a phenomenon being observed. What is the difference between 'predictor' and 'feature' in machine learning?
Category: Data Science

What does anneal mean in the context of machine learning?

An article released by OpenAI gives an overview of how OpenAI Five works. There is a paragraph in the article stating: Our agent is trained to maximize the exponentially decayed sum of future rewards, weighted by an exponential decay factor called γ. During the latest training run of OpenAI Five, we annealed γ from 0.998 (valuing future rewards with a half-life of 46 seconds) to 0.9997 (valuing future rewards with a half-life of five minutes). Does annealing in …
Category: Data Science
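
To make the quoted numbers concrete, here is a rough sketch of what gradually changing γ could look like, together with the half-life implied by a given γ. The linear schedule and the names `anneal_gamma` and `half_life_steps` are assumptions for illustration; the quoted passage does not say which schedule was actually used, and the half-life below is measured in time steps, not seconds.

```python
import math


def half_life_steps(gamma):
    """Number of steps after which a reward's weight gamma**n drops to 0.5."""
    return math.log(0.5) / math.log(gamma)


def anneal_gamma(progress, start=0.998, end=0.9997):
    """Hypothetical linear schedule: `progress` runs from 0.0 to 1.0 over training."""
    return start + (end - start) * progress


for progress in (0.0, 0.5, 1.0):
    gamma = anneal_gamma(progress)
    print(f"progress={progress:.1f} gamma={gamma:.5f} "
          f"half-life={half_life_steps(gamma):.0f} steps")
```

Converting steps to seconds requires the number of agent steps per second, which the excerpt does not state, so it is left out here.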

Is Data Science the Same as Data Mining?

I am sure data science, as it will be discussed in this forum, has several synonyms or at least related fields where large data is analyzed. My particular question is about Data Mining. I took a graduate class in Data Mining a few years back. What are the differences between Data Science and Data Mining, and in particular, what more would I need to look at to become proficient in Data Mining?
Category: Data Science

What is the difference between Standard Normal Distribution and Mean Normalization approaches to feature-scaling?

The tag feature-scaling seems to convey that one of the scaling methods is Standard Normal Distribution. Further, I read an answer on this site saying that Mean Normalization is a form of feature scaling. What is the difference between the two approaches to scaling? Note: I think that the statistics and mathematics of normalization do differ.
Category: Data Science
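
As a small worked comparison, the sketch below applies the two formulas these names are commonly taken to mean: standardization, $(x - \mu) / \sigma$, and mean normalization, $(x - \mu) / (\max - \min)$. The function names are made up for this example, and the use of the population standard deviation is an assumption; some libraries and texts use the sample standard deviation instead.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def mean_normalize(values):
    """Subtract the mean and divide by the range (max - min)."""
    mean = statistics.fmean(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(standardize(data))     # zero mean, unit standard deviation
print(mean_normalize(data))  # zero mean, values within [-1, 1]
```

Both center the data at zero; they differ only in the denominator, so standardized values have unit variance while mean-normalized values are bounded by the range.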

What does it mean when we say an algorithm/metric is agnostic?

Problem: I keep running into machine learning terms that include the word "agnostic", such as model-agnostic learning and model-agnostic metric. The dictionary defines "agnostic" as: a person who holds the view that any ultimate reality (such as God) is unknown and probably unknowable. This does not make those terms more understandable. In some contexts, I find that "agnostic" refers to "generic" or "free of". For example, in the paper I am reading now, …
Topic: definitions
Category: Data Science

Is it correct to define the F-measure as the harmonic mean of specificity and sensitivity in such a way?

It is common to define the F-measure as a function of precision and recall, as mentioned in [1]: $F_{\beta}=\frac{(1+\beta^2)PR}{\beta^2 P+R}$ However, I came across some cases where another definition is used [2] (without weights): $F = H(\text{sensitivity}, 1 - \text{specificity})$, where $H$ is the harmonic mean. Reference: F - measure derivation (harmonic mean of precision and recall) https://link.springer.com/chapter/10.1007/978-3-540-68947-8_133. https://stackoverflow.com/a/52892413/2243842
Category: Data Science
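
The two definitions quoted in this question can be compared numerically. The sketch below computes $F_\beta$ from precision and recall and, separately, the harmonic mean of sensitivity and $1 - \text{specificity}$ exactly as the second formula is written. The confusion-matrix counts are invented for illustration and the function names are not from any library.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2 * a * b / (a + b)


def f_beta(precision, recall, beta=1.0):
    """F-measure as given in the question: (1 + b^2) P R / (b^2 P + R)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)
recall = tp / (tp + fn)          # recall is the same quantity as sensitivity
specificity = tn / (tn + fp)

f1 = f_beta(precision, recall)                      # equals H(precision, recall)
alt = harmonic_mean(recall, 1 - specificity)        # second formula, taken literally
print(round(f1, 4), round(alt, 4))
```

With $\beta = 1$ the first formula reduces to the harmonic mean of precision and recall. On these counts the second formula gives a different value, because $1 - \text{specificity}$ (the false positive rate) is not the same quantity as precision.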

Is the _error_ in the context of ML always just the difference of predictions and targets?

Simple definitional question: In the context of machine learning, is the error of a model always the difference of predictions $f(x) = \hat{y}$ and targets $y$? Or are there also other definitions of error? I looked into other posts on this, but they are not sufficiently clear. See my comment to the answer in this post: What's the difference between Error, Risk and Loss?
Category: Data Science

What is the relationship between AI and data science?

I think they share a lot (e.g. machine learning is a subset of both, right?), but maybe both have elements the other doesn't have? Could you name some in that case? Or is one a subset of the other? What is the relationship between AI and data science? For example, when it comes to the relationship of AI and ML, I always say AI is a superset of ML. And the distinguishing set is search algorithms, which I would include …
Category: Data Science

What fits in a Data Description Report / Data Exploration Report?

I am trying to get familiar with CRISP-DM and found the terms "Data Description Report" and "Data Exploration Report", which seem oddly vague in their definition. So far I have only found this: https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_data_description_report.htm But it seems rather short, in my opinion. Is there an example of a Data Description Report anywhere? If not, is there a systematic methodology you personally use to record your findings while trying to understand data?
Category: Data Science
