Simple question, but I can't really find the answer to that: Who "invented" Boolean Retrieval? Of course, I assume that the concept grew over time, but is there a paper or publication that mentions/defines the Boolean Model as a whole for the first time? On Wikipedia, the book by Lancaster and Fayen (1973) is cited, but I couldn't find any definition in there, either.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply that knowledge and those actionable insights across a broad range of application domains. But I can't explain the concept of Data Science in simple layman's terms, say to my grandmother. Inspired by this post & the quote below: You do not really understand something …
I went through this comparison of analytic disciplines and this perspective of machine learning, but I am not finding any answers on the following: How is Data Science related to Machine learning? How is it not related to Machine Learning?
For a school project, I need to explain which Scikit-Learn clustering algorithm to use based on the input data. The documentation is very well done, especially the comparative table of algorithms, but I have trouble understanding some of the adjectives. I would very much appreciate definitions for the following terms: flat vs. non-flat geometry, even vs. non-even cluster size, and inductive vs. transductive. In addition, does "Not scalable" mean that the algorithm is not efficient when …
I would like to know the difference in terms of applications (e.g. which one is credit card fraud detection?) and in terms of used techniques. Example papers which define the task would be welcome.
I'm looking for the name of a technique I've seen used before, most commonly in time-series anomaly detection. It involves keeping a running total of consecutive "error" amounts, generally the difference from a prediction or baseline, and then reacting when the cumulative amount exceeds a specific tolerance level. The technique needs a "forgiveness" amount that reduces the cumulative error each iteration, to prevent many small errors from eventually stacking up and flipping the …
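The description matches what is usually called a CUSUM (cumulative sum) scheme. A minimal sketch, where the function name and the parameters `threshold` and `drift` (the "forgiveness" amount) are illustrative rather than from any particular library:

```python
def cusum_alerts(errors, threshold=5.0, drift=0.5):
    """Flag indices where the drift-corrected cumulative error exceeds threshold."""
    total = 0.0
    alerts = []
    for i, e in enumerate(errors):
        # accumulate the error, subtracting a small drift each step so that
        # many tiny errors do not eventually trip the alarm
        total = max(0.0, total + e - drift)
        if total > threshold:
            alerts.append(i)
            total = 0.0  # reset after raising an alert
    return alerts

# twenty tiny errors never accumulate; a run of large ones trips the alarm
print(cusum_alerts([0.1] * 20 + [3.0] * 4))  # → [22]
```

The `max(0.0, …)` clamp plus the per-step `drift` subtraction is exactly the "forgiveness" behaviour described: small errors decay away, only sustained deviations accumulate.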
For the term 'predictor', I found the following definition: Predictor Variable: One or more variables that are used to determine or predict the target variable. Whereas Wikipedia contains the following definition of the word 'feature': Feature is an individual measurable property or characteristic of a phenomenon being observed. What is the difference between 'predictor' and 'feature' in machine learning?
How would you explain adversarial machine learning in simple layman's terms to a non-STEM person? What are the main ideas behind adversarial machine learning?
An article released by OpenAI gives an overview of how OpenAI Five works. There is a paragraph in the article stating: Our agent is trained to maximize the exponentially decayed sum of future rewards, weighted by an exponential decay factor called γ. During the latest training run of OpenAI Five, we annealed γ from 0.998 (valuing future rewards with a half-life of 46 seconds) to 0.9997 (valuing future rewards with a half-life of five minutes). Does annealing in …
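The half-life numbers in the quote follow directly from γ: the half-life is the number of time steps t for which γ^t = 0.5. A quick sketch reproducing them, assuming roughly 7.5 decision steps per second (a rate consistent with the quoted half-lives, not stated in the excerpt itself):

```python
import math

def halflife_steps(gamma):
    # number of time steps after which a future reward is discounted to
    # half its value: solve gamma**t == 0.5 for t
    return math.log(0.5) / math.log(gamma)

steps_per_sec = 7.5  # assumed decision rate, not from the excerpt
print(halflife_steps(0.998) / steps_per_sec)   # ≈ 46 seconds
print(halflife_steps(0.9997) / steps_per_sec)  # ≈ 308 seconds ≈ 5 minutes
```

Annealing γ upward therefore stretches the horizon over which rewards still "count" from under a minute to several minutes of game time.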
I am sure that data science, as it will be discussed in this forum, has several synonyms, or at least related fields in which large data sets are analyzed. My particular question is in regard to Data Mining. I took a graduate class in Data Mining a few years back. What are the differences between Data Science and Data Mining, and in particular, what more would I need to look at to become proficient in Data Mining?
The feature-scaling tag seems to convey that one of the scaling methods is the Standard Normal Distribution (standardization). Further, I read an answer on this site saying that mean normalization is a form of feature scaling. What is the difference between the two approaches? Note: I think the statistics and mathematics of the two normalizations do differ.
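For concreteness, the two formulas differ only in the denominator; a minimal sketch with an illustrative toy array:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Standardization (z-score): subtract the mean, divide by the standard
# deviation; the result has mean 0 and standard deviation 1.
standardized = (x - x.mean()) / x.std()

# Mean normalization: subtract the mean, divide by the range; the result
# is centred on 0 and bounded within [-0.5, 0.5].
mean_normalized = (x - x.mean()) / (x.max() - x.min())

print(standardized)     # mean 0, std 1, but unbounded in general
print(mean_normalized)  # → [-0.5, -0.25, 0.0, 0.25, 0.5]
```

So standardization controls the spread (unit variance) while mean normalization controls the bounds (fixed range), which is exactly the statistical difference the note hints at.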
Problem: I keep encountering machine learning terms that co-occur with the word "agnostic", including model-agnostic learning and model-agnostic metric. The dictionary explains the word "agnostic" as follows: a person who holds the view that any ultimate reality (such as God) is unknown and probably unknowable. That does not make those terms any more understandable. In some contexts, I find that "agnostic" refers to "generic" or "free of". For example, in the paper I am reading now, …
It is common to define the F-measure as a function of precision and recall, as in [1]: $F_{\beta}=\frac{(1+\beta^2)PR}{\beta^2 P+R}$ However, I came across other cases where a different definition is used [2] (without weights): $F = H(\text{sensitivity}, 1-\text{specificity})$ where $H$ is the harmonic mean. References: F-measure derivation (harmonic mean of precision and recall); https://link.springer.com/chapter/10.1007/978-3-540-68947-8_133; https://stackoverflow.com/a/52892413/2243842
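For reference, both the unweighted and the weighted form come from the harmonic mean; a brief derivation sketch connecting the two (standard textbook identities, not taken from either reference):

```latex
F_1 = H(P, R) = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R}

F_\beta = \frac{1 + \beta^2}{\frac{1}{P} + \frac{\beta^2}{R}}
        = \frac{(1+\beta^2)PR}{\beta^2 P + R}
```

Setting $\beta = 1$ in the weighted form recovers $F_1$, so the first definition generalizes the harmonic mean rather than contradicting it; the remaining question is why [2] applies $H$ to sensitivity and $1-\text{specificity}$ instead of precision and recall.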
Is the result of a search for a specific n-gram like sherlock+holmes equal to the result of a regex search for "sherlock holmes" in the same document corpus? So if I read about n-grams for certain words, is that the same as a normal string search? Example: https://books.google.com/ngrams/ https://books.google.com/ngrams/info
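On a clean, whitespace-tokenized text the two counts do coincide; a minimal sketch with an illustrative toy string (the divergences appear once tokenization, punctuation, or case handling differ between the n-gram pipeline and the regex):

```python
import re

text = "sherlock holmes and dr watson met sherlock holmes"
tokens = text.split()

# count occurrences of the bigram ("sherlock", "holmes") in the token stream
bigrams = list(zip(tokens, tokens[1:]))
ngram_hits = sum(bg == ("sherlock", "holmes") for bg in bigrams)

# count regex matches for the same phrase in the raw string
regex_hits = len(re.findall(r"\bsherlock holmes\b", text))

print(ngram_hits, regex_hits)  # → 2 2
```

Here both methods find the same two occurrences, but "sherlock, holmes" or "Sherlock Holmes" would already split the results depending on how each side tokenizes and normalizes.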
Simple definitional question: In the context of machine learning, is the error of a model always the difference between predictions $f(x) = \hat{y}$ and targets $y$? Or are there also other definitions of error? I looked into other posts on this, but they are not sufficiently clear. See my comment on the answer in this post: What's the difference between Error, Risk and Loss?
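To make the question concrete, here are three common notions of "error" that are not all plain differences, sketched with illustrative toy data:

```python
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

# residual: the signed difference between prediction and target
residuals = [yp - yt for yp, yt in zip(y_pred, y_true)]

# mean squared error: average of a loss applied to each residual
mse = sum((yp - yt) ** 2 for yp, yt in zip(y_pred, y_true)) / len(y_true)

# 0-1 error for classification: fraction of wrong predictions,
# with no notion of "difference" at all
labels_true = [0, 1, 1, 0]
labels_pred = [0, 1, 0, 0]
zero_one = sum(t != p for t, p in zip(labels_true, labels_pred)) / len(labels_true)

print(residuals)  # → [0.5, 0.0, -1.0]
print(mse)        # → 0.4166...
print(zero_one)   # → 0.25
```

So the raw difference is only one ingredient; "error" usually means some loss function applied to predictions and targets, and for classification that loss need not involve subtraction at all.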
I am new to data science. I was looking into some datasets and saw values like -99, which I later discovered indicate a missing value. Does this mean the same thing as NaN? If it is the same thing, why do we use -99 instead of NaN?
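A sentinel like -99 is just a numeric placeholder that some file formats and legacy pipelines use because they cannot store a true NaN; the usual first cleaning step is to convert it. A minimal sketch with an illustrative toy frame:

```python
import pandas as pd
import numpy as np

# toy data where -99 marks missing entries
df = pd.DataFrame({"age": [34, -99, 27], "income": [52000, 48000, -99]})

# convert the sentinel to a proper NaN so pandas treats it as missing
df = df.replace(-99, np.nan)

print(df.isna().sum())  # one missing value per column
```

The danger of leaving the sentinel in place is that -99 is a valid number, so means, minimums, and model fits silently absorb it, whereas NaN is excluded from aggregations by default.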
I think they share a lot (e.g. machine learning is a subset of both, right?), but maybe both have elements the other doesn't have? Could you name some in that case? Or is one a subset of the other? What is the relationship between AI and data science? For example, when it comes to the relationship of AI and ML, I always say AI is a superset of ML. And the distinguishing set is search algorithms, which I would include …
So I am trying to get familiar with CRISP-DM and found the terms "Data Description Report" and "Data Exploration Report", which seem oddly vague in their definition. So far I have only found this: https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_data_description_report.htm But this seems to be on the shorter end, in my opinion. Is there an example of a Data Description Report anywhere? If not, is there a systematic methodology you personally use to record your findings while trying to understand the data?