reference-request

Data Science Podcasts?

Dawny33

May 24, 2022 13:47

What are some podcasts which are related to data science? This is a similar question to the reference request question on CrossValidated. Details/rules: The podcasts (the theme and the episodes) should be related to data science. (For example: A podcast which is about some other domain, with an episode which speaks about data science in that domain, is not a good reference/answer.) Personal opinions/reviews (if any) would be very helpful too.

Topic: reference-request

Category: Data Science

Is there any book for modern optimization in Python?

StatguyUser

April 25, 2022 10:14

I was reading Modern Optimization with R (Use R!) and wondering whether a similar book exists for Python. To be precise, I am looking for something that covers stochastic gradient descent and other advanced optimization techniques. Many thanks!

Topic: books career beginner reference-request tools

Category: Data Science

fastText word embeddings in 50 dimensions

cris2019

April 18, 2022 04:07

Is there a fastText embedding in 50 dimensions? I'm aware that GloVe embeddings come in 50, 100, 200 and 300 dimensions. I am trying to do sentiment analysis with a very small dataset. If there is one, can anyone please provide a reference?

Topic: word-embeddings reference-request

Category: Data Science

How to remove outliers properly?

Erik M

April 8, 2022 18:35

I was wondering what is the best practice for removing outliers from data. Plotting a boxplot for each feature (column of the dataset) and removing data that fall outside the whiskers seems like a naive and problematic approach. For example, say you have many individuals with a 'gender' label and an 'income' label. Also assume that there are many more men in the dataset than women. Unfortunately, due to income disparity we may see that women receive a lower wage …

Topic: preprocessing outlier reference-request python data-cleaning

Category: Data Science
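To make the concern in the question above concrete, here is a minimal sketch of the boxplot whisker rule applied once to the whole column and once within each group. The excerpt is cut off, so the scenario is an assumption about where it was heading. The toy incomes, the helper names `whisker_bounds` and `flag_outliers`, and the conventional factor `k=1.5` are all made up for illustration and are not taken from the post.

```python
from statistics import quantiles


def whisker_bounds(values, k=1.5):
    """Return the boxplot whisker limits (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(rows, value_key, group_key=None):
    """Return rows outside the whiskers, computed globally or per group."""
    groups = {}
    for row in rows:
        label = row[group_key] if group_key else "all"
        groups.setdefault(label, []).append(row)
    flagged = []
    for members in groups.values():
        low, high = whisker_bounds([m[value_key] for m in members])
        flagged.extend(m for m in members if not low <= m[value_key] <= high)
    return flagged


# Made-up toy data: many men, few women with systematically lower incomes.
rows = [{"gender": "m", "income": v} for v in range(50, 62)]
rows += [{"gender": "f", "income": v} for v in (30, 31, 32)]

global_flags = flag_outliers(rows, "income")
grouped_flags = flag_outliers(rows, "income", group_key="gender")

print("flagged globally:", global_flags)
print("flagged per group:", grouped_flags)
```

On this toy data the global rule flags every woman as an outlier, while the per-group rule flags nobody. Whether conditioning on a label like this is appropriate depends on the analysis; the sketch only shows that a per-column rule can silently remove a minority subgroup.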

Techniques to increase the evaluation speed of a neural network

HXSP1947

November 3, 2021 17:23

This is a somewhat open-ended question and in some respects a literature request (I would love to be pointed to a survey paper if one exists). Suppose I am constructing a neural network to make some arbitrary prediction (categorical or numeric, it doesn't matter). With this network I am concerned primarily with speed of evaluation. Obviously, I want the network's predictions to be as accurate as possible, but I'm more than willing to sacrifice some accuracy if it …

Topic: reference-request neural-network performance efficiency

Category: Data Science

What are the possible applications of a Data Scientist in the design phase of an Aerospace or Railway Engineering industry?

temporario1001

September 14, 2021 21:45

I have been trying to understand this for a long time, but this information proves to be incredibly elusive online. What are possible jobs that a pure Data Scientist, without much background knowledge, could be hired for in an Engineering team? I am aware, for instance, that supply chain can get some involvement. I don't mean the Business Intelligence positions; I want to get more involved with the engineering team, working on the products themselves (especially Aerospace or Railway). By …

Topic: career reference-request

Category: Data Science

What is the best practice to test an ETL pipeline?

Costa

September 3, 2021 02:02

In traditional software development practice, before going into production, a piece of code should go through various stages of testing (unit test, integration test, user acceptance test) to secure the stability of the software. An ETL pipeline, as a piece of code, should also go through these testing steps to build a healthy system. However, due to the nature of the ETL process, traditional testing techniques may not be applicable. Is there any reference or guideline that specifically focuses on testing on …

Topic: etl reference-request

Category: Data Science
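As a rough illustration of how the unit-test stage mentioned in the question above can carry over to an ETL step, the sketch below tests a transform function in isolation from its source and sink. The `transform` function, its cleaning rules and the record fields are hypothetical and are not taken from the post or from any particular ETL tool.

```python
import unittest


def transform(records):
    """Hypothetical transform step: drop rows without an id, normalise names."""
    cleaned = []
    for rec in records:
        if rec.get("id") is None:
            continue  # rows without a key cannot be loaded downstream
        cleaned.append(
            {"id": int(rec["id"]), "name": rec.get("name", "").strip().lower()}
        )
    return cleaned


class TransformTest(unittest.TestCase):
    def test_drops_rows_without_id(self):
        self.assertEqual(transform([{"id": None, "name": "x"}]), [])

    def test_normalises_fields(self):
        out = transform([{"id": "7", "name": "  Ada "}])
        self.assertEqual(out, [{"id": 7, "name": "ada"}])

    def test_valid_rows_are_all_kept(self):
        rows = [{"id": i, "name": "n"} for i in range(5)]
        self.assertEqual(len(transform(rows)), len(rows))
```

Saved in a file, the tests can be run with `python -m unittest`. Keeping the transform a pure function of its input rows is the design choice that makes this possible; checks that need the real source or target system belong to the integration stage.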

Which book is a standard for introduction to genetic algorithms?

Martin Thoma

August 21, 2021 01:57

I have heard of genetic algorithms, but I have never seen practical examples and I've never had a systematic introduction to them. I am now looking for a textbook which introduces genetic algorithms in detail and gives practical examples of how they are used, what their strengths are compared to other solution methods and what their weaknesses are. Is there any standard textbook for this?

Topic: genetic-algorithms books reference-request

Category: Data Science

Low dimensional manifold in a high dimensional space and Geodesic distance

induction601

July 16, 2021 17:48

It is a common assumption that high-dimensional objects lie on low-dimensional manifolds. This constitutes a foundation for manifold learning and dimensionality reduction techniques (a way to beat the curse of dimensionality). My question is: assuming this is valid, how can one utilize this assumption in doing something such as manifold learning? I think the general goal is to find a nonlinear representation of this high-dimensional object using a small number of degrees of freedom. However, we know neither …

Topic: manifold reference-request dimensionality-reduction

Category: Data Science
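One common way to exploit the manifold assumption discussed above is to approximate geodesic distance by shortest paths in a nearest-neighbour graph, the idea behind Isomap-style methods. The sketch below is an illustration under assumptions, not something taken from the post: the half-circle data, the choice `k=2` and the helper names `knn_graph` and `shortest_path_length` are all made up for the example.

```python
import heapq
import math


def knn_graph(points, k):
    """Link each point to its k nearest neighbours (symmetric, Euclidean weights)."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm: graph distance from source to target."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, weight in graph[node].items():
            cand = dist + weight
            if cand < best.get(nxt, math.inf):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt))
    return math.inf


# Toy 1-D manifold embedded in 2-D: n points on the upper half of the unit circle.
n = 50
points = [
    (math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
    for i in range(n)
]
graph = knn_graph(points, k=2)

euclidean = math.dist(points[0], points[-1])      # straight line through the plane
geodesic = shortest_path_length(graph, 0, n - 1)  # path that stays near the curve

print(f"Euclidean: {euclidean:.3f}, graph geodesic: {geodesic:.3f}")
```

The straight-line distance between the two endpoints is 2, while the graph distance comes out close to the arc length pi. The approximation depends on sampling density and on the neighbourhood size, which have to be chosen without knowing the manifold in advance.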

Beginner math books for Machine Learning

Tantaros

March 11, 2021 20:03

I'm a Computer Science engineer with no background in statistics or advanced math. I'm studying the book Python Machine Learning by Raschka and Mirjalili, but when I tried to understand the math behind machine learning, I wasn't able to follow The Elements of Statistical Learning, the great book that a friend suggested to me. Do you know any easier statistics and math books for machine learning? If you don't, how should I proceed?

Topic: esl mathematics reference-request statistics machine-learning

Category: Data Science

Rate of convergence - comparison of supervised ML methods

Avatrin

February 24, 2021 20:03

I am working on a project with sparse labelled datasets, and am looking for references regarding the rate of convergence of different supervised ML techniques with respect to dataset size. I know that, in general, boosting algorithms and other models that can be found in Scikit-learn, like SVMs, converge faster than neural networks. However, I cannot find any academic papers that explore, empirically or theoretically, the difference in how much data different methods need before they reach n% accuracy. I …

Topic: convergence supervised-learning reference-request machine-learning

Category: Data Science

References/tutorials about data mining and machine learning

plpm

January 26, 2021 08:49

I am learning data analytics and I wonder if there are some good references and tutorials about machine learning, data analytics and data mining. What I'm searching for is an understandable reference or tutorial that is neither very technical nor very basic; in other words, material that begins with the basic steps and progresses towards advanced ones. Thank you.

Topic: reference-request machine-learning

Category: Data Science

Data science / machine learning books for mathematicians

Burakumin

January 10, 2021 19:28

I have found other requests for references here, in particular: "Where to start, which books" and "Books about the 'Science' in Data Science?" I have glanced at: Artificial Intelligence: A Modern Approach (Russell & Norvig); Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Flach); Learning From Data (Abu-Mostafa et al.); Introduction to Statistical Learning (James et al.); Elements of Statistical Learning (Hastie et al.); Pattern Recognition and Machine Learning (Bishop). Now it …

Topic: books reference-request

Category: Data Science

What are some good resources for setting up and debugging neural networks?

mhdadk

December 20, 2020 08:02

I am aware of Troubleshooting Deep Neural Networks by Josh Tobin and A Recipe for Training Neural Networks by Andrej Karpathy, but I am interested in other resources that can give me some guidelines or steps to setting up and debugging neural networks.

Topic: reference-request neural-network

Category: Data Science

Who invented the concept of over-fitting?

DaL

December 3, 2020 08:15

I list the references that I have found so far. In short, the first appearance of the term was in 1670, the first appearance with a close meaning was in 1827, the first appearance in a biological paper was in 1923 and the first appearance in statistics was in 1935. However, the references indicate that there are gaps in this chronology. The earliest reference I found was The flying pen-man; or, The art of short-writing by William Hopkins (teacher of stenography) in 1670. However, it is …

Topic: overfitting history terminology reference-request machine-learning

Category: Data Science

Machine learning for circular sequences

Vladislav Gladkikh

October 17, 2020 01:22

My data are sequences of real numbers $a_0,a_1,...,a_{n-1}$. The length of a sequence is fixed and equals $n$. Each sequence is mapped to a real number $y$ and I want to predict $y$ given the sequence. The arrangement of the elements within a sequence is important. However, the sequences are circular, meaning that $a_0$ is not the first element, and $a_{n-1}$ is not the last one. The sequence $a_0,a_1,...,a_{n-1}$ is indistinguishable from the sequence $a_k, a_{k+1}, ..., a_{n-1}, a_0, ..., …

Topic: rnn supervised-learning sequence reference-request

Category: Data Science
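One standard way to respect the circular symmetry described in the question above is to feed a model features that do not change under circular shifts. The sketch below uses the magnitudes of the discrete Fourier transform, which have this property. It is only an illustration and not the approach proposed in the post; the example sequence and the helper names `dft_magnitudes` and `rotate` are made up.

```python
import cmath


def dft_magnitudes(seq):
    """Magnitudes of the discrete Fourier transform (O(n^2), fine for short inputs)."""
    n = len(seq)
    return [
        abs(sum(a * cmath.exp(-2j * cmath.pi * k * t / n) for t, a in enumerate(seq)))
        for k in range(n)
    ]


def rotate(seq, k):
    """Circularly shift seq so that element k comes first."""
    k %= len(seq)
    return seq[k:] + seq[:k]


a = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25]          # made-up circular sequence
features = dft_magnitudes(a)
shifted_features = dft_magnitudes(rotate(a, 2))

# Largest difference between the two feature vectors (zero up to rounding).
max_gap = max(abs(x - y) for x, y in zip(features, shifted_features))
print(max_gap)
```

A circular shift only multiplies each Fourier coefficient by a unit-modulus phase factor, so the magnitudes are unchanged. The trade-off is that the phases are discarded: for real-valued data the reversed sequence, for example, yields the same features, so this representation also ignores orientation.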

What is the difference between ICR and OCR?

Martin Thoma

August 24, 2020 15:02

I've just found the term "Intelligent Character Recognition" (ICR) on Wikipedia and other pages. According to Wikipedia: In computer science, intelligent character recognition (ICR) is an advanced optical character recognition (OCR) or — rather more specific — handwriting recognition system that allows fonts and different styles of handwriting to be learned by a computer during processing to improve accuracy and recognition levels. Is this just a marketing stunt or are there actually techniques which are specified as OCR and other …

Topic: ocr reference-request

Category: Data Science

Why do gradient methods work in finding the parameters of neural networks?

induction601

August 18, 2020 16:21

After reading quite a lot of papers (20-30 or so), I feel that I still do not quite understand things. Let us focus on supervised learning, for example. Given a set of data $\mathcal{D}_{train}=\{(x_i^{train},y_i^{train})\}$ and $\mathcal{D}_{test}=\{(x_i^{test},y_i^{test})\}$ where we assume $y_i^{test}$ are unknown, the goal is to find a function $$ f_\theta(x), \qquad \text{such that} \quad f_\theta(x_i^{test}) \approx y_i^{test}. $$ To do this, we need a model for $f$. Neural networks are frequently employed. Thus we have $$ f_\theta(x) = …

Topic: loss-function gradient-descent reference-request neural-network machine-learning

Category: Data Science
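For readers who want the setup in the question above in executable form, here is a minimal sketch of gradient descent fitting $f_\theta$ on training pairs. Because the network definition in the excerpt is cut off, the sketch assumes the simplest possible stand-in, a linear model $f_\theta(x) = \theta_0 + \theta_1 x$ with a mean squared error loss. The data, learning rate and step count are made up for illustration.

```python
def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Gradient descent on mean squared error for f_theta(x) = theta0 + theta1 * x."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(residual^2) w.r.t. theta0 and theta1.
        grad0 = 2.0 / n * sum(residuals)
        grad1 = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1


# Made-up training data generated from y = 1 + 2x.
xs_train = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_train = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit_linear(xs_train, ys_train)
print(theta0, theta1)
```

In this assumed linear case the loss is convex, so a small enough step size reaches the global minimum. That guarantee does not carry over to neural networks, whose loss is generally non-convex, and that gap is exactly what the question is about.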

Do 3D bar charts have advantages over 2D bar charts?

Martin Thoma

June 23, 2020 14:04

I vaguely remember that there was a study / blog post which made a strong point against 3D bar charts. Do you have a source at hand which compares the two - 2D bar charts and 3D bar charts?

Topic: reference-request visualization

Category: Data Science

Where to upload large (0.5 GB) weights anonymously?

Alex

May 28, 2020 22:24

I need to anonymously upload a number of checkpoints for ConvNets (weights + optimizers, all dicts of PyTorch tensors), each about 0.5 GB. I don't want to use Google Drive. I trained the models on the university cluster (if that's relevant). Where can I upload these files anonymously? The files must be publicly available, but my identity must remain anonymous.

Topic: reference-request

Category: Data Science
