How can I learn and apply the scientific method in machine learning?

Rigor Theory. I wish to learn the scientific method and how to apply it in machine learning: specifically, how to verify that a model has captured the pattern in the data, and how to rigorously reach conclusions based on well-justified empirical evidence. Verification in Practice. My colleagues in both academia and industry tell me that measuring the accuracy of the model on test data is sufficient, but I am not confident that such a criterion is enough. Data Science Books. I have picked up multiple data …
Category: Data Science

Does the Data Science process (CRISP-DM) comply with the Agile methodology?

A common method for conducting data science projects is CRISP-DM - https://www.datascience-pm.com/crisp-dm-2/. However, job descriptions often combine data science with Agile methods, mostly Scrum. How does data science fit together with Agile and Scrum? I get that CRISP-DM and Scrum both use a cycle as a way to approach the end result, but there is a lot of different methodology and terminology. Any ideas or hints for further reading?
Topic: methods
Category: Data Science

Good classifiers when having many labels

I am asking myself whether there is another good method, besides deep artificial neural networks, for classifying data with many (>100) labels. Are there any suggestions? For example, logistic regression does not seem to fit, as in its basic form it only supports two labels, does it?
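Worth noting: logistic regression does generalize beyond two labels via its multinomial (softmax) form, which most libraries expose directly. A minimal sketch with scikit-learn, using synthetic data (the class count here is small for speed, but the same call works for >100 classes):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with 10 classes; the identical code handles hundreds of classes.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=30,
                           n_classes=10, random_state=0)

# Multinomial (softmax) logistic regression fits one weight vector per class
# and normalizes the scores, so many labels are supported out of the box.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))
```

Other classifiers that scale naturally to many labels include gradient-boosted trees and k-nearest neighbours; the one-vs-rest wrapper also turns any binary classifier into a multi-class one.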
Category: Data Science

What do "compile", "fit", and "predict" do in Keras sequential models?

I am a little confused between these parts of the Keras sequential model API. Can someone explain exactly what the job of each one is? Does compile do the forward pass and calculate the cost function, which is then passed to fit for the backward pass, computing derivatives and updating the weights? Or what? I have seen code where compile was used for some LSTMs and fit for others! So I need to know each …
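For orientation: compile() only configures training (optimizer, loss, metrics) and runs no passes at all; fit() runs the full training loop (forward pass, loss, backward pass, weight updates); predict() runs only the forward pass. A minimal sketch on toy data (layer sizes and data here are illustrative):

```python
import numpy as np
from tensorflow import keras

# Toy data: 100 samples, 8 features, binary labels.
X = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# compile() only *configures* training: it attaches the optimizer, loss, and
# metrics to the model. No forward or backward pass happens here.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit() runs the training loop: forward pass, loss computation, backward pass
# (gradients), and weight updates, repeated over the data for `epochs` rounds.
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

# predict() runs only the forward pass on new inputs; no weights change.
preds = model.predict(X[:5], verbose=0)
print(preds.shape)  # (5, 1)
```

So code that "only uses compile" has merely configured a model; nothing is learned until fit() is called.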
Category: Data Science

Modeling count data with time-dependent rate

For processes of discrete events occurring in continuous time with time-independent rate, we can use count models like Poisson or Negative Binomial. For discrete events that can occur once per sample in continuous time, with a time-dependent rate, we have survival models like Cox Proportional Hazards. What can we use for discrete event data in continuous time where there is an explicit time-dependence that we want to learn? I understand that sometimes people use sequential models where each node is …
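One standard option the question gestures at is an inhomogeneous Poisson model: keep the count likelihood but let the log-rate depend on time (or any covariate). A minimal sketch with scikit-learn's PoissonRegressor on simulated data, where the true rate is λ(t) = exp(0.5 + 0.8·t) (the coefficients are illustrative):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Simulate counts whose rate grows with time: lambda(t) = exp(0.5 + 0.8 * t).
rng = np.random.default_rng(0)
t = np.linspace(0, 2, 500)
counts = rng.poisson(np.exp(0.5 + 0.8 * t))

# Poisson regression with time as a covariate models log-lambda as a linear
# function of t, i.e. it learns the time-dependence of the rate.
reg = PoissonRegressor(alpha=0.0).fit(t.reshape(-1, 1), counts)
print(reg.intercept_, reg.coef_)  # roughly 0.5 and [0.8]
```

Splines or basis expansions of t give more flexible rate shapes within the same GLM framework.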
Category: Data Science

Predicting High-School test scores after a disciplinary action

I'm somewhat new to machine learning and have learned to apply many of the basic regression and classification methods using python and various packages. However, approaching this problem has me stumped. To illustrate the problem, I created a fictitious scenario where a guidance counselor wants to predict test scores for a student after disciplinary action. Suppose they have data available like the mock-up below: Column definition: Student - Student Identification # Gender - Male/Female Age - Current Age Athlete - …
Category: Data Science

Previous work Replication and Research ethics

I am very much concerned about abiding by research ethics in my work, especially issues to do with plagiarism. I came across a recent research paper in my field of study that applies state-of-the-art tools (deep learning architectures) using a publicly available dataset. I am impressed by their work and feel I should apply the same methodology they used, but using my own (private) dataset. Would this be considered a plagiarised version of their work?
Category: Data Science

Interpreting DataFrame.where() documentation

From examples outside of the documentation, I thought I understood the .where() method. Basically, it seems to be another way to filter a dataframe. However, when I checked the documentation itself for an example of how to use .where(), it was counterintuitive. The documentation provides this example: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}) df.where(lambda x: x > 4, lambda x: x + 10) [output]: A B C 0 …
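The key point of confusion: DataFrame.where is element-wise masking, not row filtering. It keeps each value where the condition is True and substitutes the second argument where it is False (both arguments may be callables applied to the frame). Reproducing the documentation's example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Values > 4 are kept; every other value is replaced by itself + 10.
# No rows are removed, which is why it feels unlike a filter.
out = df.where(lambda x: x > 4, lambda x: x + 10)
print(out)
```

So in column A every value fails the condition and becomes 11, 12, 13, while column C (all > 4) passes through untouched. Row filtering would instead be boolean indexing, e.g. df[df['A'] > 1].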
Topic: methods pandas
Category: Data Science

drop columns and rows in one line in pandas

I want to drop a range of rows and columns of a dataframe, and I did it as follows: df.drop(df.columns[3:], axis=1, inplace=True) df.drop(df.index[3:], axis=0, inplace=True) Can I do the two operations in one call instead of two? Or is there a more efficient way to accomplish this?
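Since both drops are range-based, selecting what to keep with iloc does the same job in one step. A small sketch (the 6x6 frame is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6))

# Two separate drops, as in the question (without inplace, for comparison):
trimmed = df.drop(df.columns[3:], axis=1).drop(df.index[3:], axis=0)

# One step with iloc: keep the first three rows and first three columns.
trimmed_iloc = df.iloc[:3, :3]

print(trimmed.equals(trimmed_iloc))  # True
```

drop() also accepts index= and columns= keywords in a single call, e.g. df.drop(index=df.index[3:], columns=df.columns[3:]).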
Topic: methods pandas
Category: Data Science

Time-series decomposition to a base level and an effect of another feature

I've got time-series data (let's denote it y) and some feature (let's denote it x). y depends on x, but x is often equal to 0. Even then, y is not 0, so we can assume there's a base level in y that is independent of x. Additionally, we can observe some seasonality in y. I need to decompose y into the base level and an effect of x, and I need some hint about methodology. …
Category: Data Science

Can you provide examples of business application of vector autoregressive model?

Vector autoregressive (VAR) models are taught at economics faculties all around the world. They are just another statistical model for forecasting, although one that uncovers complexity in a deep way. Yet to my surprise, there is no evidence they have been used outside the pure economics domain, namely to solve business problems like we Data Scientists do. Can you share either your experience applying VAR to a business problem, or a scenario in which it could hypothetically be …
Category: Data Science

What is behind "A. Grothendieck scheme theory" in Mondobrain?

Mondobrain proposes a "big data" technology with: "a new generation of algorithms based on A. Grothendieck scheme theory (Fields Medal) that extract knowledge and rules from data without any model or distance, and that can explore every part of multi-dimensional spaces independently." What is behind this method that uses no "model or distance"? Is it related to methods like Topological Data Analysis (discussed on Math Stack Exchange)?
Category: Data Science

How do you define the steps to explore the data?

I'm falling in love with data science and I'm spending a lot of time studying it. It seems that a common data science workflow is: 1) frame the problem, 2) collect the data, 3) clean the data, 4) work on the data, 5) report the results. I'm struggling to connect the dots when it comes to working on the data. I'm aware that step 4 is where the fun happens, but I don't know where to begin. What are the steps taken when you work on …
Category: Data Science

Perform classification on market basket analysis

I have the following problem that I don't know how to solve: I have data for different market baskets with a corresponding class. So, for example, I know: Student - {beer, milk, water} Professional - {nuts, pizza, bananas} Student - {oranges, tomatoes, beer} ... Is there a method to create a classification model so that I can use the contents of a market basket to determine the corresponding class (Student, Professional, ...)? Thank you!
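One standard approach: one-hot encode the basket contents so each product becomes a binary feature, then train any ordinary classifier. A minimal sketch with scikit-learn (the baskets and labels below mirror the question and are purely illustrative):

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.naive_bayes import BernoulliNB

# Toy baskets with their classes, as in the question.
baskets = [{"beer", "milk", "water"},
           {"nuts", "pizza", "bananas"},
           {"oranges", "tomatoes", "beer"},
           {"milk", "water"},
           {"pizza", "nuts"}]
labels = ["Student", "Professional", "Student", "Student", "Professional"]

# One-hot encode basket contents: one binary column per product seen.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(baskets)

# Bernoulli naive Bayes suits binary presence/absence features; any
# classifier (logistic regression, trees, ...) would plug in the same way.
clf = BernoulliNB().fit(X, labels)
print(clf.predict(mlb.transform([{"beer", "water"}])))
```

With many distinct products the encoded matrix becomes sparse, which linear models and naive Bayes handle well.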
Category: Data Science

Data Science Methodologies

What are the best-known data science methodologies today? By methodology I mean a step-by-step, phased process that can be used for guidance in framing a project, although I will be grateful for anything close. To clarify: there are methodologies in the programming world, like Extreme Programming, Feature Driven Development, Unified Process, and many more. I am looking for their equivalents, if they exist. A Google search did not turn up much, but I find it hard to believe there is …
Topic: methods
Category: Data Science

Can distribution values of a target variable be used as features in cross-validation?

I came across an SVM predictive model in which the author used the probabilistic distribution of the target variable as a feature in the feature set. For example: the author built a model for each gesture of each player to guess which gesture would be played next. Calculated over 1000 games, the distribution might look like (20%, 10%, 70%). These numbers were then used as feature variables to predict the target variable during cross-validation. Is that legitimate? That …
Category: Data Science
