beginner

Is there any book for modern optimization in Python?

StatguyUser

April 25, 2022 10:14

I was reading Modern Optimization with R (Use R!) and was wondering whether a book like this exists for Python too. To be precise, something that covers stochastic gradient descent and other advanced optimization techniques. Many thanks!

Topic: books career beginner reference-request tools

Category: Data Science

Multicollinear Predictors Effect on Model

Garreth Lee

April 20, 2022 23:55

I know that multicollinear predictors in a model aren't ideal because they cause the model to be sensitive to very minor changes, which then reduces our ability to interpret the effect of each predictor from its coefficient. However, I don't understand why the model becomes sensitive and how the estimated coefficients can vary wildly from just a very minor change in the dataset. Also, do multicollinear predictors affect the accuracy / error of a prediction? Or does it purely affect …

Topic: collinearity feature-engineering linear-regression beginner feature-selection

Category: Data Science
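
A small numeric sketch of the sensitivity asked about in the multicollinearity question above. The data, the no-intercept model, and the helper name fit_two_predictors are all made up for illustration. It only shows that when two predictors are nearly identical, a tiny change in one response value can move the individual coefficients a lot while the fitted values barely move.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept), via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    # The determinant is close to zero when x1 and x2 are nearly collinear,
    # so dividing by it amplifies small changes in y.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


def predict(coefs, x1, x2):
    return [coefs[0] * a + coefs[1] * b for a, b in zip(x1, x2)]


# Hypothetical data: x2 is x1 plus a tiny wiggle, and y = x1 + x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.001, 1.999, 3.001, 3.999]
y = [a + b for a, b in zip(x1, x2)]
b_old = fit_two_predictors(x1, x2, y)

# Nudge a single response value by 0.01 and refit.
y_nudged = list(y)
y_nudged[0] += 0.01
b_new = fit_two_predictors(x1, x2, y_nudged)

coef_shift = max(abs(n - o) for n, o in zip(b_new, b_old))
pred_shift = max(
    abs(n - o) for n, o in zip(predict(b_new, x1, x2), predict(b_old, x1, x2))
)
print("coefficients before:", b_old)
print("coefficients after: ", b_new)
print("largest coefficient change:", coef_shift)
print("largest fitted-value change:", pred_shift)
```

In this toy case the sum of the two coefficients stays almost the same, which is why the fitted values are stable even though each coefficient on its own is not.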

Inspect falsely classified data points

MLAlex

April 20, 2022 08:05

Recently, I was able to train a simple classification algorithm (my first ML project) and I even got a pretty satisfying precision score. Now I am looking for a way to inspect which data points in my train_data have been falsely classified. My basic idea was something like: If y_train != y_pred Then: (get indices of y_train) (look up the data in my csv and try to find a pattern) My main problem is that the train_test_split function provides me with a …

Topic: beginner classification

Category: Data Science
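
The "if y_train != y_pred, get the indices" idea from the question above can be sketched in plain Python. This assumes the difficulty is keeping track of original CSV row positions after a shuffled split (the excerpt is cut off before it says so). The helper names split_with_indices and misclassified_rows are hypothetical, and the split here is a stand-in for illustration, not scikit-learn's train_test_split.

```python
import random


def split_with_indices(n_rows, test_fraction=0.25, seed=0):
    """Shuffle the row numbers 0..n_rows-1 and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def misclassified_rows(row_indices, y_true, y_pred):
    """Return the original row numbers where the prediction differs from the label."""
    return [idx for idx, t, p in zip(row_indices, y_true, y_pred) if t != p]


# Toy example: three training rows that came from CSV rows 7, 2 and 5.
train_rows = [7, 2, 5]
y_train = [1, 0, 1]
y_pred = [1, 1, 1]  # made-up predictions
wrong = misclassified_rows(train_rows, y_train, y_pred)
print(wrong)  # original CSV rows to inspect by hand
```

Carrying the row numbers through the split alongside the features is the whole trick; the rows in `wrong` can then be looked up in the original file.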

Beginner needs guidance. Machine Learning, preparing training data

Sascha Liebmann

April 6, 2022 08:43

I am trying to dip my feet into the field of computer vision and want to avoid mistakes along the way. The problem I have to solve: Classify images of 3D dental scans. For example: I wrote a script to create images of these files in Blender, so I have full control over the image dimensions, quality, resolution, etc. Now to my questions: What's the best way to prepare a training dataset if you have full control over the process? Higher …

Topic: image-preprocessing image-classification beginner dataset machine-learning

Category: Data Science

Training, Validation, and Testing Data in Supervised Learning

Garreth Lee

March 7, 2022 01:53

I've come up with some simple definitions for training, testing and validation data in supervised learning. Can anyone verify/improve upon my answers? Training Data - Used by the model to learn parameters and 'fit' to the data (usually involves multiple models fit at once) Validation Data - Used by the model to either a) determine the best hyperparameter(s) for a given model or b) determine the best performing model out of a given selection or c) determine the best hyperparameters …

Topic: supervised-learning beginner

Category: Data Science

Various models giving 99% accuracy for KDDcup 99 dataset for Intrusion Detection, is there some sort of data leak I am missing?

scrimdougy

March 6, 2022 20:02

Student who is quite new to all this here. I am currently working with the KDDcup 99 data for intrusion detection using various ML models (and ANN). My problem is that I am often getting 99% accuracy. At the moment I am focusing mostly on binary classification (normal vs attack). I have identified problems in my data preprocessing methods and after fixing them I am more confident in the validity of my input data but I am still getting …

Topic: anomaly-detection beginner machine-learning

Category: Data Science
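
One quick sanity check for a suspiciously high accuracy like the one in the question above is to compare it against a majority-class baseline: if one class dominates, always predicting that class already scores well. The labels below are made up and say nothing about the actual class balance of the KDD Cup 99 data; the function name is hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Toy labels: 8 of 10 rows belong to one class.
toy_labels = ["attack"] * 8 + ["normal"] * 2
baseline = majority_baseline_accuracy(toy_labels)
print(baseline)
```

A model is only informative to the extent that it beats this number, which is one reason precision and recall per class are often reported next to accuracy.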

How to apply class weight to a multi-output model?

Gal Avineri

March 3, 2022 15:12

I have a model with 2 categorical outputs. The first output layer can predict 2 classes: [0, 1] and the second output layer can predict 3 classes: [0, 1, 2]. How can I apply different class weight dictionaries for each of the outputs? For example, how could I apply the dictionary {0: 1, 1: 10} to the first output, and {0: 5, 1: 1, 2: 10} to the second output? I've tried to use the following class weights dictionary weight_class={'output1': …

Topic: keras weighted-data multiclass-classification beginner neural-network

Category: Data Science
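
A framework-independent sketch for the multi-output question above: one workaround that is often suggested is to turn each class-weight dictionary into a list of per-sample weights, one list per output. The two dictionaries come from the question; the name 'output2', the label lists, and the helper to_sample_weights are assumptions for illustration. Whether a given Keras version accepts per-output sample weights in this form is not verified here.

```python
def to_sample_weights(labels, class_weight):
    """Look up the weight of each sample's class."""
    return [class_weight[label] for label in labels]


# Class weights from the question; 'output2' is an assumed output name.
class_weights = {
    "output1": {0: 1, 1: 10},
    "output2": {0: 5, 1: 1, 2: 10},
}

# Made-up integer labels for four samples, one list per output.
labels = {
    "output1": [0, 1, 1, 0],
    "output2": [2, 0, 1, 2],
}

sample_weights = {
    name: to_sample_weights(labels[name], class_weights[name])
    for name in class_weights
}
print(sample_weights)
```

Each sample then carries its own weight for each output, so the two outputs can be weighted independently.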

Is NLP suitable for my legal contract parsing problem?

Posionus

February 24, 2022 01:01

My company has a product that involves the extraction of a variety of fields from legal contract PDFs. The current approach is very time consuming and messy, and I am exploring if NLP is a suitable alternative. The PDFs that need to be parsed usually follow one of a number of "templates". Within a template, almost all of the documents are the same, except for 20 or so specific fields we are trying to extract. That being said, there are …

Topic: spacy named-entity-recognition beginner nlp

Category: Data Science

How do I fine-tune model performance after the initial run? (Scikit-Learn)

Garreth Lee

February 17, 2022 18:08

I've just started learning regression using scikit-learn and stumbled upon a problem. For a given dataset, let's say that I've imputed the missing data and one-hot encoded all categorical features. This point is where it starts getting confusing for me. After one-hot encoding categorical features, I usually end up with a lot of columns. How do I know that all of these columns benefit the model's performance? If not, how can I determine which columns/features to keep? Is there a method …

Topic: linear-regression beginner scikit-learn feature-selection

Category: Data Science

How do CNNs use a model and find the object(s) desired?

Parity Bit

January 31, 2022 18:34

Background: I'm studying CNNs outside of my undergraduate CS course on ML. I have a few questions related to CNNs. 1) When training a CNN, we desire tightly bounded/cropped images of the desired classes, correct? I.e. if we were trying to recognize dogs, we would use thousands of images of tightly cropped dogs. We would also feed images of non-dogs, correct? These images are scaled to a specific size, i.e. 255x255. 2) Let's say training is complete. Our model's accuracy …

Topic: convolutional-neural-network computer-vision beginner neural-network

Category: Data Science

For a student who is a beginner in quantitative research and statistics, which is the better statistical tool to start: R or IBM SPSS? Why?

Aidre Cabrera

December 29, 2021 20:32

Currently, I am writing my research design. However, I am still undecided on which statistical tool I should use for the data analysis. I tried looking it up on the internet and there are disparate answers to my question. I have noticed that R (Programming Language) and IBM Statistical Package for the Social Sciences are two of the recurring tools that are mentioned when it comes to this question. So, which is better? I need some insights so I can settle …

Topic: data-analysis research beginner statistics r

Category: Data Science

Group related items by their description and tag each group. [Pen, Eraser] : Stationery

naim5am

December 25, 2021 12:18

So I have a list of data similar to the table below. It will be captured by a chatbot so I expect natural language but in the form of a structured command: Add {Qty} {item description} to {location} ID Owner Item Description Location Qty Image 1 Somenick Green apple fridge 1 1.jpg 2 Somenick Jewelry toy box bedroom 2 2.jpg 3 Somenick 12kg rubber quoted grey kettlebell bedroom 1 3.jpg 4 Astrod 60cm never used helmet closet 1 4.jpg 5 Atrod …

Topic: beginner nlp python categorical-data

Category: Data Science

Does Bias always decrease when Complexity increases?

paolopazzo

December 15, 2021 17:20

(I'm just starting to learn about ML, so please don't be rude if the following question is too stupid or totally wrong.) I'm reading about the Bias-Variance Tradeoff and I don't understand the (probably) most important part: why is it a tradeoff? I totally get that the generalization error can be decomposed into 3 parts: an irreducible error due to the noise in our data, a Bias term and a Variance term. In some cases I have a model with …

Topic: bias variance beginner machine-learning

Category: Data Science

k-means clustering for recommendation

Natalia

December 4, 2021 22:04

I have a file with 50000 rows from a library platform. Each row corresponds to one user and shows the order in which that user has selected books. The books could be from various categories (e.g. roman, history, etc.). There are a total of 10 categories. The categories that a user has selected could be, for example: 334664. This means this user has selected books from categories 3, 4 and 6. How can I use this data to build a recommendation …

Topic: beginner algorithms k-means clustering data-mining

Category: Data Science
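
For the recommendation question above, a first step before any clustering is to turn a code such as 334664 into something numeric per category. The sketch below assumes each category is a single digit from 0 to 9 (the question only says there are 10 categories); the function names are hypothetical.

```python
def selected_categories(code):
    """Distinct categories that appear in a selection code such as '334664'."""
    return sorted({int(ch) for ch in code})


def category_counts(code, n_categories=10):
    """Count how often each category appears; position i holds the count for category i."""
    counts = [0] * n_categories
    for ch in code:
        counts[int(ch)] += 1
    return counts


example = "334664"
print(selected_categories(example))
print(category_counts(example))
```

One fixed-length count vector per user is the kind of input a clustering algorithm such as k-means expects. Note that the counts drop the selection order, which may or may not matter for the recommendation.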

I can't figure out how to improve accuracy for tweet sentiment

noideawhatimdoing

November 13, 2021 14:24

I'm doing a beginning attempt at tweet sentiment analysis (positive, neutral, negative). So far I have cleaned the data and used a BoW to get some feeling of the data (>2.5k tweets). I also made bigrams to try to get clearer sentiment insight. The data is severely skewed so I tried both upsampling and downsampling to view the difference. I finally passed it all through a Random Forest Classifier and I get an accuracy of 0.7 for the upsampled data …

Topic: beginner random-forest sentiment-analysis

Category: Data Science

Model with 2 datasets: combine time series data and statistics

jimmy

November 9, 2021 22:01

I am new to data science modelling, so apologies in advance if I use the wrong terminology. I have a standard time series dataset of historical prices which is used to train/test a simple Random Forest classifier model which predicts the returns direction (+/-). I also have a few general statistics for 'day of the week direction', e.g. frequency counts: Monday UP=120, Monday DOWN=90, Tuesday UP=67, Tuesday DOWN=50, Friday UP=55, Friday DOWN=181. How can I combine the results from the time series …

Topic: ensemble-modeling beginner random-forest dataset machine-learning

Category: Data Science
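
One possible way to use the day-of-week counts from the question above is to convert them into an empirical "UP" frequency per day and attach that as an extra feature to each row. The counts are taken from the question; the row layout, the feature name day_up_freq, and the helper names are assumptions for illustration, not a recommendation of a specific modelling approach.

```python
# Frequency counts quoted in the question.
day_counts = {
    "Monday": {"UP": 120, "DOWN": 90},
    "Tuesday": {"UP": 67, "DOWN": 50},
    "Friday": {"UP": 55, "DOWN": 181},
}


def up_frequency(counts):
    """Share of UP days among all observed days for one weekday."""
    total = counts["UP"] + counts["DOWN"]
    return counts["UP"] / total


up_freq_by_day = {day: up_frequency(c) for day, c in day_counts.items()}


def add_day_feature(rows):
    """Return copies of the rows with the weekday's UP frequency added."""
    return [dict(row, day_up_freq=up_freq_by_day[row["day"]]) for row in rows]


# Hypothetical rows of the time series dataset.
rows = [{"day": "Monday", "ret": 0.4}, {"day": "Friday", "ret": -0.2}]
for row in add_day_feature(rows):
    print(row)
```

If the counts were computed over a period that overlaps the test data, the feature would leak information from the test set, so they would need to come from the training period only.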

Tableau: Trying to determine the category of one table based on the dynamic aggregate of another table for a Tableau Dashboard

Amberite

October 7, 2021 09:18

I have one table that contains unique rows for all my quote requests: Quote_ID 1234 1235 1236 1237 1238 In a second table that I've joined (1-0*) with a relationship, I have referrals. These referrals represent reasons why the quote should be referred to an expert, but could represent any other attribute for the sake of the problem. Every referral has a key to the Quote_ID, a unique Referral_ID and a name: Quote_ID Referral_ID Referral_name 12345 1 too many X …

Topic: tableau beginner

Category: Data Science

Activity in fermenter has increased suddenly after 2 weeks, why?

mrniaboc

August 8, 2021 04:10

I'm a total beginner and really appreciative of any advice here. I'm making a 1 gallon batch of IPA from a kit. I brewed 16 days ago, and in the first 48 hours of the wort being in the fermenter it was bubbling away like crazy. It then settled down and I could see the liquid become less opaque and darker, as the yeast cake formed at the bottom. Today I went to look again as I had planned to …

Topic: specific-gravity bottling fermentation homebrew beginner

Category: Mac

Caramel Coffee Mead

Thatguy

June 6, 2021 14:43

I'm looking into brewing a mead in the near future. I have exactly 0 experience brewing anything. I'm going to buy a store-bought brewing kit in the next few weeks if everything lines up. I'm wondering if anyone's ever made a Caramel Coffee Mead, and if so do you have a recipe, or any tips on how to make it work. I want at least an 8% ABV, but no more than 16%. I'm looking for it to be a …

Topic: brewing mead homebrew beginner

Category: Mac

Data science without knowledge of a specific topic, is it worth pursuing as a career?

user3754366

May 28, 2021 09:43

I had a conversation with someone recently and mentioned my interest in data analysis and how I intended to learn the necessary skills and tools. They suggested to me that while it is great to learn the tools and build the skills, there is little point in doing so unless I have specialized knowledge in a specific field. They basically summed it up by saying that I'd just be like a builder with a pile of tools who could build a few …

Topic: career beginner education

Category: Data Science
