error-handling

Importing Excel format data into R/R Studio and using glmnet package?

Sympa

May 31, 2022 03:03

I have no problem importing Excel-formatted data into R/RStudio and using it with all the other R packages that I use. But when I want to use the glmnet package to develop a regularization model, I invariably run into the following error (after specifying my regularization model and attempting to run it): Error in storage.mode(y) <- "double": (list) object cannot be coerced to type 'double'. Here is what I have already tried to resolve this: De-format the numbers in Excel (no …

Topic: regularization data excel error-handling r

Category: Data Science

sklearn FutureWarning message when running a CNN model

Jack

May 16, 2022 06:51

When I run my model, I am receiving the following error message: FutureWarning: Pass classes=[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24], y=[20 16 4 ... 2 2 2] as keyword args. From version 0.25 passing these as positional arguments will result in an error FutureWarning) I am assuming I need to pass them as keyword args. I am new to the …

Topic: error-handling scikit-learn

Category: Data Science

Making Sense of this Error Message

Mr Prof

April 4, 2022 23:00

I am using a book and a video to learn how to use the KNN method to classify movies according to their genres. This is my code: import numpy as np import pandas as pd r_cols = ['user_id', 'movie_id', 'rating'] ratings = pd.read_csv('C:/Users/dell/Downloads/DataScience/DataScience-Python3/ml-100k/u.data', sep='\t', engine='python', names=r_cols, usecols=range(3)) # The file is u.data from MovieLens print(ratings.head()) movieProperties = ratings.groupby('movie_id').agg({'rating': [np.size, np.mean]}) print(movieProperties.head()) movieNumRatings = pd.DataFrame(movieProperties['rating']['size']) movieNormalizedNumRatings = movieNumRatings.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x))) print(movieNormalizedNumRatings.head()) movieDict = {} with open('C:/Users/dell/Downloads/DataScience/DataScience-Python3/ml-100k/u.item') as …

Topic: k-nn error-handling pandas data-cleaning data-mining

Category: Data Science

How to create a system to detect text structure of a file?

Kartikey Singh

February 26, 2022 04:02

Let's say I want to create a machine learning system that has a lot of log files of a few types (F1, F2, ..., Fn), and I get a new log file with maybe some errors or missing data. How do I classify it into one of these types, or classify it as an anomaly if it doesn't belong to any of them? I thought about anomaly detection but couldn't figure out how to parse structure information from the text classes like (F1, …

Topic: data error-handling machine-learning

Category: Data Science

Characterizing errors in rotation and translation while estimating camera pose from images

Nagabhushan S N

February 19, 2022 06:50

Has anyone characterized the errors in rotation and translation while estimating camera pose of natural images using SFM or visual odometry? I mean, when estimating camera pose, what is the typical amount of error in rotation and translation that one can expect? Any references on errors in odometry sensors are also welcome.

Topic: computer-vision error-handling

Category: Data Science

How to include the sudden peaks/bursts in LSTM based time-series model's training

SJa

February 18, 2022 22:26

I am using an LSTM for time-series prediction, taking the past 50 values as my input. The predictions are only OK-ish and not exact, especially for the peaks. Any help on how I can train my model to tackle this problem and take the peaks into account, so that I can predict more accurately (if not EXACTLY)? The model summary and the results are as below:

Topic: lstm training error-handling time-series

Category: Data Science

k nearest neighbors method, temporal trend in error

Snorrlaxxx

February 16, 2022 23:04

I have this set of data that looks like this. I was asked to build a $k$-nearest neighbors algorithm for it, which I just finished building. I have this question regarding the data that I do not understand: Do you notice any spatial or temporal trends in error? I am not sure how to proceed in answering that question. Any suggestions would be appreciated.

Topic: data error-handling dataset

Category: Data Science

ImportError: Pandas requires version '0.3.0' or newer of 's3fs'

Twwister8889

September 20, 2021 22:35

I'm trying to read files from S3, using boto3, pandas, anaconda, but I have the following error: ImportError: Pandas requires version '0.3.0' or newer of 's3fs' (version '0.1.6' currently installed). How can I update the s3fs version? This is my code: import boto3 import pandas as pd s3 = boto3.resource('s3') bucket= s3.Bucket('bucketname') files = list(bucket.objects.all()) files objects = bucket.objects.filter(Prefix='bucketname/') objects = bucket.objects.filter(Prefix="Teste/") file_list = [] for obj in objects: df = pd.read_csv(f's3://bucketname/{obj.key}') file_list.append(df) final_df = pd.concat(file_list) print (final_df.head(4))

Topic: error-handling aws pandas python

Category: Data Science

Comparing RMSEs of multiple test sets having different sizes

Aditya Kulkarni

August 14, 2021 10:21

The data I have is time series data (stock returns), and I am training a Random Forest Regressor on it. Total observations = 2499. To better evaluate the performance, I have implemented rolling-window testing with training window sizes = 500, 700, 900, ..., 2100. Though instinctively it would seem obvious to choose the window size that produced the lowest RMSE, how can I be sure that the comparison is fair? I mean with increasing window size, the test set size …

Topic: model-evaluations rmse error-handling machine-learning

Category: Data Science
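For the RMSE comparison question above, a minimal sketch of one possible way to look at the issue, not a definitive answer: averaging per-window RMSEs weights every window equally regardless of its size, whereas pooling the squared errors weights every observation equally. The residual values and the names `rmse`, `pooled_rmse`, and `windows` are made up for illustration and do not come from the question.

```python
import math


def rmse(errors):
    """Root mean squared error of one list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def pooled_rmse(windows):
    """RMSE over all residuals from all windows, each observation weighted equally."""
    total_sq = sum(e * e for window in windows for e in window)
    total_n = sum(len(window) for window in windows)
    return math.sqrt(total_sq / total_n)


# Hypothetical residuals from two test windows of different sizes.
windows = [[3.0, 4.0], [0.0, 0.0, 0.0, 5.0]]

per_window = [rmse(w) for w in windows]
mean_of_rmses = sum(per_window) / len(per_window)  # each window weighted equally
pooled = pooled_rmse(windows)                      # each observation weighted equally

print(per_window, mean_of_rmses, pooled)
```

The two summaries generally differ when window sizes differ, so it helps to state which one is being compared across training window sizes.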

Operands Could not be Broadcast with Shapes (19,)(0,)

Mr Prof

July 30, 2021 15:23

I have googled and read something similar to the problem I have but I do not seem to know how to fix the error I got from this particular code: import operator def getNeighbors(movieID, K): distances = [] for movie in movieDict: if (movie != movieID): dist = ComputeDistance(movieDict[movieID], movieDict[movie]) distances.append((movie, dist)) distances.sort(key=operator.itemgetter(1)) neighbors = [] for x in range(K): neighbors.append(distance[x][0]) return neighbors K = 10 avgRating = 0 neighbors = getNeighbors(1, K) **ValueError:** operands could not be broadcast together …

Topic: k-nn implementation error-handling

Category: Data Science

Xgboost fit won't recognize my custom eval_metric. Why?

Gábor B

July 30, 2021 08:55

Do you know why my custom_eval_metric doesn't work? I get the error: XGBoostError: [07:56:32] C:\Users\Administrator\workspace\xgboost-win64_release_1.4.0\src\metric\metric.cc:49: Unknown metric function custom_eval_metric def custom_eval_metric(preds, dtrain): labels = dtrain.get_label() preds = preds.reshape(-1, 3) preds_binary = [] for element in range(0,len(preds)): tmp = [] tmp = preds[element][2] preds_binary.append(tmp) labels_adj = [0 if x == 1 else x for x in labels] labels_adj = [1 if x == 2 else x for x in labels_adj] preds_binary = np.asarray([preds_binary]) labels_adj = np.asarray([labels_adj]) return 'ndcg score', metrics.ndcg_score(new_items, preds) …

Topic: error-handling classification

Category: Data Science

Multiclass classification oob error

Bobslope

July 4, 2021 13:35

I'm implementing a random forest for a 6-class classification and witnessing a strange phenomenon. I have 10 percent of my set sectioned out as a pseudo validation set. I'm training each tree on a randomly selected 50 percent of the training items (the training items being 90 percent of the whole set). Now my OOB error is almost the mirror image of my validation error. I'm using averaged F1 error (i.e. the average of the F1 error per class). As more trees are …

Topic: bagging generalization multiclass-classification error-handling random-forest

Category: Data Science

Why does the MAE still remain, at all?

Turnvater

April 16, 2021 15:26

This may seem to be a silly question, but I just wonder why the MAE doesn't reduce to values close to 0. It's the result of an MLP with 2 hidden layers and 6 neurons per hidden layer, trying to estimate one output value depending on three input values. Why is the NN (simple feedforward and backprop, nothing special) not able to maybe even overfit and meet the desired training values? Cost function = $0.5 (Target - Modeloutput)^2$ EDIT: Indeed I found …

Topic: mlp cost-function error-handling

Category: Data Science

PySpark: java.io.EOFException

dustin

April 7, 2021 12:06

System: 1 name node (4 cores, 16 GB RAM); 1 master node (4 cores, 16 GB RAM); 6 data nodes (4 cores, 16 GB RAM each); 6 worker nodes (4 cores, 16 GB RAM each); around 5 terabytes of storage space. The data nodes and worker nodes exist on the same 6 machines, and the name node and master node exist on the same machine. In our docker compose, we have 6 GB set for the master, 8 GB set …

Topic: error-handling pyspark apache-spark python apache-hadoop

Category: Data Science

Reason of use of Product rule of Probability in Predicting Error

Satyendra

March 16, 2021 12:06

P(x,y) = P(y|x)P(x). Why do we use this in estimating the expected prediction error, i.e. E{(y - f(x))^2}? I researched this and came to know that it helps in figuring out the noise, but how?

Topic: error-handling regression

Category: Data Science
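For the product-rule question above, a short worked derivation may make the role of the factorization concrete. It is a sketch under the assumption that $x$ and $y$ have a joint density $P(x,y)$ and that the integrals exist; the symbol $\mathrm{EPE}(f)$ is introduced here only as a name for the expected squared error in the question.

```latex
\begin{aligned}
\mathrm{EPE}(f) &= E\{(y - f(x))^2\} \\
                &= \iint (y - f(x))^2 \, P(x, y) \, dy \, dx \\
                &= \int \left[ \int (y - f(x))^2 \, P(y \mid x) \, dy \right] P(x) \, dx \\
                &= E_x \, E_{y \mid x}\{ (y - f(x))^2 \mid x \}
\end{aligned}
```

Writing the joint density as $P(y|x)P(x)$ turns the single expectation into an inner expectation over $y$ for a fixed $x$, averaged over $x$. The inner term can then be minimized separately for each $x$, which gives $f(x) = E\{y \mid x\}$; what remains at that minimizer is the conditional variance of $y$ given $x$, which is the noise term.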

Python - Logistic (Logit) Regression - why am I getting an Endog error?

Nikita Rogers

February 20, 2021 13:41

I'm running the following code: X = dataset[['X1 transaction date', 'X2 house age', 'X3 distance to the nearest MRT station', 'X4 number of convenience stores', 'X5 latitude', 'X6 longitude', 'X7 distance to Xindian Ditsrict Office', 'X8 distance to Cardinal Tien Hospital', 'X9 distance to Shih Hsin University']] y = dataset['Y house price of unit area'] model = sm.Logit(y, X).fit() print(model.summary()) I'm using a CSV dataframe with information about 414 different residential properties in the Xindian District of Taiwan. My goal …

Topic: error-handling logistic-regression python

Category: Data Science

recognizing the correct word & "Set type is unordered"-error in python-pandas

Jsmoka

February 2, 2021 05:22

My Data Set (CSV): CL1,CL2,CL3 Hello Worrld,Hello ! World,Snack Hello % World,Hello World,Vol 8.5% Alc Hello World,Good! Hello,Hello World Good Morning,Airplane,Good Morning JK^KJ,Good Morning,Talueas My Goal: 1- I would like to search and find the similar values between all columns (CL1-CL3) and sort in a new column (SIM). 2- I would like to find the non-similar values between columns and sort in another column (NON-SIM). What I Would Like: Actually, I would like to use it in supervised learning for …

Topic: text-classification error-handling pandas data-cleaning machine-learning

Category: Data Science

Implementation of reliable rule learning

Tobias B.

October 23, 2020 14:07

I want to perform "reliable rule learning", i.e. mining a set of rules with a very low number of false negatives. I recently read the paper "Reliable agnostic learning" by Kalai et al. (https://doi.org/10.1016/j.jcss.2011.12.026) and they basically describe what I want: Rules are determined to reliably classify data points, and the reliability is partly reached by allowing "I don't know" as an additional answer. Sadly, their paper is purely theoretical and I could not find a corresponding implementation. Is there …

Topic: error-handling software-recommendation algorithms data-mining machine-learning

Category: Data Science

What is the value of AIC criterion if RSS is 0?

user606273

October 19, 2020 10:02

The AIC formula is: $AIC = 2k + n \log(RSS/n)$. So if RSS is equal to 0, it is undefined. How do I deal with this? What value should it take?

Topic: data-science-model estimators error-handling accuracy

Category: Data Science
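For the AIC question above, a minimal sketch of how the formula behaves as RSS approaches 0. It assumes the natural logarithm and the formula exactly as written in the question. Returning negative infinity for RSS = 0 is a convention chosen here because it is the limit of the expression as RSS goes to 0 from above; it is not an established rule, and the function name `aic_from_rss` is hypothetical.

```python
import math


def aic_from_rss(k, n, rss):
    """AIC = 2k + n * log(RSS / n), with the natural log assumed.

    For rss == 0 the logarithm is undefined; this sketch returns -inf,
    which is the limit as rss -> 0+ (an illustrative convention only).
    """
    if rss < 0:
        raise ValueError("RSS cannot be negative")
    if rss == 0:
        return -math.inf
    return 2 * k + n * math.log(rss / n)


# The value decreases without bound as RSS shrinks toward 0.
for rss in (10.0, 1e-3, 1e-9, 0.0):
    print(rss, aic_from_rss(k=2, n=10, rss=rss))
```

An RSS of exactly 0 means the model reproduces every observation, so it is usually worth checking whether the model has as many parameters as data points before comparing it by AIC at all.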

TypeError: '<' not supported between instances of 'int' and 'str'

TigSh

September 12, 2020 09:15

I have the following code rf = RandomForestClassifier() rf.fit(X_train, Y_train) print("Features sorted by their score:") print(sorted(zip(map(lambda x: round(x, 2), rf.feature_importances_), X_train), reverse=True)) and I get the following error: > TypeError Traceback (most recent call last) > > ipython-input-109-c48c3ffd74e2> in <module>() > > 2 rf.fit(X_train, Y_train) > > 3 print ("Features sorted by their score:") > > ----> 4 print (sorted(zip(map(lambda x: round(x, 2), > rf.feature_importances_), X_train), reverse=True)) > > TypeError: '<' not supported between instances of 'int' and 'str' I …

Topic: error-handling scikit-learn python machine-learning

Category: Data Science
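For the TypeError question above, a minimal sketch of one plausible cause, since the excerpt is truncated and the actual data is not shown. When `sorted` receives tuples, ties on the first element are broken by comparing the second element. If rounding makes two scores equal and the labels mix `int` and `str`, that tie-break raises this exact kind of error. The scores and labels below are invented for illustration.

```python
# Hypothetical rounded importances and column labels (int and str mixed).
scores = [0.25, 0.25, 0.5]
labels = [0, "age", "income"]

# Sorting the tuples directly: the two 0.25 scores tie, so Python compares
# the labels 0 and "age", which is not supported between int and str.
try:
    sorted(zip(scores, labels), reverse=True)
    tie_break_failed = False
except TypeError:
    tie_break_failed = True

# Sorting on the score alone never compares the labels.
ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

print(tie_break_failed)
print(ranked)
```

Passing a `key` keeps the comparison on the numeric score only; converting all labels to strings beforehand would be another option if they should take part in the ordering.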
