Real World Regression R & p-value

I am trying to figure out whether our customer support has an impact on tickets opened by customers. Our employees should contact customers to avoid that a user will open a ticket. The data is quite accurate. I am plotting the (pro-active) contacts per day, the opened tickets per day and I am using a linear fit for both. Both r² values are around 15% and the p-values are pretty bad as well (way above 5%). I wonder if I …
Category: Data Science

Making predictions with limited user generated data

We've trained a ML model and deployed it to production. The trained ML model uses about 50-60 features. A user inputs set of information on our platform which is nowhere close to all the features that the model is trained on. How do we make a prediction with ML algorithm that's trained on far greater number of features than your test point? A credit scoring example. Model is trained on 1000s of users' credit history, demographics, location, income, expenses, financials …
Category: Data Science

How does amazon's reviews that mention extracts topics from reviews?

Amazon product page contains a section called Reviews that mention. The section lists the main things that users liked or dislike about the product. For example see this page. How exactly does it work? This can be done using topic modelling using LDA. But this approach has several drawback. You need to choose number of topics upfront. But in amazon reviews number of topics vary for each product. Number of topics are not the same even for products that belong …
Category: Data Science

Working with three types of data: numeric (integer, floats), images, and text for prediction

So I have three types of data (in title) and am wondering how I can combine the data. The target is numeric (price). My idea is to perform feature extraction on both the images and text, which would result in a 1 dim row vector of size n. So this would produce n features. After this, combine all the data and normalize. The model would be trained on this combined dataset. I have really only worked with one data type …
Category: Data Science

What approach should I take for my product classification ML model with user feedback for improving result accuracy?

I'm trying to implement a product categorization ML model on a dataset with the following structure: Data sample I want to my model to be able to predict the correct category that the product should fall under, based on product description and name. However, I will be implementing this together with a GUI which allows some user input. For example, a new product name with a description gets added to the table: New entry before feedback training The user will …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.