I have data from a store covering the products sold over more than 5 years. Each sale has a customer id, a date, and the quantity of the product. I want to build a machine learning model to predict which products will be sold in the next day(s) for each customer, given that I have N products (~2k) and M customers (~50). I am not able to formulate this problem. It's a regression task (probably), …
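To make the data concrete, here's a minimal sketch of one way to frame it as supervised regression: one row per (customer, product, day), with lagged quantities as features and the next day's quantity as the target. The file name and column names (customer_id, product_id, date, quantity) are assumptions.

```python
import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])   # hypothetical raw sales table

# Aggregate to one row per (customer, product, day)
daily = (sales.groupby(["customer_id", "product_id", pd.Grouper(key="date", freq="D")])
              ["quantity"].sum().reset_index())

daily = daily.sort_values("date")
g = daily.groupby(["customer_id", "product_id"])["quantity"]
daily["qty_lag_1"] = g.shift(1)          # quantity sold the previous day
daily["qty_lag_7"] = g.shift(7)          # quantity sold a week earlier
daily["target_next_day"] = g.shift(-1)   # what we want to predict

train = daily.dropna(subset=["qty_lag_1", "qty_lag_7", "target_next_day"])
X, y = train[["qty_lag_1", "qty_lag_7"]], train["target_next_day"]
```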
I come from a software development background, where we have separate servers of the same database (dev, test, prod). The reason is that we develop our apps against the dev DB, run tests against the test DB, and prod is prod. This creates a clear separation so we won't bring down prod while building our app. Do you train your models the same way? Have 3 environments of the same database and, as your …
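As an illustration of what I mean by three environments, here's a minimal sketch of a training job picking its database from an environment variable; the variable name APP_ENV and the connection strings are made up.

```python
import os

# Hypothetical connection strings for the three environments
DB_URLS = {
    "dev":  "postgresql://dev-db.internal/store",
    "test": "postgresql://test-db.internal/store",
    "prod": "postgresql://prod-db.internal/store",
}

env = os.environ.get("APP_ENV", "dev")   # default to dev, never prod
db_url = DB_URLS[env]
print(f"Training against {env} database: {db_url}")
```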
I am looking for tools that allow me to monitor machine learning models once they are in production. I would like to monitor: long-term changes: drift in the feature distributions relative to training time, which would suggest retraining the model; short-term changes: bugs in the features (radical changes of distribution); changes in the performance of the model with respect to a given metric. I have been looking around the Internet, but I don't see any …
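To show the kind of check I mean for the long-term case, here's a minimal sketch (not a tool, and the function and argument names are made up) that compares training vs. production feature distributions with a two-sample Kolmogorov-Smirnov test and flags shifted features.

```python
from scipy.stats import ks_2samp

def drift_report(train_df, prod_df, features, alpha=0.01):
    """Return features whose production distribution differs from training."""
    drifted = []
    for col in features:
        stat, p_value = ks_2samp(train_df[col].dropna(), prod_df[col].dropna())
        if p_value < alpha:
            drifted.append((col, stat, p_value))
    return drifted
```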
Consider a neural network $f(x) = w^T_2 \sigma(w^T_1 x)$, where $\sigma(\cdot)$ is an activation function such as ReLU, and $w_1 \in R^{d \times k}$, $w_2 \in R^{k \times o}$ are two weight matrices. I would like to compute the inner product between two initializations of the model's parameters, $\theta = (w_1, w_2)$ and $\theta' = (w'_1, w'_2)$. Should we stack all elements of the network's parameters into a single vector, i.e. $\theta, \theta'$ will each be a big vector with the number of entries equal to …
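Here's a minimal NumPy sketch of the "stack into a single vector" option, with small made-up dimensions; the dot product of the flattened vectors equals the sum of the Frobenius inner products of the individual matrices.

```python
import numpy as np

d, k, o = 10, 5, 3                                          # toy dimensions
w1, w2   = np.random.randn(d, k), np.random.randn(k, o)     # theta
w1p, w2p = np.random.randn(d, k), np.random.randn(k, o)     # theta'

# Flatten each parameter set into one vector of length d*k + k*o
theta       = np.concatenate([w1.ravel(), w2.ravel()])
theta_prime = np.concatenate([w1p.ravel(), w2p.ravel()])

inner = theta @ theta_prime   # equals (w1 * w1p).sum() + (w2 * w2p).sum()
```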
First, I was asked by my manager to make a plot showing produced vs. received items. It's a multistage process, and we are only in charge of one of the steps, which is design. I made a plot comparing received cases against items produced here in my country, items produced out of the country, total produced, and % of advancement. Later, in a meeting, she asked me to show the graph and table I made to the production supervisors, and she …
I have a data science project: predicting a customer's next purchase day. One year of customer behavioral data was split into 9 and 3 months for train and test. Using RFM analysis, I trained models with different classifiers, and the best one's results are as follows: accuracy of the XGB classifier on the training set: 0.93; accuracy on the test set: 0.68. This is a school project, and I was wondering: in real-world projects, how can we evaluate a model's …
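For reference, here's a minimal sketch (synthetic data, and a generic gradient-boosting classifier standing in for XGB) of checking more than raw accuracy when the train/test gap is that large: compare against a trivial baseline and look at per-class precision/recall.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 9/3-month split (shuffle=False keeps time order)
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

print("baseline accuracy:", baseline.score(X_test, y_test))
print("model train accuracy:", model.score(X_train, y_train))
print("model test accuracy:", model.score(X_test, y_test))
print(classification_report(y_test, model.predict(X_test)))
```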
I was recently asked this in a DS interview pertaining to product thinking: if a product feature is being rolled out but an A/B test cannot be performed for whatever reason, how can we measure the efficacy of the feature? My response was along the lines of an exploratory comparison of data pre- and post-rollout, but I was curious whether there are better methods for this. Thanks so much for …
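To make my pre/post answer concrete, here's a minimal sketch of the naive version: compare a daily metric before and after the rollout date with a two-sample t-test. The file, column name, and rollout date are hypothetical, and this ignores trend and seasonality (which is why methods like interrupted time series exist).

```python
import pandas as pd
from scipy.stats import ttest_ind

daily = pd.read_csv("daily_metric.csv", parse_dates=["date"])   # hypothetical metric table
rollout = pd.Timestamp("2023-06-01")                            # hypothetical rollout date

pre  = daily.loc[daily["date"] <  rollout, "conversion_rate"]
post = daily.loc[daily["date"] >= rollout, "conversion_rate"]

stat, p_value = ttest_ind(pre, post, equal_var=False)
print(f"pre mean={pre.mean():.4f}, post mean={post.mean():.4f}, p={p_value:.3f}")
```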
I am working on an ML model to be deployed in a product operating in many countries. The issue I am having is the following: should I train one model and serve it for all countries, or train a model per country and serve each model in its own country? I've faced this problem several times, and to me there's a trade-off in the learning: in the first case, the model has more data to learn from, and it'll be more robust …
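To show the two options I'm weighing, here's a minimal sketch with made-up toy data and a generic scikit-learn model: (a) one global model with country one-hot encoded as a feature, and (b) one model per country.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({                      # hypothetical toy data
    "country": ["FR", "FR", "DE", "DE", "US", "US"] * 10,
    "x1": range(60),
    "y":  [0, 1] * 30,
})

# (a) single global model, country one-hot encoded as an input feature
global_model = make_pipeline(
    make_column_transformer((OneHotEncoder(), ["country"]), remainder="passthrough"),
    LogisticRegression(),
).fit(df[["country", "x1"]], df["y"])

# (b) one model per country, trained only on that country's rows
per_country = {
    country: LogisticRegression().fit(group[["x1"]], group["y"])
    for country, group in df.groupby("country")
}
```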
I am thinking of "deploying" a machine learning model (the pickle is about 3 megabytes). After discussing it with my developer colleagues, they said it would be better if the model were packaged as a Python library instead of a microservice (like a REST API). I wanted to ask your view on this: a pickled model packaged in a dedicated library vs. a REST API, pros and cons? I was thinking that having it as a …
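For contrast with simply importing the pickled model from a package, here's a minimal sketch of the REST-API option (Flask purely as an example; the file name model.pkl and the JSON contract are assumptions).

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:          # hypothetical 3 MB pickled model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[1.2, 3.4, ...]]
    return jsonify(prediction=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8000)
```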
To validate a recommender model, a usual approach is to create a hold-out group that receives random suggestions (similar to an A/B testing setup). However, in healthcare applications this is not possible, as a random suggestion can put a patient's life at risk. Hence, what is a reasonable approach to validate the model?
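For illustration, here's a minimal sketch of one offline check on historical data (no random suggestions ever served): precision@k of the recommended items against what the patient actually went on to receive. The function and names are made up.

```python
def precision_at_k(recommended, actually_received, k=5):
    """Fraction of the top-k recommendations that appear in the historical record."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(actually_received))
    return hits / k

# Example: score recommendations against what happened historically
print(precision_at_k(["a", "b", "c", "d", "e"], ["c", "f", "a"], k=5))  # 0.4
```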
How do I correctly model an if condition that chooses which estimator/predictor (linear regression, GBT) to use in scikit-learn/spark-ml within a single pipeline?

    if feature_x < constant:
        result = pipeline1.predict(feature_vector)
    else:
        result = pipeline2.predict(feature_vector)

Other than modelling it as a custom transformer/predictor, is there an alternative way to model it in a pipeline?
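For reference, here's a minimal sketch of the "custom predictor" route in scikit-learn: a small meta-estimator that routes each row to one of two sub-models based on a threshold on one feature column. The class and parameter names (ThresholdRouter, feature_idx, threshold) are my own, and a production version would clone the sub-models in fit.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

class ThresholdRouter(BaseEstimator, RegressorMixin):
    def __init__(self, low_model, high_model, feature_idx=0, threshold=0.0):
        self.low_model = low_model
        self.high_model = high_model
        self.feature_idx = feature_idx
        self.threshold = threshold

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        mask = X[:, self.feature_idx] < self.threshold
        self.low_model.fit(X[mask], y[mask])      # rows below the threshold
        self.high_model.fit(X[~mask], y[~mask])   # rows at or above it
        return self

    def predict(self, X):
        X = np.asarray(X)
        mask = X[:, self.feature_idx] < self.threshold
        preds = np.empty(len(X))
        preds[mask] = self.low_model.predict(X[mask])
        preds[~mask] = self.high_model.predict(X[~mask])
        return preds
```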
Let's say a model was trained on date $dt_1$ using the available labeled data, split into training and test sets, i.e. $train_{dt_1}$, $test_{dt_1}$. This model is then deployed in production and makes predictions on new incoming data. Some $X$ days pass, and a bunch of labeled data is collected between $dt_1$ and $dt_1 + X$ days; let's call it $Data_X$. In my current approach, I take random samples out of $Data_X$ (e.g. an 80/20 split), so …
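To make my current approach explicit, here's a minimal sketch (file names are hypothetical) of pooling the original labeled data with the newly collected $Data_X$ and redoing a random 80/20 split before refitting.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

original  = pd.read_csv("labeled_dt1.csv")   # data available at dt1
new_batch = pd.read_csv("data_x.csv")        # labels collected between dt1 and dt1 + X

combined = pd.concat([original, new_batch], ignore_index=True)
train_df, test_df = train_test_split(combined, test_size=0.2, random_state=42)
```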
I am trying to build a price recommendation solution for clients in a scalable manner. I have two choices, as below. Professional service: a statistician builds a regression model, or some other kind of predictive model, tailored specifically to each client's data. Issue: in the long run there will be scalability problems, as one analyst cannot build models simultaneously for the hundreds of clients who want to come on board and use this service. Hiring 1 …
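To make the scalability contrast concrete, here's a minimal sketch of an automated per-client fitting loop instead of hand-built models; the data layout and column names (client_id, price) are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.read_csv("client_sales.csv")   # hypothetical pooled table of all clients

models = {}
for client_id, client_df in data.groupby("client_id"):
    X = client_df.drop(columns=["client_id", "price"])   # client-specific features
    y = client_df["price"]
    models[client_id] = LinearRegression().fit(X, y)     # one model per client
```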
I have data and an R script that creates a report from the data. I can't expose the data to the internet, and I also can't expose my script to the internet or to the users. But I would like to take myself out of the loop and allow a couple of users (only three, but they generate reports weekly) to run the script and get the report themselves. I would like the community to suggest how …
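To show the kind of setup I'm imagining, here's a minimal sketch with several assumptions: an internal-only Flask app, an Rscript executable on PATH, and a report.R that writes report.pdf. The users only trigger the run and download the result; they never see the data or the code.

```python
import subprocess
from flask import Flask, send_file

app = Flask(__name__)

@app.route("/run-report")
def run_report():
    # Runs the R script on the server; users only get the finished report back.
    subprocess.run(["Rscript", "report.R"], check=True)
    return send_file("report.pdf", as_attachment=True)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)   # keep this reachable on the internal network only
```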