The following is a simplified code snippet that is relevant to storing Keras LSTM models in MLflow.

```python
with mlflow.start_run() as run:
    mlflow.keras.log_model(model, "lstm")
    mlflow.log_params(model_parameters)
    mlflow.log_metrics(model_metrics)
```

However, suppose that for each model there is a corresponding data preprocessing function that needs to be applied to new data before prediction.

```python
processed_data = custom_processing_function(new_data)
predictions = model.predict(processed_data)
```

Because each model may have a different preprocessing function, I want to keep track of each pair of the form (preprocessing function, model). Ideally, I am looking for …
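One pattern that keeps the pair together is to wrap the preprocessing function and the Keras model in a single `mlflow.pyfunc` model, so the two are logged and loaded as one artifact. The sketch below is a minimal illustration under that assumption; `PreprocessedLSTM` is a hypothetical wrapper name, and `custom_processing_function`, `model`, `model_parameters`, and `model_metrics` are the objects from the snippet above.

```python
import mlflow
import mlflow.pyfunc

# Hypothetical wrapper: bundles a preprocessing function with the trained model
# so the (preprocessing function, model) pair is logged as a single artifact.
class PreprocessedLSTM(mlflow.pyfunc.PythonModel):
    def __init__(self, preprocessing_fn, keras_model):
        self.preprocessing_fn = preprocessing_fn
        self.keras_model = keras_model

    def predict(self, context, model_input):
        # Apply the model-specific preprocessing before prediction.
        processed = self.preprocessing_fn(model_input)
        return self.keras_model.predict(processed)

with mlflow.start_run() as run:
    mlflow.pyfunc.log_model(
        artifact_path="lstm_with_preprocessing",
        python_model=PreprocessedLSTM(custom_processing_function, model),
    )
    mlflow.log_params(model_parameters)
    mlflow.log_metrics(model_metrics)
```

Loading with `mlflow.pyfunc.load_model(...)` then returns an object whose `predict` already applies the preprocessing step. This is only a sketch: whether the preprocessing function pickles cleanly depends on how it is defined, and other options (e.g. logging the function as a separate artifact) may fit better.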
I'm doing some Bayesian A/B testing and I need to work out an appropriate sample size in order to detect an effect. Unfortunately, there doesn't really seem to be much information about this out there (that I can find, at least). Let's say I'm measuring a continuous metric, which is currently at a value of 100 in the control group. How would I work out the sample size I'd need to detect a minimum effect of an increase to 102 …
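One simulation-based way such a sample size could be estimated is sketched below. The numbers are illustrative only: the question gives the control mean (100) and the minimum effect of interest (102), while the standard deviation (15), the 0.95 decision threshold, and the flat-prior normal model are assumptions made purely for the sketch.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Assumed values: only the 100 -> 102 effect comes from the question; the rest
# are placeholders to illustrate the simulation approach.
control_mean, treatment_mean, sd = 100.0, 102.0, 15.0
threshold, n_sims = 0.95, 2000

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def simulated_power(n):
    """Fraction of simulated experiments in which the posterior probability
    that treatment > control exceeds the decision threshold."""
    wins = 0
    for _ in range(n_sims):
        c = rng.normal(control_mean, sd, n)
        t = rng.normal(treatment_mean, sd, n)
        # With a flat prior and a normal likelihood, the posterior of the
        # difference in means is approximately normal:
        diff_mean = t.mean() - c.mean()
        diff_sd = np.sqrt(c.var(ddof=1) / n + t.var(ddof=1) / n)
        prob_better = 1.0 - normal_cdf(0.0, diff_mean, diff_sd)
        wins += prob_better > threshold
    return wins / n_sims

for n in (200, 500, 1000, 2000):
    print(n, simulated_power(n))
```

Picking the smallest n whose simulated "power" reaches the desired level (e.g. 0.8) gives a rough per-group sample size under these assumptions.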
I'm running a test of a MapReduce algorithm in different environments, like Hadoop and MongoDB, and using different types of data. What are the different methods or techniques to find out the execution time of a query? If I'm inserting a huge amount of data, say 2-3 GB, what are the methods to find out the time for the process to be completed?
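At the application level, a minimal approach is to wrap the query in a wall-clock timer and force it to run to completion; the sketch below assumes a local MongoDB instance with hypothetical database, collection, and pipeline names. Server-side timings are also available (MongoDB's explain output with executionStats reports execution time, and Hadoop's job history shows per-job elapsed time).

```python
import time
from pymongo import MongoClient

client = MongoClient()                 # assumes a local MongoDB instance
coll = client["testdb"]["events"]      # hypothetical database/collection names

# Placeholder aggregation standing in for the MapReduce-style query.
pipeline = [{"$group": {"_id": "$key", "count": {"$sum": 1}}}]

start = time.perf_counter()
results = list(coll.aggregate(pipeline))   # materialise the cursor so the full query runs
elapsed = time.perf_counter() - start

print(f"Aggregation over ~{coll.estimated_document_count()} docs took {elapsed:.2f} s")
```

The same timer-around-the-call pattern works for bulk inserts: start the timer before the insert, stop it after the write is acknowledged.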
So, I'm aware that multi-armed bandits are great for evaluating multiple models, and from what I understand they are mainly used to pick a specific model. I would still like to evaluate two models, but I want to do it differently. Take a look at this simple equation: W_A * RecoScore_A + W_B * RecoScore_B = CompScore. Rather than optimize for a specific model for a given user, I'd like to optimize for a given set of weights. I'm wondering …
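One hedged way to read "optimize for a given set of weights" is to treat each candidate (W_A, W_B) pair as an arm and run an ordinary bandit over those arms instead of over the two models. The sketch below is an epsilon-greedy illustration with a made-up weight grid; it is not a recommendation of a specific algorithm.

```python
import random

# Hypothetical discrete grid of weight pairs; each pair is one "arm".
weight_arms = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
counts = [0] * len(weight_arms)
values = [0.0] * len(weight_arms)   # running mean reward per arm
epsilon = 0.1

def composite_score(w_a, w_b, reco_score_a, reco_score_b):
    # CompScore from the question: W_A * RecoScore_A + W_B * RecoScore_B
    return w_a * reco_score_a + w_b * reco_score_b

def choose_arm():
    # Explore with probability epsilon, otherwise exploit the best-known arm.
    if random.random() < epsilon:
        return random.randrange(len(weight_arms))
    return max(range(len(weight_arms)), key=lambda i: values[i])

def update(arm, reward):
    # Incremental mean update of the arm's estimated reward.
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Usage per user request (reward is whatever metric you observe, e.g. a click):
#   arm = choose_arm()
#   w_a, w_b = weight_arms[arm]
#   score = composite_score(w_a, w_b, reco_score_a, reco_score_b)
#   ... serve the recommendation, observe reward ...
#   update(arm, reward)
```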
We noticed we had a biased sample in our A/B test and were wondering if difference-in-differences would help us make valid conclusions about the data, or if there was another way to proceed. We ran a new experiment on our site, where we offered 50% of our users a new feature. We assigned users with odd ids to the experiment group and users with even ids to the control group, and then ran the experiment. However, we saw that even …
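If a pre-period measurement of the metric exists for every user, a difference-in-differences estimate can be read off a simple interaction regression. The sketch below assumes hypothetical column names (`treated`, `post`, `metric`) and a placeholder file path; it is an illustration of the technique, not the team's actual setup.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Expected long-format data: one row per user per period.
# Columns (hypothetical): user_id, treated (0/1), post (0/1 for pre/post period), metric
df = pd.read_csv("experiment_data.csv")  # placeholder path

# The coefficient on treated:post is the difference-in-differences estimate
# of the treatment effect, net of pre-existing differences between the groups.
model = smf.ols("metric ~ treated + post + treated:post", data=df).fit()
print(model.summary())
```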
A newbie here. I've brewed thousands of litres of good, bright, tasty wine over the years, but this year I've turned my hand to beer. I brewed a Yorkshire bitter already, and am currently brewing a stout. Both from kits. But I like to experiment. I've gathered the following ingredients for a blond beer: 1.5 kg extra light spray malt, 1 kg of polenta (ground dried sweetcorn), bottom-fermenting lager yeast, East Kent Golding hop pellets, yeast nutrient for a …
Currently, I'm doing research with experimental data. The data come from two experiments with two slightly different tasks, but with the same setup in a VR environment. Both experiments were done with different populations but with the same two groups of participants: healthy controls and patients of a certain kind. From the experimental data, the same set of features (over 200 features) was constructed and extracted for both datasets. The goal in this research is to apply machine learning in order …
We're using a whole year's data to predict a certain target variable. The pipeline is: data → one-hot encoding of the categorical variables → MinMaxScaler → PCA (to choose a subset of 2000 components out of the 15k) → MLPRegressor. When we do a ShuffleSplit cross-validation, everything is hunky-dory (R² scores above 0.9 and low error rates). However, in real life they're not going to use the data in the same format (e.g. a whole year's data), but rather a …
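For reference, the described pipeline roughly corresponds to the following scikit-learn setup; the column names and hyperparameters are placeholders, not the actual configuration.

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

categorical_cols = ["cat_a", "cat_b"]   # hypothetical categorical columns
numeric_cols = ["num_a", "num_b"]       # hypothetical numeric columns

preprocess = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("scale", MinMaxScaler(), numeric_cols),
    ],
    sparse_threshold=0.0,   # force dense output so PCA can consume it
)

pipe = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=2000)),      # 2000 of the ~15k encoded features
    ("mlp", MLPRegressor(max_iter=500)),
])

# ShuffleSplit mixes rows from the whole year into both train and test folds,
# which is optimistic if production will only ever see a shorter, later window.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
# scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")   # X, y: the year's data
```

A time-based split (train on earlier months, test on later ones) would mimic the production setting more closely than ShuffleSplit.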
We want to introduce a new price list for the customers of our international SaaS company. Beforehand we want to test this new price list in several countries. An A/B test cannot be conducted here because it is forbidden to have different prices for different customers within the same country. Thus we want to introduce the new pricing policy in several countries and then figure out whether the new one is better than the old one. My questions are: How to …
What are some alternatives to switchbacks and cluster neighborhood sampling to get around network effects? For example, suppose you have a sales funnel that's entirely phone-based and you want to test an intervention to find out whether it improves the probability of conversion. If a switchback were run and the variant caused a huge volume of phone leads, it could impact the control even days later if the backlog of leads to call is large enough.
I saw an article about an A/B test that Google had performed way back. They wanted to decide what shade of blue a button should be and how that affects click-through rate. They divided users randomly into 100 buckets, each corresponding to a shade of blue they wanted to check (so the color is a factor with 100 levels). Now this is all well and good if all the buckets (or "treatment groups") sufficiently represent the target population. In …
I have a couple of recommendation algorithms that I want to A/B test. Algorithm A has 90% user coverage and algorithm B has 95% user coverage. That means if the algorithms are asked to provide recommendations for 1000 users, algorithm A can provide them for 900 of the users and algorithm B for 950 of the users. Say, for example, out of these 1000 users 87% have recommendations from both algorithms, 3% have recommendations from only algorithm A, and 8% …
I have a dataset on which I wanted to do a paired t-test. So I carried out a normality test, and it showed that the data do not follow a normal distribution. So I used the Wilcoxon signed-rank test in place of the paired t-test. The descriptive statistics of my data show a median value of 0, but when I ran the Wilcoxon test, it gave me a p-value < 0.05, indicating that I should reject the null. Could you please tell me why there is this discrepancy in my test results? …
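There is no contradiction in principle: the signed-rank test looks at the ranks of the non-zero differences, not at the median alone, so data whose median is exactly 0 can still be clearly asymmetric around zero. A small made-up example (not the asker's data) illustrates this:

```python
import numpy as np
from scipy import stats

# Illustrative paired differences: the median is 0 because of the many ties at
# zero, yet the non-zero differences are strongly skewed positive, which is
# what the signed-rank test picks up.
diffs = np.array(
    [0.0] * 15
    + [-0.2, -0.3, -0.1, -0.25, -0.15]
    + [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    dtype=float,
)

print("median:", np.median(diffs))   # 0.0

# zero_method="wilcox" (the SciPy default) drops zero differences before ranking.
stat, p = stats.wilcoxon(diffs, zero_method="wilcox")
print("W =", stat, "p =", p)         # p < 0.05 despite the zero median
```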
I want to know of any repositories that contain complete experimental designs in R, covering basic tests and analyses. I want to take a top-down approach, learning step by step through a real project how that works. Do you know of any places to find that, such as an R notebook?
My question on A/B testing is about doing post-test segmentation analysis. For example: I run an A/B test on my website to track bounce rate. For the treatment group, I put a video explaining my company; for the control group, I put just plain text. I pick a segment of users who are first-time users from the USA to be split 50/50 into the two groups. The metric I am tracking is average bounce rate (assume 20%). Power …
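For the truncated "Power …" part, a standard two-proportion power calculation is the usual starting point. The sketch below uses the 20% baseline from the question and a hypothetical minimum detectable effect of 2 percentage points; the effect size and power target are assumptions for illustration only.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline, mde = 0.20, 0.02            # 20% bounce rate; hypothetical 2-point improvement
effect = proportion_effectsize(baseline, baseline - mde)   # Cohen's h

# Solve for the per-group sample size at alpha = 0.05 and 80% power.
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(round(n_per_group))
```

Segment-level (post-test) comparisons have smaller sample sizes than the overall test, so each segment effectively has less power than this overall calculation suggests.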
I have experience with multifactor DoE, but in the context of optimizing treatment of a single or a small number of populations. Are there any articles people recommend to help get my head around how to approach DoE for more complex machine learning models, where there are more personalized forecasts/recommendations, so that the experiment best informs future predictions? This seems like it would be a combination of wanting to get more data away from a local optimum …
I am working on a classification of two feature sets derived from a dataset. We first obtain two feature matrices derived from two feature extraction methods. Now, I need to compare them. However, the two feature sets reach almost the same recognition accuracy. My question is: Is there a way to design a meaningful experiment to show the difference between the two methods? What are your suggestions?
I hope you all are doing well. Before I proceed with my problem statement, a few terms for reference:
- Territory = Sales territory. Think of it like a county/region assigned to a particular rep, with no overlap of area/customers between two reps.
- Rep / Sales Rep = Sales representative who visits customers to convert sales.
- Calls = Number of times a customer is visited by the rep in a month.
- Goal Attainment = % of target achieved for the …
I know this question doesn't really comply with standards, but I wanted to know if we can mix beers after, say, a cold crash. I want to create different beers using the ones I already brewed. Can I mix ales with lagers, for example? I know big breweries mix different batches to keep the profile stable. But I want to do this to create new and more complex beers. Thanks in advance.
I have a general inference question regarding scenarios where results from data are not statistically significant but there appears to be an observable trend. For example, treatment A and treatment B are applied to two independent populations. Using a t-test to analyze the resulting data (let's say the data is total revenue), the p-value is 0.2, so the effect of treatment on revenue was not statistically significant. However, the total revenue from treatment A was observably higher in treatment …
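A small synthetic illustration of this situation (made-up revenue numbers, not real data): the sample means differ, yet the test may not reach significance at this sample size, and the confidence interval for the difference shows how wide the range of effects consistent with the data is.

```python
import numpy as np
from scipy import stats

# Synthetic per-user revenue for two treatments; means differ but overlap heavily.
rng = np.random.default_rng(42)
revenue_a = rng.normal(105, 20, 60)
revenue_b = rng.normal(100, 20, 60)

res = stats.ttest_ind(revenue_a, revenue_b, equal_var=False)   # Welch's t-test
print("mean A:", revenue_a.mean(), "mean B:", revenue_b.mean())
print("t =", res.statistic, "p =", res.pvalue)

# Welch 95% confidence interval for the difference in means, computed by hand.
diff = revenue_a.mean() - revenue_b.mean()
var_a, var_b = revenue_a.var(ddof=1), revenue_b.var(ddof=1)
n_a, n_b = len(revenue_a), len(revenue_b)
se = np.sqrt(var_a / n_a + var_b / n_b)
dof = (var_a / n_a + var_b / n_b) ** 2 / (
    (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
)
half_width = stats.t.ppf(0.975, dof) * se
print("95% CI for the difference:", diff - half_width, diff + half_width)
```

If the interval spans zero widely, the data are consistent both with no effect and with the observed difference, which is one way to frame the "trend but not significant" situation.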