Sharing Jupyter notebooks within a team

I would like to set up a server which could support a data science team in the following way: be a central point for storing, versioning, sharing and possibly also executing Jupyter notebooks. Some desired properties: Different users can access the server and open and execute notebooks that were stored by them or by other team members. The interesting question here is what the behavior would be if user X executes cells in a notebook authored by user Y. I …
Category: Data Science

What are the disadvantages of Azure's ML vs a pure code approach (R/SKlearn)

Good Day, Microsoft offers their Azure Machine Learning Platform: https://azure.microsoft.com/en-ca/services/machine-learning/ Azure Machine Learning is designed for applied machine learning. Use best-in-class algorithms and a simple drag-and-drop interface—and go from idea to deployment in a matter of clicks. ... Use Azure Machine Learning to deploy your model into production as a web service in minutes—a web service that can be called from any device, anywhere, and that can use any data source. By their demo and their photos online it looks …
Category: Data Science

Should the type of Boolean categorical features be numerical or categorical after encoding?

My dataframe contains categorical features with two distinct values alongside numerical features. I've converted these categorical values to 0 or 1. I will apply correlation elimination on the features after calculating correlation coefficients. Depending on the type of the features, the methods are given below: Numeric - Numeric: Pearson Numeric - Categoric: Cramer_V Categoric - Categoric: Correlation Ratio That's why I am not sure what the type of the converted categorical features should be. Numerical or categorical? Another reason to …
Category: Data Science
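
One fact that may help with the question above: Pearson's coefficient computed between a 0/1-encoded column and a numeric column coincides with the point-biserial correlation, so treating a two-valued feature as numeric is at least well defined for that pairing. The following is a minimal sketch in plain Python; the sample values and the function names `pearson` and `point_biserial` are hypothetical and not taken from the question.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def point_biserial(binary, y):
    """Point-biserial correlation between a 0/1 column and a numeric column."""
    n = len(y)
    group1 = [v for b, v in zip(binary, y) if b == 1]
    group0 = [v for b, v in zip(binary, y) if b == 0]
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    mean_y = sum(y) / n
    # Population standard deviation of y (divide by n, not n - 1).
    std_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    p = len(group1) / n  # share of rows encoded as 1
    return (mean1 - mean0) / std_y * math.sqrt(p * (1 - p))

# Hypothetical data: a two-valued feature encoded as 0/1 and a numeric feature.
is_member = [0, 1, 0, 1, 1, 0]
spend = [10.0, 25.0, 12.0, 30.0, 22.0, 15.0]

r_pearson = pearson(is_member, spend)
r_point_biserial = point_biserial(is_member, spend)
```

Both values agree up to floating-point error, which is why the choice mainly matters when the binary feature is paired with another categorical feature.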

Tools / tech stack for generating metrics and insights for games

We run a games platform with millions of users (roughly 150,000,000 gameplays per month). We want to find tools or set up a data stack to: collect basic metrics for a specific game such as average gameplay time, 1-day return rate, 7-day return rate, ...; be able to segment these data by any dimension that we pass along (e.g. by country, by network speed, by ...); generate more advanced insights for a specific game, e.g. this is the distribution …
Category: Data Science

Is there an easy-to-use .NET library for neural networks?

I know 'easy to use' is going to be subjective, so let me qualify the question a little. Is there a library or working scrap of code that I can essentially copy into my project, change the number of neurons in each layer, the number of layers, and the source of the inputs and then click run? I've written at least 15 of these myself, and not a single one has worked properly. I need to see something working in …
Category: Data Science

Best image recognition API to implement for eCommerce Lifestyle/Sculpture site

I'm planning an eCommerce site currently. We are likely running WooCommerce and looking to implement Algolia for our search features. We feel that for our particular purposes, a visual search would be a crucial feature to implement, due to our product types. For the purpose of my question, I will use the example of sculptures and ceramics, with various forms both abstract and utilitarian, textures, colors, and so forth. The idea is a customer can upload a photo of their …
Category: Data Science

I need direction for a research project

I am new to machine learning, so please bear with me. I'll try to keep this short and sweet. We are building a makeup simulation and recommendation system. My part is to recommend makeup which is personalized to the user and also on par with current makeup trends. I will be building a set of rules with the help of a beautician that will say which makeup is suitable for a particular set of features. The outputs will …
Category: Data Science

Software for automated database processing

I am facing a problem which I'd like to solve without any programming, and I am looking for software to do this. I have a dataset, for example: (brand-id, brand-name, product-class-name;) 0, Audi, economy business premium; 1, Rolls Royce, luxury; 2, Seat, economy; 3, Tesla, business premium; I'd like to process this dataset automatically, creating an additional table that classifies the parameters in column 3, like: (product-class-id, product-class-name, brand-id;) 0, economy, 0 2; 1, business, 0 3; 2, premium, 0 …
Category: Data Science
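
The question above asks for a tool rather than code, but if a short script turns out to be acceptable, the transformation it describes is an inversion of the brand-to-class mapping. Here is a minimal sketch in plain Python using the sample rows from the question; the function name `build_class_table` and the choice to number classes in order of first appearance are assumptions.

```python
# Sample rows from the question: (brand-id, brand-name, product-class-name).
brands = [
    (0, "Audi", "economy business premium"),
    (1, "Rolls Royce", "luxury"),
    (2, "Seat", "economy"),
    (3, "Tesla", "business premium"),
]

def build_class_table(rows):
    """Invert brand -> classes into (product-class-id, product-class-name, brand-ids)."""
    class_to_brands = {}
    for brand_id, _brand_name, class_names in rows:
        # Column 3 holds space-separated class names.
        for class_name in class_names.split():
            class_to_brands.setdefault(class_name, []).append(brand_id)
    # Assumption: class ids follow the order in which classes first appear.
    return [
        (class_id, class_name, brand_ids)
        for class_id, (class_name, brand_ids) in enumerate(class_to_brands.items())
    ]

class_table = build_class_table(brands)
```

Each tuple in `class_table` corresponds to one row of the additional table, with the brand ids kept as a list instead of a space-separated string.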

Can I get numeric data from a color map?

In my class I often need to work with color map images. I would show the image and try to make inferences/observations about different subjects. Often I need to actually quantify some aspects, but it is always very approximate and somewhat vague because the images are provided "as is" and I do not necessarily know their content a priori. Let's imagine I'm working with two images (*). Is it possible to have the computer "learn" the color scale bar …
Category: Data Science

Train-Test split for a recommender system

In all implementations of recommender systems I've seen so far, the train-test split is performed in this manner:

+------+------+--------+
| user | item | rating |
+------+------+--------+
| u1   | i1   | 2.3    |
| u2   | i2   | 5.3    |
| u1   | i4   | 1.0    |
| u3   | i5   | 1.6    |
| ...  | ...  | ...    |
+------+------+--------+

This is transformed into a rating matrix of the form:

+------+-------+-------+-------+-------+-------+-----+
| user | item1 | item2 …
Category: Data Science

Implementation of reliable rule learning

I want to perform "reliable rule learning", i.e. mining a set of rules with a very low number of false negatives. I recently read the paper "Reliable agnostic learning" by Kalai et al. (https://doi.org/10.1016/j.jcss.2011.12.026) and they basically describe what I want: Rules are determined to reliably classify data points, and the reliability is partly reached by allowing "I don't know" as an additional answer. Sadly, their paper is purely theoretical and I could not find a corresponding implementation. Is there …
Category: Data Science

R Studio like editor for Python?

I hope this question is okay for the forum. I want to ask about your experience with Python editors. Currently, I use VS Code to work with Python. However, in R Studio I really appreciate that it holds data frames in memory and makes it easy to view/inspect data frames and other items. I'm "closer to the data" in R Studio. Also, line-by-line/blockwise execution of code is really helpful. So my question: Is there anything like R Studio for Python (preferably …
Category: Data Science

Are there any good NLP APIs for comparing strings in terms of semantic similarity?

I want to create a chatbot which informs the user about traffic on the streets, though not in real time for the moment. I have created a small MySQL database which stores some traffic data, and I fetch it with a PHP script whenever appropriate, depending on the user's interaction with the chatbot. I wonder how to deal with the case when the user asks variations of the same question which therefore can be …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.