What's an Idea for a Beginner Project that involves TDA?

I am a math undergraduate just getting started in data science, and I'm looking for ideas for a data science project that involves Topological Data Analysis. I've been reading privately on algebraic topology, and I think it would be really neat to apply some of these ideas in my future data science career. Any ideas you have would be great. Please keep in mind that I am very new at this, and I only know Python, R, and am currently learning …
Category: Data Science

Collaborative predictive modeling

How do you share modeling work among several programmers? Our team has split apart the work of writing the SQL code to create our dataset. However, we will soon need to build a machine learning model. My machine learning approach is a linear/iterative process. Read data in, then split data, then normalize, then try model A, model B, score the model, and go back to tweak various hyperparameters. Experiment. Outside of pair programming, how can you distribute/share this work?
Category: Data Science

Project structure - many projects share same large dataset

I have a bunch of projects for my job that are largely unrelated except they use the same data, which is pretty big on disk in csv format. I want these to exist separately from each other and I usually try to use the cookie cutter data science model for project structure, and keep all my data in a data folder in the root of the project. But because this dataset is big, I don't want to have ten copies …
Category: Data Science

Create Period column based on a date column where the first month is 1, second 2, etc

I have a dataset with many projects' monthly expenditures (cost curve), like this one:
Project    Date     Expenditure (USD)
Project A  12-2020  500
Project A  01-2021  1257
Project A  02-2021  125889
Project A  03-2021  102447
Project A  04-2021  1248
Project A  05-2021  1222
Project A  06-2021  856
Project B  01-2021  5589
Project B  02-2021  52874
Project B  03-2021  5698745
Project B  04-2021  2031487
Project B  05-2021  2359874
Project B  06-2021  25413
Project B  07-2021  2014
Project B  08-2021  2569
Using Python, I …
Category: Data Science
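One possible reading of the question above is that each project's earliest month should get period 1, the next calendar month period 2, and so on. The following is only a minimal sketch under that assumption, using the standard library; the helper names `month_index` and `add_period` are hypothetical, and the rows are a subset of the sample data from the question.

```python
from datetime import datetime

# A few rows copied from the question's sample: (project, "MM-YYYY", expenditure in USD).
rows = [
    ("Project A", "12-2020", 500),
    ("Project A", "01-2021", 1257),
    ("Project A", "02-2021", 125889),
    ("Project B", "01-2021", 5589),
    ("Project B", "02-2021", 52874),
    ("Project B", "03-2021", 5698745),
]


def month_index(mm_yyyy):
    """Map an 'MM-YYYY' string to a running month count, so months can be subtracted."""
    d = datetime.strptime(mm_yyyy, "%m-%Y")
    return d.year * 12 + d.month


def add_period(rows):
    """Append a 1-based period to each row, counted from the project's earliest month."""
    first = {}
    for project, date, _ in rows:
        idx = month_index(date)
        first[project] = min(first.get(project, idx), idx)
    return [
        (project, date, usd, month_index(date) - first[project] + 1)
        for project, date, usd in rows
    ]


with_period = add_period(rows)
for row in with_period:
    print(row)
```

Because the period is derived from calendar months, a project with a missing month would skip a period number. If consecutive numbering of the rows is wanted instead, a per-project row counter would be the simpler choice.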

Tools for Data Science project management

I'm in charge of a small data science team (3 data scientists, me included). We do our projects with at least one business person (PM) per project (we have 5 of these). We have managed everything with meetings and emails, but as the number of projects and people keeps increasing, I find it necessary to have a proper management tool. I would like to have something where we could, per project, add business needs (requirements). These requirements could translate into …
Category: Data Science

What is the best practice for data folder structuring?

I work for a small data science consultancy firm and we are trying to standardize our project folder structure. We started from the cookiecutter structure, which is a great base. However, one of the discussion points concerns the subfolders of the data folder, which is structured as: Raw, Interim, Processed. Let's think about the following situations: The client gives you a manually extracted csv file -> This obviously goes into Raw. You have access to SQL databases and make …
Category: Data Science

General equation for getting an idea of the scale of a machine learning project

I'm writing an application for a project where we intend to teach a model to predict one aspect of an environment (traffic safety) using a database with 10 images (about 300x300px and, say, 256 colors) for each of either 100 000 or 15 million locations. I need to work out whether both, one, or neither of these projects is feasible with our hardware constraints. What can I expect? Is there some formula or benchmark that one can refer to? …
Category: Data Science
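A first-order way to size the two scenarios in the question above is to compute the raw pixel volume. The sketch below assumes uncompressed images at one byte per pixel (256 colors = 8 bits); both are assumptions for illustration, and the function name `raw_size_bytes` is hypothetical. It estimates storage only and says nothing about training time.

```python
IMAGES_PER_LOCATION = 10
WIDTH = HEIGHT = 300   # "about 300x300px" in the question
BYTES_PER_PIXEL = 1    # assumption: 256 colours stored as one uncompressed byte per pixel


def raw_size_bytes(n_locations):
    """Uncompressed pixel volume in bytes for the given number of locations."""
    return n_locations * IMAGES_PER_LOCATION * WIDTH * HEIGHT * BYTES_PER_PIXEL


for n in (100_000, 15_000_000):
    print(f"{n:>10,} locations: {raw_size_bytes(n) / 1e9:,.0f} GB uncompressed")
```

Under these assumptions the smaller scenario comes to about 90 GB and the larger one to about 13.5 TB of raw pixels, so the two differ by a factor of 150 before any compression or downsampling.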

What are the strategies to counter the 80/20 dilemma in Data Science projects?

Most of the time in Data Science projects is not spent on performing actual analytics but rather on other tasks, such as organizing data sources, collecting samples and preparing datasets, compiling and validating business rules in data, etc. This fact has been studied as the 80/20 dilemma in Data Science projects. In order to tackle this dilemma, I would like to ask what strategies are used to decrease the 80% of time spent in the other stages (organizing data sources, …
Category: Data Science

General Process for new project

Often when I get a new machine learning project, the client either asks me to do a particular task, like a prediction for one thing, or gives me data and asks me to find out what I can do with it. I've read the book Hands-On Machine Learning with Scikit-Learn & TensorFlow, where you can see a full process for starting a project, basically drawing plots and searching correlation matrices for what is interesting. Do you guys have …
Category: Data Science

How to plan a model analysis that avoids overfitting?

Coming from statistics, I've recently started learning machine learning. I've read a lot of tutorials about ML, but have no real training. I'm working on a little project where my dataset has 6k rows and around 300 features. As I've read in my tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then train my algorithm on the training sample with cross-validation (5 folds). As I re-ran my program twice (I've only …
Category: Data Science
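The split described in the question above (an 80/20 hold-out, then 5-fold cross-validation on the training part) can be sketched on row indices alone. This is an illustration using only the standard library, not the asker's code; the function names, the fixed seed, and the interleaved fold assignment are all assumptions.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices once and hold out a fixed fraction as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed so re-runs give the same split
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (training, validation) index lists; each fold is the validation set once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 6,000 rows, as in the question: 4,800 for training, 1,200 held out for testing.
train_idx, test_idx = train_test_split_indices(6000)
splits = list(k_fold(train_idx, k=5))
```

The held-out test indices never appear in any fold, so they can be kept for a single final evaluation after model selection on the folds. Fixing the seed is one way to make repeated runs comparable.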

Format for proposing a new algorithmic project?

I am working for an organization that is still new to implementing data science projects. I have an idea for a data spider/algorithm project that will require various pieces of code as well as datasets that I will need to gain access to. I know that this is something that I should just start doing instead of proposing, but unless I come up with a formal document, the org might see it as a "waste of time". So are there …
Category: Data Science

Cameras for automatic customer service machine

For my university project, I am planning to build an automated customer service machine: one that recognizes when someone approaches the camera and accordingly says hello, etc. I am also planning to add simple speech recognition and language processing features. So my question is: what kind of camera would be suitable? Is there any particular model that you recommend? I was thinking of the cameras used for Amazon Go (as an example).
Category: Data Science

Milestones of data science project

I'm looking at working on a machine learning project for a company that is interested in paying in instalments at certain milestones in the project. My initial question is how to define these milestones. The project is the development of a recommender system and its deployment on their site. It might take up to 12 months in the worst-case scenario. I am thinking of asking for payment at these points: before starting the project (0 months); when I demo a prototype (low …
Category: Data Science
