I am an undergraduate in math, just beginning in data science, looking for ideas for a data science project that involves Topological Data Analysis. I've been reading on my own about algebraic topology, and I think it would be really neat to apply some of these ideas in my future data science career. Any ideas you have would be great. Please keep in mind that I am very new at this, and I only know Python, R, and am currently learning …
How do you share modeling work among several programmers? Our team has split apart the work of writing the SQL code to create our dataset. However, we will soon need to build a machine learning model. My machine learning approach is a linear/iterative process. Read data in, then split data, then normalize, then try model A, model B, score the model, and go back to tweak various hyperparameters. Experiment. Outside of pair programming, how can you distribute/share this work?
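As a rough illustration of the linear process described above (read, split, normalize, try model A and model B, score), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic data stands in for the SQL extract, and the function names (`read_data`, `split`, `fit_scaler`, `normalize`, `model_a`, `model_b`, `score`) are hypothetical, not taken from the question.

```python
import random
from statistics import mean, pstdev

def read_data(n=200, seed=0):
    """Hypothetical stand-in for the SQL extract: y is about 3*x + 2 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in xs]

def split(rows, test_fraction=0.2):
    """Hold out the first test_fraction of the rows as a test set."""
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def fit_scaler(train):
    """Compute the mean and standard deviation of x on the training rows only."""
    xs = [x for x, _ in train]
    return mean(xs), pstdev(xs)

def normalize(rows, scaler):
    """Apply the training-set scaling to any set of rows."""
    mu, sigma = scaler
    return [((x - mu) / sigma, y) for x, y in rows]

def model_a(train):
    """Model A (baseline): always predict the mean training target."""
    y_bar = mean(y for _, y in train)
    return lambda x: y_bar

def model_b(train):
    """Model B: one-feature least-squares line."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    slope = sxy / sxx
    return lambda x: y_bar + slope * (x - x_bar)

def score(model, rows):
    """Mean squared error on the given rows (lower is better)."""
    return mean((model(x) - y) ** 2 for x, y in rows)

train, test = split(read_data())
scaler = fit_scaler(train)
train, test = normalize(train, scaler), normalize(test, scaler)
results = {name: score(fit(train), test)
           for name, fit in [("A", model_a), ("B", model_b)]}
print(results)
```

Each stage here is a function with a fixed input and output, so in principle different people could own different stages or different model functions. Whether that division suits a given team is a judgment call.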
I have a bunch of projects for my job that are largely unrelated except that they use the same data, which is pretty big on disk in CSV format. I want these projects to exist separately from each other. I usually try to use the Cookiecutter Data Science template for project structure and keep all my data in a data folder in the root of the project. But because this dataset is big, I don't want to have ten copies …
I have a dataset with many projects' monthly expenditures (cost curves), like this one:

Project    Date     Expenditure (USD)
Project A  12-2020  500
Project A  01-2021  1257
Project A  02-2021  125889
Project A  03-2021  102447
Project A  04-2021  1248
Project A  05-2021  1222
Project A  06-2021  856
Project B  01-2021  5589
Project B  02-2021  52874
Project B  03-2021  5698745
Project B  04-2021  2031487
Project B  05-2021  2359874
Project B  06-2021  25413
Project B  07-2021  2014
Project B  08-2021  2569

Using Python, I …
I'm in charge of a small data science team (3 data scientists, me included). We do our projects with at least one business person (PM) per project (we have 5 of these). We have managed everything with meetings and emails so far, but as the number of projects and people keeps increasing, I find it necessary to have a proper management tool. I would like to have something where we could, per project, add business needs (requirements). These requirements could translate into …
I work for a small data science consultancy firm and we are trying to standardize our project folder structure. We started from the cookiecutter structure, which is a great base. However, one of the discussion points concerns the subfolders of the data folder, which is structured as: Raw, Interim, Processed. Let's think about the following situations: The client gives you a manually extracted CSV file -> This obviously goes into Raw. You have access to SQL databases and make …
I'm writing an application for a project where we intend to teach a model to predict one aspect of an environment (traffic safety) using a database with 10 images (about 300x300px and, say, 256 colors) for each of either 100 000 or 15 million locations. I must come to grips with whether both, one, or neither of these projects is feasible given our hardware constraints. What can I expect? Is there some formula or benchmark that one can refer to? …
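For a sense of scale, the figures quoted above allow a back-of-the-envelope estimate of the raw image volume. The sketch below assumes uncompressed storage and that 256 colors means 8 bits (1 byte) per pixel. The function name `raw_dataset_bytes` is hypothetical, and the result says nothing about training time or memory, only about data size.

```python
# Back-of-the-envelope estimate of raw (uncompressed) image storage.
# Assumption: 256 colors -> 8 bits = 1 byte per pixel, no compression.
IMAGES_PER_LOCATION = 10
WIDTH_PX = 300
HEIGHT_PX = 300
BYTES_PER_PIXEL = 1

def raw_dataset_bytes(n_locations: int) -> int:
    """Total uncompressed size in bytes for n_locations locations."""
    per_image = WIDTH_PX * HEIGHT_PX * BYTES_PER_PIXEL  # 90,000 bytes
    return n_locations * IMAGES_PER_LOCATION * per_image

small = raw_dataset_bytes(100_000)
large = raw_dataset_bytes(15_000_000)
print(f"100 000 locations:    {small / 1e9:.1f} GB")   # 90.0 GB
print(f"15 million locations: {large / 1e12:.2f} TB")  # 13.50 TB
```

Under these assumptions the smaller dataset is about 90 GB and the larger about 13.5 TB before any compression or downsampling.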
Most of the time in Data Science projects is spent not on actual analytics but on other tasks, such as organizing data sources, collecting samples and preparing datasets, compiling and validating business rules in data, etc. This fact has been studied as the 80/20 dilemma in Data Science projects. In order to tackle this dilemma, I would like to ask what strategies are used to reduce the 80% of time spent in the other stages (organizing data sources, …
Often when I get a new machine learning project, the client either asks me to do a particular task, like a prediction for one thing, or gives me data and asks me to find out what I can do with it. I've read the book Hands-On Machine Learning with Scikit-Learn & TensorFlow, where you can see a full process for starting a project, basically drawing plots and searching correlation matrices for what is interesting. Do you guys have …
Coming from statistics, I've recently started learning machine learning. I've read a lot of tutorials about ML, but have no real training. I'm working on a little project where my dataset has 6k rows and around 300 features. As I've read in my tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then train my algorithm on the training sample with cross-validation (5 folds). As I re-ran my program twice (I've only …
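The split described above (80% training, 20% testing, then 5 folds on the training part) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the function name `split_indices`, the `seed` parameter, and the interleaved fold assignment are hypothetical choices, not the asker's actual code.

```python
import random

def split_indices(n_rows, test_fraction=0.2, n_folds=5, seed=0):
    """Shuffle row indices, hold out a test set, and cut the rest into folds."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    # Fold k takes every n_folds-th training index, starting at position k.
    folds = [train[k::n_folds] for k in range(n_folds)]
    return train, test, folds

train, test, folds = split_indices(6000, seed=0)
print(len(train), len(test), [len(f) for f in folds])
# 4800 1200 [960, 960, 960, 960, 960]
```

With a fixed seed the same rows land in the same sets on every run. Changing or omitting the seed gives a different shuffle, and therefore a different split.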
I am working for an organization that is still new to implementing data science projects. I have an idea for a data spider/algorithm project that will require various pieces of code as well as datasets that I will need to gain access to. I know that this is something that I should just start doing instead of proposing, but unless I come up with a formal document, the org might see it as a "waste of time". So are there …
For my university project, I am planning to build an automated customer service machine: one which recognizes when someone approaches the camera and responds accordingly (says hello, etc.). I am also planning to add simple speech recognition and language processing features. So my question is: what kind of camera would be suitable? Is there any particular model that you recommend? I was thinking of the cameras used for Amazon Go (as an example).
I'm looking at working on a machine learning project for a company that is interested in paying in instalments at certain milestones in the project. My initial question is how to define these milestones. The project is the development of a recommender system and its deployment on their site. It might take up to 12 months in the worst-case scenario. I am thinking of asking for payment at these points: before starting the project (0 months); when I demo a prototype (low …