I've written a research tool that allows users to write arbitrary expressions defining time series calculated from a set of primary data sources. Many of the functions I provide carry state derived from previous values, such as EMA. For example: EMA(GetData("Foo"), 280). State contained in the component functions of these expressions can be saved and resumed via AST node labeling at compile time. This allows a series to be resumed later when any of its root data sources, which …
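The save/resume idea above can be illustrated with a minimal incremental EMA whose entire state fits in one small serializable object. This is a sketch under assumed names (`EMAState`, `ema_step` are hypothetical, not from the tool described):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EMAState:
    """Serializable state for an incremental EMA node (hypothetical)."""
    period: int
    value: Optional[float] = None  # last EMA value; None until the first sample

def ema_step(state: EMAState, sample: float) -> float:
    """Advance the EMA by one sample, mutating the state, and return the new value."""
    alpha = 2.0 / (state.period + 1)
    if state.value is None:
        state.value = sample  # seed with the first observation
    else:
        state.value = alpha * sample + (1 - alpha) * state.value
    return state.value

# Resuming: persist `state` (e.g. pickled/JSON-encoded, keyed by the AST
# node label), reload it later, and keep feeding new samples.
```

Because the state is just `(period, value)`, saving it per labeled AST node is enough to resume the whole expression without replaying history.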
I am working on a backend for structuring and submitting data to an ML model. I have three questions about this process: (1) What is the best method to feed the model continuous data (updated at 30-minute intervals)? (2) What is the best method to deliver the datasets? (There are two being compared; should I consolidate the data on my side, or leave that to the model?) (3) How should the results be exported from the model and input into my …
I'm looking for a corpus of toy tabular datasets that can be used to test data profiling, machine learning, data manipulation, and similar software. Some example attributes:
- Strange column names (empty strings, very long names, duplicate names, names with spaces, periods, syntax characters, escaped delimiters and tokens)
- Non-rectangular layouts
- Mixed scientific notation in floats, inf literals
- Row-empty or column-empty
- Mixed file encodings
- Numeric and string values designed to overflow memory buffers, cause truncation, or round to int
- Ambiguous and invalid dates
- Diacritics, emojis

I was going to …
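If no ready-made corpus turns up, one fallback is to generate such pathological files programmatically. A minimal sketch (the field values here are illustrative, not from any existing corpus) covering a few of the attributes above:

```python
import csv
import io

# Build a tiny "torture" CSV: empty and duplicate column names, an embedded
# delimiter in a header, an inf literal, ambiguous/invalid dates, an embedded
# newline, non-ASCII text, and a non-rectangular row.
rows = [
    ["", "name", "name", "value, with comma", "date"],  # odd headers
    ["1", "naïve 🚀", "dup", "1e308", "03/04/05"],       # ambiguous date
    ["2", "line\nbreak", "dup", "inf", "2020-13-45"],    # invalid date
    ["3"],                                               # non-rectangular row
]
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerows(rows)
print(buf.getvalue())
```

`csv.QUOTE_MINIMAL` quotes only the fields that need it (the embedded comma and newline), which itself exercises a parser's quoting logic.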
I have been asked to unit-test my machine learning model (not the code that made the model). Since we wouldn't actually know in advance what predictions the model makes, how do we carry out unit tests that check the model's predictions? How is this done? EDIT 1: The machine learning model I have is trained on tabular patient data. Let's take the example of cancer prediction (I am not allowed to disclose the actual one, but this example is very close). It takes multiple …
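One common answer to "we don't know the exact predictions" is to test behavioral properties rather than exact outputs: the output range, invariance to irrelevant fields, and directional expectations. A minimal sketch with a stand-in scoring function (the function, feature names, and weights are all hypothetical, just so the tests can run):

```python
def predict_risk(features: dict) -> float:
    """Stand-in for the real trained model; returns a risk score in [0, 1]."""
    # Hypothetical scoring rule, only here so the tests below are runnable.
    score = 0.02 * features["age"] + 0.3 * features["tumor_size_cm"]
    return min(1.0, max(0.0, score / 5.0))

def test_output_range():
    # Predictions must always be valid probabilities/scores.
    p = predict_risk({"age": 55, "tumor_size_cm": 2.0, "patient_id": 7})
    assert 0.0 <= p <= 1.0

def test_invariance_to_irrelevant_field():
    # Changing an irrelevant field (patient ID) must not change the prediction.
    a = predict_risk({"age": 55, "tumor_size_cm": 2.0, "patient_id": 7})
    b = predict_risk({"age": 55, "tumor_size_cm": 2.0, "patient_id": 99})
    assert a == b

def test_monotone_in_tumor_size():
    # A larger tumour should not lower the predicted risk.
    small = predict_risk({"age": 55, "tumor_size_cm": 1.0})
    large = predict_risk({"age": 55, "tumor_size_cm": 3.0})
    assert large >= small
```

The same three test shapes (range, invariance, monotonicity) transfer to the real model by swapping `predict_risk` for its prediction call.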
I am new to deployment and have a basic question about deploying my ML code on a client's VM. I have built a Python project which collects data from the client site, processes it, makes predictions, and displays the results in a dashboard. I have to use client VMs for deployment. Is there a way for me to hide the code, or do something to it, so that the client cannot see my code and reuse it for other purposes? Might sound trivial but …
[I agree this is a very opinionated question, so moderators should feel free to vote to close it if they feel that is right; but since I find endless pros and cons on the Internet, I've decided to ask the community here.] Surface Pro 6 or MacBook Pro for a Data Scientist job? About 8 years ago I was a Windows user. The most annoying part was that it was quite unstable. Note that I was not a developer …
I have built an application with Tkinter in Python 3 and I want to package it with all its dependencies. I want to build a .exe from my Python script that installs Python 3 and the required packages/dependencies, and installs my script as a .exe. I have heard of py2exe, but is that recommended? How should I do this, and which software is recommended for it? I do not have experience in packaging and distributing …
After searching on Google for quite some time, I could not find suitable software or a toolbox that can manage neural network training runs. I am thinking of a program that combines visualization techniques without the need to write code, can compare several training runs of neural networks, and can store them easily. Does a program like this exist? Regards, Lukas
I have the below sets of data per application; you can call them software metrics. These metrics vary with the size of an application: Bugs, CodeSmells, Vulnerabilities. The size of an application is measured by LOC (lines of code). How can I showcase the complexity of each app relative to its lines of code if I visualize each of these parameters? Example:

App       Bugs      LOC
SweetApp    10    10000
SourApp    120  5660000
SaltyApp    55     1500

How do I visualize Bugs …
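One standard way to make the apps comparable is to normalize each count by size, e.g. bugs per thousand lines of code (defect density). A small sketch using the example figures from the question (the bugs/KLOC convention is an assumption about what "relative to LOC" should mean):

```python
# Normalize raw defect counts by size: bugs per thousand lines of code (KLOC).
apps = {
    "SweetApp": (10, 10_000),     # (bugs, LOC)
    "SourApp": (120, 5_660_000),
    "SaltyApp": (55, 1_500),
}

for name, (bugs, loc) in apps.items():
    density = bugs / (loc / 1000)  # bugs per KLOC
    print(f"{name:10s} {density:8.3f} bugs/KLOC")
```

On these numbers, SaltyApp has by far the highest density despite the fewest raw bugs, which is exactly the distortion a raw bug count would hide; the densities are then directly comparable in a single bar chart.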
Short question: I want to learn how to construct data science packages on top of core packages. Is there a list of excellent data science packages I can learn from? Long question: I recently came across an excellent video where Joel Grus live-codes a neural network library in Python. As an inexperienced data scientist without a software engineering background, this was the first time I saw the construction of a "complete" data science package from scratch. My data analysis …
I'm from a programming background. I'm now learning analytics: concepts from basic statistics up to model building, like linear regression, logistic regression, time-series analysis, etc. Since my previous experience is entirely in programming, I would like to do some analysis on the data a programmer has. Say we have the details below (I'm using an SVN repository): person name, code check-in date, file checked in, number of times checked in, branch, check-in date and time, build version, number of defects, defect date, file that has …
I'm working on a consulting project for a tech client, and caught myself scratching my head about the best way to present an advanced analytics workflow. What will be shown to the panel will focus on results, but in this particular case it is warranted to show a visual of the process behind the scenes. Specifically, I need to show the following: 1) some raw data file is used as input to a cleaning script, which performs …
I have been working on a project as part of my master's degree, in partnership with a firm. Over the past few months I developed a predictive model that is essentially a document classification model. The biggest limitation of the research and the model is the lack of data available for training: I have a small data set of 300 documents, whereas the features number in excess of 15,000 terms (before feature selection). How do we identify or estimate the …
I often use Nose, Tox, or unittest when testing my Python code, especially when it has to be integrated with other modules or other pieces of code. However, now I find myself using R more than Python for ML modelling and development, and I've realized that I don't really test my R code (and, more importantly, I don't know how to do it well). So my question is: what are good packages that allow you to test R code …
Twitter is a popular source of data for many applications, especially those involving sentiment analysis and the like. I have some things I'm interested in doing with Twitter data, but here's the issue: to get all Tweets, you have to get special permission from Twitter (which, as I understand it, is never granted) or pay big bucks to Gnip or the like. OTOH, Twitter's API documentation says: "Few applications require this level of access. Creative use of a combination of other …
We are currently developing customer relationship management software for SMEs. What I'd like to structure for our future CRM is a social-based approach (Social CRM). We will therefore allow our users (SMEs) to integrate the CRM with their social network accounts. The CRM will also enhance intercorporate communication within the owning company. All the processes I've indicated above will certainly generate lots of unstructured data. I am wondering how we can integrate big data and data mining …