I learned that tools like Pachyderm version-control data, but I cannot see any difference between that tool and Git. I learned from this post that:
- It holds all your data in a central, accessible location
- It updates all dependent data sets when data is added to or changed in a data set
- It can run any transformation, as long as it runs in a Docker container and accepts a file as input and outputs a file as a result
- It versions all …
I was reading Modern Optimization with R (Use R!) and wondering whether a similar book exists for Python. To be precise, something that covers stochastic gradient descent and other advanced optimization techniques. Many thanks!
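For reference, this is the kind of method meant here: a minimal sketch of stochastic gradient descent for least-squares linear regression in NumPy. The data and hyperparameters are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.05
for epoch in range(50):
    for i in rng.permutation(len(X)):        # one random sample per step
        grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of the squared error
        w -= lr * grad

print(w)  # should land close to true_w
```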
I'm a founder at a data-science-heavy startup, and I'm currently functioning as the entire dev team. Before I know it we'll have people working together on a project that I currently work on completely alone. So:
- What are some must-have things data scientists need to work together in a production setting?
- What are some things data scientists expect to have done outside their scope of work?
- What would make their lives easier and more productive?
- What are some …
I recently started a new position as a data scientist at an e-commerce company. The company was founded about 4-5 years ago and is new to many data-related areas. Specifically, I'm their first data science employee, so I have to take care of data analysis tasks as well as bring new technologies to the company. They have used Elasticsearch (and Kibana) for reporting dashboards on their daily purchases and users' interactions on their e-commerce website. They also …
I've written a research tool that allows users to write arbitrary expressions to define time series calculated from a set of primary data sources. Many of the functions I provide carry state derived from previous values, such as EMA. For example: EMA(GetData("Foo"), 280). State contained in the component functions of these expressions can be saved and resumed via AST node labeling at compile time. This allows a series to be resumed later when any of its root data sources, which …
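A sketch of the kind of resumable state described above: an EMA whose internal value can be serialized and restored so the series picks up exactly where it left off. The class and method names are illustrative, not the tool's actual API.

```python
import json


class EMA:
    def __init__(self, period, state=None):
        self.alpha = 2.0 / (period + 1)
        self.value = state  # None until the first observation arrives

    def update(self, x):
        self.value = x if self.value is None else (
            self.alpha * x + (1 - self.alpha) * self.value
        )
        return self.value

    def save(self):
        # Serialize the carried state, e.g. keyed by the AST node label.
        return json.dumps({"value": self.value})

    @classmethod
    def load(cls, period, blob):
        return cls(period, state=json.loads(blob)["value"])


ema = EMA(280)
for x in [1.0, 2.0, 3.0]:
    ema.update(x)

blob = ema.save()
resumed = EMA.load(280, blob)
assert resumed.update(4.0) == ema.update(4.0)  # identical continuation
```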
I use RStudio for R programming. I remember solid IDEs from other technology stacks, like Visual Studio or Eclipse. I have two questions: What IDEs other than RStudio are used (please consider providing a brief description of them)? Do any of them have noticeable advantages over RStudio? I mostly mean debug/build/deploy features, beyond coding itself (so text editors are probably not a solution).
I am trying to connect Zoho Analytics and Python to import data from Zoho Analytics. I have already run !pip install zoho-analytics-connector. What should I do next? I am new to integrating with other BI tools, so I am unable to find a better solution. Can you guide me on this? I am following the instructions from https://pypi.org/project/zoho-analytics-connector/ and https://www.zoho.com/analytics/api/#python-library.
from __future__ import with_statement
from ReportClient import ReportClient
import sys
Now I am getting an error: Traceback (most recent call …
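The import line is the likely culprit: the PyPI package installs as zoho_analytics_connector, so ReportClient lives inside its report_client module rather than at the top level. A minimal sketch follows; the constructor argument names are taken from the project README and should be checked against the linked PyPI page for your installed version, and the credential values are hypothetical placeholders.

```python
# Import from the installed package instead of a bare ReportClient module.
from zoho_analytics_connector.report_client import ReportClient

# Placeholder credentials; substitute your own Zoho OAuth values.
rc = ReportClient(
    token="YOUR_REFRESH_TOKEN",
    clientId="YOUR_CLIENT_ID",
    clientSecret="YOUR_CLIENT_SECRET",
)
```

Note that `from __future__ import with_statement` is only needed on very old Python 2 versions and can simply be dropped on Python 3.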
I have some text corpora to share with non-programming clients (~50K documents, ~100M tokens) who would like to perform operations like regex searches, collocations, named-entity recognition, and word clustering. The tool AntConc is nice and can do some of these things, but it comes with severe size limitations and crashes on these corpora even on powerful machines. What cloud-based tools with a web interface would you recommend for this kind of task? Is there an open-source tool or a cloud service …
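To make the workload concrete, here is a small sketch of two of the operations named above (regex search and named-entity recognition) in Python with spaCy; any hosted tool would need to run something equivalent at corpus scale. It assumes the en_core_web_sm model is installed (python -m spacy download en_core_web_sm), and the sample text is made up.

```python
import re

import spacy

nlp = spacy.load("en_core_web_sm")
text = "Acme Corp. opened a new office in Berlin in 2019."

# Regex search over raw text
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

# Named-entity recognition
doc = nlp(text)
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(capitalized, entities)
```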
In most of my projects, I come up with models and want to visualize how some property $x$ varies as a function of a subset of parameters $p_1$, $p_2$, etc. So I'll often end up with "parameter scan" figures like this. Those are very helpful for explaining a model, a process, or a dataset. The problem is: I put an inordinate amount of work into producing the data necessary to generate these figures. Most of it wasted …
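One way to avoid redoing that work is to cache scan results keyed by the parameter values. A minimal sketch, assuming the model is exposed as a function f(p1, p2) -> x; the function, grid, and cache file name are all illustrative.

```python
import itertools
import json
import pathlib

import numpy as np


def f(p1, p2):
    # Stand-in for an expensive model evaluation.
    return np.sin(p1) * np.exp(-p2)


cache = pathlib.Path("scan_cache.json")
results = json.loads(cache.read_text()) if cache.exists() else {}

for p1, p2 in itertools.product(np.linspace(0, 3, 10), np.linspace(0, 1, 5)):
    key = f"{p1:.6g},{p2:.6g}"
    if key not in results:  # reuse earlier work instead of recomputing
        results[key] = float(f(p1, p2))

cache.write_text(json.dumps(results))
```

Re-running the scan with a finer grid then only evaluates the new points, so the figures can be regenerated cheaply.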
Often, when I am learning new machine learning methods or experimenting with a data analysis algorithm, I need to generate a series of 2D points. Teachers also do this often when preparing a lesson or tutorial. In some cases I just create a function, add some noise, and plot it, but there are many times when I wish I could just click my mouse on a graph to generate points. For instance, when I want to generate a fairly complex …
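Matplotlib can do exactly this out of the box: plt.ginput collects mouse clicks on an open figure. A minimal sketch; the axis limits are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_title("Click to add points; press Enter to finish")

pts = plt.ginput(n=-1, timeout=0)  # unlimited clicks until Enter
pts = np.array(pts)
if len(pts):
    ax.plot(pts[:, 0], pts[:, 1], "o")
    plt.show()
```

The resulting array can then be saved or fed straight into whatever algorithm is being demonstrated.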
We run a games platform with millions of users (~150,000,000 gameplays/month). We want to find tools or set up a data stack to:
- collect basic metrics for a specific game, such as average gameplay time, 1-day return rate, 7-day return rate, ...
- segment these data by any dimension that we pass along (e.g. by country, by network speed, by ...)
- generate more advanced insights for a specific game, e.g. this is the distribution …
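To pin down the first metric: a sketch of computing 1-day and 7-day return rates from raw gameplay logs with pandas, assuming a table of (user_id, game_id, played_at) rows; the column names and tiny sample are illustrative.

```python
import pandas as pd

plays = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "game_id": ["g1"] * 5,
    "played_at": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-08", "2024-01-01"]
    ),
})

# Days elapsed since each user's first play
first = plays.groupby("user_id")["played_at"].min().rename("first_play")
joined = plays.join(first, on="user_id")
joined["day_offset"] = (joined["played_at"] - joined["first_play"]).dt.days

cohort_size = joined["user_id"].nunique()
d1 = joined.loc[joined["day_offset"] == 1, "user_id"].nunique() / cohort_size
d7 = joined.loc[joined["day_offset"] == 7, "user_id"].nunique() / cohort_size
print(f"1-day return rate: {d1:.0%}, 7-day return rate: {d7:.0%}")
```

Segmenting by country, network speed, etc. is then a groupby over whatever dimension columns accompany the events.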
Dear DS StackExchange community, I'm currently searching the interwebs for a (near-)ready-to-use solution to perform a qualitative evaluation of features extracted from video data. In my head the tool looks something like the screenshot below (taken from the annotation tool Prodigy): a video is displayed at the top, and underneath it one would see a plot of a corresponding feature (selected e.g. via a drop-down menu) extracted from the video. This includes (nearly) every kind of data …
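A bare-bones version of such a viewer can be had in matplotlib: a frame on top, the feature trace below, and a slider to scrub through frames. The frames and feature here are synthetic stand-ins for real decoded video and an extracted feature.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

n_frames = 100
frames = np.random.rand(n_frames, 64, 64)      # stand-in for decoded video
feature = np.sin(np.linspace(0, 6, n_frames))  # stand-in extracted feature

fig, (ax_img, ax_feat) = plt.subplots(2, 1, figsize=(6, 6))
im = ax_img.imshow(frames[0], cmap="gray")
ax_feat.plot(feature)
cursor = ax_feat.axvline(0, color="red")

ax_slider = fig.add_axes([0.15, 0.01, 0.7, 0.03])
slider = Slider(ax_slider, "frame", 0, n_frames - 1, valinit=0, valstep=1)


def update(val):
    i = int(slider.val)
    im.set_data(frames[i])       # swap the displayed frame
    cursor.set_xdata([i, i])     # move the cursor on the feature plot
    fig.canvas.draw_idle()


slider.on_changed(update)
plt.show()
```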
I'm running an experiment where I need to collect and analyse participants' browsing and search histories. The design of the experiment is similar to an "instrumented user panel", described here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.8971&rep=rep1&type=pdf In the classic case, participants must install some kind of logger on their computers, which collects and sends browsing data to the researcher behind the scenes. Finding such tools is where I get stuck. I could, of course, just ask my participants to export their browsing histories and send them …
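If participants do send their local histories, those files are often directly readable: Chrome, for instance, keeps its history in an SQLite file named "History". A minimal read sketch, assuming a copy of that file (the path and schema are Chrome-specific, and the live file is locked while the browser runs, so work from a copy).

```python
import sqlite3

conn = sqlite3.connect("History")  # a copy of the participant's file
rows = conn.execute(
    "SELECT url, title, visit_count FROM urls "
    "ORDER BY visit_count DESC LIMIT 10"
).fetchall()
for url, title, visits in rows:
    print(visits, title, url)
conn.close()
```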
Google Data Studio has connectors to MySQL, PostgreSQL, etc., but these connections come with default names. I couldn't find out how to set the names of data sources in Google Data Studio. Is it even possible?
Is there a way to test out simple filters before committing to coding them? For example, if I want to estimate the feasibility of recognizing certain features in images, or to estimate the effort and sophistication of the required methods, can I try something out in Photoshop in order to discover "where to look", prior to coding?
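Prototyping in Python can be nearly as quick as Photoshop for this. A sketch with scikit-image applying a couple of standard filters side by side to judge whether a feature stands out; "sample.png" is a placeholder path for an RGB image.

```python
import matplotlib.pyplot as plt
from skimage import color, feature, filters, io

img = color.rgb2gray(io.imread("sample.png"))

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].imshow(img, cmap="gray")
axes[0].set_title("original")
axes[1].imshow(filters.sobel(img), cmap="gray")
axes[1].set_title("sobel edges")
axes[2].imshow(feature.canny(img, sigma=2), cmap="gray")
axes[2].set_title("canny")
for ax in axes:
    ax.axis("off")
plt.show()
```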
There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system. Is there a Linux/macOS image with Python, R, and other open-source data science tools installed and available for people to use right away? An Ubuntu or other lightweight OS with the latest versions of Python and R (including IDEs) and other open-source data visualization tools installed would be ideal. I haven't come across one in my quick …
I don't have a lot of training data, and I'm looking for tools in Python, or an executable program like LabelImg, that do heavy augmentation on images; even better if they also change the bounding-box coordinates accordingly. Any help will be appreciated!
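The albumentations library does exactly this: geometric transforms are applied to the bounding boxes along with the image. A minimal sketch; the image, box, and label below are synthetic placeholders.

```python
import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.Rotate(limit=15, p=0.5),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = np.zeros((256, 256, 3), dtype=np.uint8)
boxes = [[30, 40, 120, 160]]          # x_min, y_min, x_max, y_max
out = transform(image=image, bboxes=boxes, labels=["cat"])
print(out["bboxes"], out["labels"])   # boxes updated to match the transform
```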
I'm working at a small company. The company sells products on a website, and they have a Python script that runs every day to assign a score to each product based on a set of parameters (Google Analytics events, similar products' popularity, price, etc.). The problem is that the scoring outcome is not satisfactory, and requiring developers to edit this script arbitrarily, based on business people's assumptions, is time-consuming and not a proper way to achieve what the business …
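One common fix is to move the scoring weights out of the script into a config file that business users can edit without a developer. A sketch under assumed parameter names (the file "score_weights.json" and the metric keys are hypothetical, standing in for whatever the current script computes).

```python
import json

# score_weights.json, editable by non-developers:
# {"ga_events": 0.5, "similar_popularity": 0.3, "price": -0.2}
with open("score_weights.json") as f:
    weights = json.load(f)


def score(product):
    # product: dict of the same metrics the current script already computes
    return sum(weights[k] * product.get(k, 0.0) for k in weights)


print(score({"ga_events": 120, "similar_popularity": 0.8, "price": 20}))
```

The daily job then only reads the file, so tuning the score becomes a config change rather than a code change.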
I'm looking for an open-source tool to help my colleagues and me label images for a machine learning application. We don't actually need bounding boxes or anything that pinpoints regions within each image; we need solely global image classifications (e.g. whether the image is of a cityscape, a rural setting, etc.). The mission-critical functionality we're looking for is:
- image classification (both radio boxes and checklists)
- the ability to nest labels, e.g. if label1=cityscape then label2 is required …
I have a large-ish data set (400K records) composed of two fields (both strings). I am looking for a tool that will enable me to cluster the data, e.g. around the first column, using either exact matches or some kind of string-proximity function like Levenshtein distance. I would also like to be able to find all duplicate records and merge them into one. OpenRefine looks ideal for my purposes, but it is so slow when clustering my data or …
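A scripted alternative that stays responsive at this scale: exact-match grouping with pandas plus Levenshtein-style fuzzy lookup with rapidfuzz. A minimal sketch; the column names and tiny sample are illustrative.

```python
import pandas as pd
from rapidfuzz import fuzz, process

df = pd.DataFrame({
    "name": ["ACME Inc", "Acme Inc.", "Widget Co", "ACME Inc"],
    "value": ["a", "b", "c", "d"],
})

# Exact-duplicate clusters after light normalization of the key column
df["key"] = df["name"].str.lower().str.replace(r"[^a-z0-9]", "", regex=True)
exact_clusters = df.groupby("key")["name"].apply(list)

# Fuzzy lookup: nearest neighbours of one record by Levenshtein-based ratio
candidates = process.extract(
    "ACME Inc", df["name"].tolist(), scorer=fuzz.ratio, limit=3
)

print(exact_clusters.to_dict())
print(candidates)
```

Merging a cluster is then a groupby-aggregate over the cluster key, which avoids OpenRefine's interactive overhead.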