open-source

Open source ML model/dataset Hub (like Hugging Face Hub or TF Hub)

Cambri

April 11, 2022, 15:03

Do you know of any open source implementation of a service that I could set up on-premises to publish models/datasets so they are visible to other people in my organisation (like Hugging Face Hub or TensorFlow Hub)?

Topic: open-source

Category: Data Science

Video Sentiment Analysis

kumar

October 2, 2021, 12:52

I am trying to build a video sentiment analysis feature in Python which will take a video and report the sentiments different people are expressing, based on their facial expressions in the video. Is there any open source library or project available that I can leverage for my use case?

Topic: sentiment-analysis python open-source

Category: Data Science

Publicly Available Datasets

Amir Ali Akbari

July 8, 2021, 20:42

One of the common problems in data science is gathering data from various sources in a somewhat cleaned (semi-structured) format and combining metrics from those sources for a higher-level analysis. Looking at other people's efforts, especially other questions on this site, it appears that many people in this field are doing somewhat repetitive work. For example, analyzing tweets, Facebook posts, Wikipedia articles etc. is a part of a lot of big data problems. Some of these data …

Topic: dataset open-source

Category: Data Science

Search for implementation of Faster RCNN

Torben.

June 2, 2021, 03:49

What are the best written and best structured Faster RCNN implementations that you know? Please provide references.

Topic: object-detection implementation classification open-source

Category: Data Science

Open source data science projects to contribute

IharS

March 11, 2021, 19:15

Contributing to open source projects is typically a good way for newbies to get some practice, and for experienced data scientists and analysts to try a new area. Which projects do you contribute to? Please provide a short intro and a link to GitHub.

Topic: beginner open-source

Category: Data Science

Publicly available news APIs/datasets?

stevec

February 20, 2021, 11:56

In addition to our list of publicly available datasets, I'd like to know if there is any list of publicly available news datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the available data were added. Such information should include, but is not limited to: the name of the news network / news aggregator; what kind of news information it provides (title, snippet, full-article, date, author, url, ...); whether it allows for …

Topic: crawling dataset open-source

Category: Data Science

Publicly available social network datasets/APIs

Rubens

February 9, 2021, 04:27

As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the available data were added. Such information should include, but is not limited to: the name of the social network; what kind of user information it provides (posts, profile, friendship network, ...); whether it allows for crawling its …

Topic: crawling dataset open-source

Category: Data Science

Regression dataset with categorical features

David Masip

June 18, 2020, 07:26

I have thought of a regression technique that I want to try on several datasets. I would like these datasets to have the following properties: Be a tabular dataset (no images). Have at least 20k rows, and ideally around 100k. Have some categorical variables with many levels (at least one variable with 100 levels or more). Ideally, the target should have long tails. Does anyone know of any public dataset with these properties? I have found the Stack Overflow developer survey to …

Topic: data regression dataset categorical-data open-source

Category: Data Science
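
The properties listed in the question above can be checked mechanically before committing to a candidate dataset. The following is a rough sketch using only the standard library. The row and level thresholds come from the question; the long-tail test (maximum far above the median), the function name `check_dataset`, and the sample rows are assumptions made purely for illustration.

```python
from statistics import median


def check_dataset(rows, categorical_cols, target_col,
                  min_rows=20_000, min_levels=100, tail_ratio=10.0):
    """Return pass/fail checks for a tabular dataset given as a list of dicts."""
    targets = [row[target_col] for row in rows]
    # Largest number of distinct levels among the categorical columns.
    max_levels = max(len({row[col] for row in rows}) for col in categorical_cols)
    med = median(targets)
    return {
        "enough_rows": len(rows) >= min_rows,
        "high_cardinality": max_levels >= min_levels,
        # Crude long-tail heuristic (an assumption): the maximum dwarfs the median.
        "long_tailed_target": med > 0 and max(targets) / med >= tail_ratio,
    }


# Tiny synthetic example with lowered thresholds so it runs instantly.
rows = [{"city": f"c{i % 5}", "price": 1.0} for i in range(50)]
rows.append({"city": "c0", "price": 500.0})
report = check_dataset(rows, ["city"], "price", min_rows=50, min_levels=5)
print(report)
```

A more careful long-tail test would look at skewness or upper quantiles; the ratio used here is only a quick screen.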

Where can I find a free spatio-temporal dataset for download?

mynameisJEFF

July 12, 2019, 12:49

Where can I find a free spatio-temporal dataset for download so that I can play with it in R?

Topic: freebase dataset open-source

Category: Data Science

Code or Package to cluster sequences (or time series) of different lengths based on HMM?

mflowww

October 10, 2018, 17:01

Is there any existing code or package in Python, R, Java, MATLAB, or Scala that implements the sequence clustering algorithms in either of the following two papers? 1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997): https://papers.nips.cc/paper/1217-clustering-sequences-with-hidden-markov-models.pdf The paper gives a probabilistic model-based approach to clustering sequences (or time series), using hidden Markov models (HMM). 2) 'Visual Cluster Exploration of Web Clickstream Data' by Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma (2012): http://www.cs.tufts.edu/comp/250VIS/papers/VAST2012-ClickStream.pdf The paper is quite …

Topic: markov-hidden-model expectation-maximization sequence clustering open-source

Category: Data Science

Which algorithm is good for detecting and recognizing faces from a variety of angles?

RISHABH RAI

February 23, 2018, 15:42

I am building a face recognition app for my class attendance system. I collect training data from social websites like Facebook, Instagram and others; as you can see, the images I get from there are usually not frontal but taken at a variety of angles. I was using haar_cascade for face detection, but it is not good for tilted faces. Can anyone suggest a good algorithm for face detection with which I can detect faces at a variety of …

Topic: object-detection tensorflow computer-vision deep-learning open-source

Category: Data Science

Which is the most effective (accurate) face detection method in Python?

RISHABH RAI

February 20, 2018, 13:04

I tried haar_cascade for face detection and LBPH for face recognition, but the results weren't good enough. Please suggest good ways to detect and recognize faces. My aim is to create an app that takes a single photograph of students and, by scanning that photograph, predicts which students are present and which are absent.

Topic: computer-vision deep-learning python open-source machine-learning

Category: Data Science

Data available from industry operations

Juan David

June 30, 2017, 14:45

I'm going to start my degree thesis and I want to build a fault detection system using machine learning techniques. I need datasets for my thesis but I don't know where I can get that data. I'm looking for historical operation/maintenance/fault datasets of any kind of machine in the oil & gas industry (drills, steam injectors etc.) or electrical companies (transformers, generators etc.).

Topic: freebase dataset open-source

Category: Data Science

Are deep learning toolkits targeted at certain areas, or are they all-purpose toolkits?

Carlton Banks

May 17, 2017, 14:25

Are any of the open deep learning toolkits targeted at certain areas, or are all toolkits all-purpose, meaning each is a black box for deep learning? My question arises from Microsoft's CNTK, which seems to contain examples of speech and text classification, whereas others usually just have MNIST or CIFAR...

Topic: deep-learning tools open-source

Category: Data Science

Difficulties of getting raw data

LearnByReading

August 5, 2016, 03:39

I am trying to obtain raw data for (violent) crime rates of a US/Canadian city (any city would do), but I need the data to be granular and raw. All I could find is either interpretations, summary data or useless editorials. I'm trying to do analysis, and I need day-by-day (detailed-level, granular) data that shows the number of crimes recorded per day. Does anyone have any good sources/suggestions for finding this? Thank you immensely!

Topic: data data-mining open-source

Category: Data Science

Item Based Collaborative Filtering with No Ratings

sheldonkreger

May 31, 2016, 16:18

I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows whether a user has visited a page or not. Users do not provide any ratings of our web pages. This is a good task for item-based recommendation. However, most of the algorithms (such as the one in Mahout) require rating data. …

Topic: apache-mahout recommender-system open-source

Category: Data Science
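
For visit-only data like that described in the recommender question above, one possible approach is item-item similarity computed from the sets of users who visited each page, for example with the Jaccard index. The following is a minimal sketch in plain Python. It is not the Mahout algorithm, and the function names and the sample visit log are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_page_visitors(visits):
    """Map each page to the set of users who visited it."""
    page_visitors = defaultdict(set)
    for user, page in visits:
        page_visitors[page].add(user)
    return page_visitors


def jaccard_similar_pages(visits, top_n=3):
    """For every page, rank other pages by Jaccard similarity of visitor sets."""
    page_visitors = build_page_visitors(visits)
    scores = defaultdict(list)
    for a, b in combinations(sorted(page_visitors), 2):
        shared = len(page_visitors[a] & page_visitors[b])
        if shared == 0:
            continue  # pages with no common visitors are never recommended
        union = len(page_visitors[a] | page_visitors[b])
        sim = shared / union
        scores[a].append((b, sim))
        scores[b].append((a, sim))
    # Sort by similarity (descending), breaking ties by page name.
    return {
        page: sorted(neighbours, key=lambda pair: (-pair[1], pair[0]))[:top_n]
        for page, neighbours in scores.items()
    }


# Hypothetical (user, page) visit log.
visits = [
    ("u1", "home"), ("u1", "pricing"),
    ("u2", "home"), ("u2", "pricing"), ("u2", "docs"),
    ("u3", "docs"), ("u3", "blog"),
]
recommendations = jaccard_similar_pages(visits)
print(recommendations["home"])
```

The pairwise loop is quadratic in the number of pages, so a real data set would likely need sparse matrices or an inverted index from users to pages.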

Tools to preprocess big data for dashboards?

JeanVuda

November 4, 2015, 11:15

I have a complex dataset with more than 16M rows coming from the pharmaceutical industry. The data is stored in a SQL Server database with multiple (more than 400) relational tables. The data has several levels of hierarchy, such as province, city, postal code, person, and antigen measures, etc. I would like to create many dashboards in order to observe the changes & trends happening. I can use Pentaho, R (shiny) or Tableau for this purpose. But the problem is the data is …

Topic: tableau bigdata open-source

Category: Data Science

Exporting R model to OpenCV's Machine Learning Library

Prophet60091

December 14, 2014, 03:42

I'm wondering if it's possible to export a model trained in R to OpenCV's Machine Learning (ML) library format. The latter appears to save/read models in XML/YAML, whereas the former might be exportable via PMML. Specifically, I'm working with Random Forests, which are classifiers available both in R and OpenCV's ML library. Any advice on how I can get the two to share models would be greatly appreciated.

Topic: r open-source machine-learning

Category: Data Science

Open source solver for large mixed integer programming task?

rnorberg

June 17, 2014, 16:18

I'm currently using General Algebraic Modeling System (GAMS), and more specifically CPLEX within GAMS, to solve a very large mixed integer programming problem. This allows me to parallelize the process over 4 cores (although I have more, CPLEX utilizes a maximum of 4 cores), and it finds an optimal solution in a relatively short amount of time. Is there an open source mixed integer programming tool that I could use as an alternative to GAMS and CPLEX? It must be …

Topic: optimization parallel r open-source

Category: Data Science

What open-source books (or other materials) provide a relatively thorough overview of data science?

statsRus

May 16, 2014, 13:45

As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that provides material suitable for a college-level course, not particular pieces or papers.

Topic: open-source education

Category: Data Science
