Do you know of any open source implementation of a service that I could set up on-premises to publish models/datasets so they are visible to other people in my organisation (like the Hugging Face Hub or TensorFlow Hub)?
I am trying to build a video sentiment analysis feature in Python that takes a video and reports the sentiments different people are expressing, based on their facial expressions in the video. Is there any open source library or project I can leverage for my use case?
One of the common problems in data science is gathering data from various sources in a somewhat cleaned (semi-structured) format and combining metrics from those sources for higher-level analysis. Looking at other people's efforts, especially other questions on this site, it appears that many people in this field are doing somewhat repetitive work. For example, analyzing tweets, Facebook posts, Wikipedia articles etc. is a part of a lot of big data problems. Some of these data …
Contributing to open source projects is typically a good way for newbies to get some practice, and for experienced data scientists and analysts to try a new area. Which projects do you contribute to? Please provide a short intro and a link to GitHub.
In addition to our list of publicly available datasets, I'd like to know if there is any list of publicly available news datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the available data were added. Such information should include, but is not limited to: the name of the news network / news aggregator; what kind of news information it provides (title, snippet, full-article, date, author, url, ...); whether it allows for …
As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the available data were added. Such information should include, but is not limited to: the name of the social network; what kind of user information it provides (posts, profile, friendship network, ...); whether it allows for crawling its …
I have thought of a regression technique that I want to try on several datasets. I would like these datasets to have the following properties: Be a tabular dataset (no images). Have at least 20k rows, and ideally around 100k. Have some categorical variables with many levels (at least one variable with 100 levels or more). Ideally, the target should have long tails. Does anyone know of any public dataset with these properties? I have found the Stack Overflow Developer Survey to …
Are there any existing code or packages in Python, R, Java, Matlab, or Scala that implement the sequence clustering algorithms in either of the following two papers? 1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997): https://papers.nips.cc/paper/1217-clustering-sequences-with-hidden-markov-models.pdf The paper gives a probabilistic model-based approach to clustering sequences (or time series), using hidden Markov models (HMM). 2) 'Visual Cluster Exploration of Web Clickstream Data' by Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma (2012): http://www.cs.tufts.edu/comp/250VIS/papers/VAST2012-ClickStream.pdf The paper is quite …
I am building a face recognition app for my class attendance system. I collect training data from social websites such as Facebook, Instagram, and others; as you can see, the images I get from there are usually not frontal but taken at a variety of angles. I was using haar_cascade for face detection, but it does not work well for tilted faces. Can anyone suggest a good face detection algorithm with which I can detect faces at a variety of …
I tried haar_cascade for face detection and LBPH for face recognition, but the results weren't good enough. Please suggest good ways to detect and recognize faces. My aim is to create an app that takes a single photograph of the students and, by scanning it, predicts which students are present and which are absent.
I'm about to start my degree thesis and I want to build a fault detection system using machine learning techniques. I need datasets for my thesis but I don't know where I can get that data. I'm looking for historical operation/maintenance/fault datasets for any kind of machine in the oil & gas industry (drills, steam injectors etc.) or at electrical companies (transformers, generators etc.).
Are any of the open deep learning toolkits targeted at certain areas, or are all toolkits all-purpose, meaning each is a black box for deep learning? My question comes in regard to Microsoft's CNTK, which seems to contain examples of speech and text classification, where others usually just have MNIST or CIFAR...
I am trying to obtain raw data for (violent) crime rates of a US/Canadian city (any city would do), but I need the data to be granular and raw. All I could find were interpretations, summary data or useless editorials. I'm trying to do analysis, and I need day-by-day (detailed-level, granular) data that shows the number of crimes recorded per day. Does anyone have any good sources/suggestions for finding this? Thank you immensely!
I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows whether a user has visited a page or not. Users do not provide any ratings of our web pages. This is a good task for item-based recommendation. However, most of the algorithms (such as the one in Mahout) require rating data. …
I have a complex dataset with more than 16M rows coming from the pharmaceutical industry. The data is stored in a SQL Server database with multiple (more than 400) relational tables. The data has several levels of hierarchy, such as province, city, postal code, person, antigen measures, etc. I would like to create many dashboards in order to observe the changes & trends happening. I can use Pentaho, R (shiny) or Tableau for this purpose. But the problem is data is …
I'm wondering if it's possible to export a model trained in R to OpenCV's Machine Learning (ML) library format. The latter appears to save/read models in XML/YAML, whereas the former might be exportable via PMML. Specifically, I'm working with Random Forests, which are classifiers available both in R and OpenCV's ML library. Any advice on how I can get the two to share models would be greatly appreciated.
I'm currently using the General Algebraic Modeling System (GAMS), and more specifically CPLEX within GAMS, to solve a very large mixed integer programming problem. This allows me to parallelize the process over 4 cores (although I have more, CPLEX utilizes a maximum of 4 cores), and it finds an optimal solution in a relatively short amount of time. Is there an open source mixed integer programming tool that I could use as an alternative to GAMS and CPLEX? It must be …
As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that provides material suitable for a college-level course, not particular pieces or papers.