Python and the Titanic competition: how to get the median of a specific range of values where class is 3

I am trying to solve Kaggle's Titanic competition. In the test set, only one row has a null Fare value. It would be easy to replace it with the median or mean of all Fare values, but I would rather plot the relationship between fares and classes (1, 2, 3), and between fares and the Embark field, to look for patterns and narrow down the range of fare values before computing a replacement for the null value. The …
Category: Data Science
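
One way to narrow the fare range as the question describes is to take the median within each (class, embarkation port) group instead of over all rows. The sketch below uses only the standard library and a handful of made-up rows; the field names `Pclass`, `Embarked`, and `Fare` are assumed to match the Kaggle CSV columns, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Made-up sample rows for illustration; real data would come from test.csv.
rows = [
    {"Pclass": 3, "Embarked": "S", "Fare": 7.25},
    {"Pclass": 3, "Embarked": "S", "Fare": 8.05},
    {"Pclass": 3, "Embarked": "S", "Fare": 7.90},
    {"Pclass": 3, "Embarked": "Q", "Fare": 7.75},
    {"Pclass": 1, "Embarked": "C", "Fare": 71.28},
    {"Pclass": 3, "Embarked": "S", "Fare": None},  # the row to impute
]

def group_medians(rows):
    """Median of the known fares for each (Pclass, Embarked) pair."""
    groups = defaultdict(list)
    for row in rows:
        if row["Fare"] is not None:
            groups[(row["Pclass"], row["Embarked"])].append(row["Fare"])
    return {key: median(fares) for key, fares in groups.items()}

def impute_fare(rows):
    """Return copies of the rows with missing fares filled per group.

    Raises KeyError if a group has no known fare at all.
    """
    medians = group_medians(rows)
    filled = []
    for row in rows:
        if row["Fare"] is None:
            row = {**row, "Fare": medians[(row["Pclass"], row["Embarked"])]}
        filled.append(row)
    return filled

filled = impute_fare(rows)
print(filled[-1]["Fare"])
```

With pandas, the same grouping is usually written as `df.groupby(["Pclass", "Embarked"])["Fare"].median()`.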

How to determine an adequate cash prize size for hosting a Kaggle or similar competition?

Taking Kaggle as a well-known example of a data science competition platform, how do you know what an adequate budget for the cash prize is? I would like at least an order of magnitude, given that I am not able to study all previous competitions, cluster them by this factor, and assess the driving factors. (Ironically, that could be a competition in itself.) I've also looked at the Q&A page at Kaggle and found more or less the same question but …
Category: Data Science

Poker tournament winner prediction

I am trying to solve a poker tournament winner prediction problem. I have millions of historical records in this format:

Players ==> Winner
P1,P2,P4,P8 ==> P2
P4,P7,P6 ==> P4
P6,P3,P2,P1 ==> P1

I want to find the best algorithm to predict the winner from a set of known players. So far I have tried decision trees and XGBoost without much success. I've done my research and could not find an answer anywhere else. My apologies in advance if the same problem is answered in different terms on Stack Overflow.
Category: Data Science
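
Before reaching for a heavier model, it can help to have a simple baseline to compare against. The sketch below parses records in the `players ==> winner` format quoted in the question and predicts the known player with the highest historical win rate. This is an assumed baseline for illustration, not the asker's method or a recommended final answer, and the function names are hypothetical.

```python
from collections import Counter

# The three example records from the question.
records = [
    "P1,P2,P4,P8 ==> P2",
    "P4,P7,P6 ==> P4",
    "P6,P3,P2,P1 ==> P1",
]

def parse(line):
    """Split 'P1,P2 ==> P2' into (['P1', 'P2'], 'P2')."""
    players, winner = line.split("==>")
    return [p.strip() for p in players.split(",")], winner.strip()

def fit_win_rates(records):
    """Wins divided by appearances for every player seen."""
    played, won = Counter(), Counter()
    for line in records:
        players, winner = parse(line)
        played.update(players)
        won[winner] += 1
    return {p: won[p] / played[p] for p in played}

def predict(players, rates):
    """Pick the player with the highest win rate; unseen players count as 0."""
    return max(players, key=lambda p: rates.get(p, 0.0))

rates = fit_win_rates(records)
print(predict(["P8", "P4", "P7"], rates))
```

A baseline like this ignores who the opponents are, so any model that uses the full set of players at the table should be able to beat it if there is signal in the data.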

How to approach the numer.ai competition with anonymous scaled numerical predictors?

Numer.ai has been around for a while now, yet there seem to be only a few posts or other discussions about it on the web. The system has changed from time to time, and the setup today is the following: train (N=96K) and test (N=33K) data with 21 features with continuous values in [0,1] and a binary target. The data is clean (no missing values) and is updated every 2 weeks. You can upload your predictions (on the test set) and see …
Category: Data Science

numer.ai: how does their leaderboard system work?

There is a data mining competition website called numer.ai. Presumably, behind the website is a hedge fund that makes use of the predictions people send. People in the top 100 places continuously make money until the next dataset is revealed and the competition resets. What I don't understand is that websites like Kaggle avoid overfitting by having a public and a private leaderboard. The private leaderboard is only revealed at the end of the competition and only then are …
Category: Data Science
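
The public/private leaderboard idea mentioned in the question can be illustrated with a small sketch: the hidden test labels are split once into two disjoint parts, and a submission is scored separately on each. The 30% public fraction, the accuracy metric, the labels, and the function names below are all assumptions for illustration, not a description of how Kaggle or numer.ai actually score.

```python
import random

def split_leaderboard(n_rows, public_fraction=0.3, seed=0):
    """Split row indices into disjoint public and private sets."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * public_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])

def accuracy(y_true, y_pred, indices):
    """Fraction of correct predictions on the given rows."""
    return sum(y_true[i] == y_pred[i] for i in indices) / len(indices)

# Made-up labels and predictions for a ten-row test set.
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]

public_idx, private_idx = split_leaderboard(len(y_true))
public_score = accuracy(y_true, y_pred, public_idx)    # visible during the contest
private_score = accuracy(y_true, y_pred, private_idx)  # revealed only at the end
print(public_score, private_score)
```

Because participants only ever see the public score, tuning a model to it repeatedly does not guarantee a good private score, which is the overfitting safeguard the question refers to.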

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.