Natural language gender classification with a very small training set

The task involves determining the gender of the author of a Reddit post. Given a post and its title, I need a model that outputs a probability vector $[p_{male}, p_{female}]$.
The difficulty is that the training set is very small: we have only 5000 labeled posts. In addition, the average sentence length exceeds 90, which makes it hard to extract features.
Currently, because the dataset is so small, we use non-deep-learning methods for this task: tf-idf to extract features and regression to generate the output.
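For concreteness, the current baseline looks roughly like the sketch below. It assumes scikit-learn, and it assumes that "regression" here means logistic regression; the toy posts, the labels, and the hyperparameters (`ngram_range`, `sublinear_tf`, `max_iter`) are placeholders, not our real data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """TF-IDF features feeding a logistic regression classifier."""
    return make_pipeline(
        # Unigrams and bigrams with sublinear tf scaling (illustrative choices).
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )


# Placeholder data: the real input would be title + body of the labeled posts.
posts = [
    "placeholder title one. placeholder body about cooking dinner",
    "placeholder title two. placeholder body about fixing a bike",
    "placeholder title three. placeholder body about cooking lunch",
    "placeholder title four. placeholder body about fixing a car",
]
labels = ["male", "female", "male", "female"]  # arbitrary toy labels

model = build_baseline()
model.fit(posts, labels)

# predict_proba columns follow model.classes_ (sorted alphabetically),
# so reorder them to get [p_male, p_female].
classes = list(model.classes_)
order = [classes.index("male"), classes.index("female")]
probs = model.predict_proba(["placeholder title. a new post about cooking"])[:, order]
print(probs)  # one row per post: [p_male, p_female]
```

With only 5000 posts, cross-validating the regularization strength `C` of `LogisticRegression` is probably worth doing before drawing conclusions about this baseline.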
However, the performance is not good, and I wonder whether we can improve it with NN-based feature extraction, for example by using pretrained encoders to extract features and training only the regression model.
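What I have in mind is something like the following sketch: the encoder stays frozen and only a logistic regression is trained on its embeddings. The helper names (`load_pretrained_encoder`, `fit_on_frozen_features`, `predict_male_female`) are hypothetical, and the use of the `sentence-transformers` package with the `all-MiniLM-L6-v2` model is just one assumed choice of pretrained encoder, not something we have tested.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def load_pretrained_encoder(name="all-MiniLM-L6-v2"):
    """Return a function mapping a list of strings to an (n, d) embedding array.

    Assumption: the sentence-transformers package is installed and the
    model name is an illustrative choice. The encoder is never fine-tuned.
    """
    from sentence_transformers import SentenceTransformer

    st_model = SentenceTransformer(name)
    return lambda texts: st_model.encode(list(texts), convert_to_numpy=True)


def fit_on_frozen_features(encode, texts, labels):
    """Embed the texts once with the frozen encoder, then train only the classifier."""
    features = np.asarray(encode(texts))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf


def predict_male_female(encode, clf, texts):
    """Return an (n, 2) array whose columns are [p_male, p_female]."""
    features = np.asarray(encode(texts))
    probs = clf.predict_proba(features)
    classes = list(clf.classes_)  # sorted alphabetically by scikit-learn
    return probs[:, [classes.index("male"), classes.index("female")]]


# Intended usage (downloads the model on first call, so left commented out):
# encode = load_pretrained_encoder()
# clf = fit_on_frozen_features(encode, train_posts, train_labels)
# probs = predict_male_female(encode, clf, new_posts)
```

Since `encode` is just a callable, any other pretrained encoder could be swapped in without touching the classifier code. One caveat I am aware of: such encoders truncate inputs beyond their maximum sequence length, so long posts may need to be split into chunks whose embeddings are averaged.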

Topic encoder tfidf nlp

Category Data Science
