Data-to-text NLG for financial reports

I am working on a project where I want to replace a template-based approach to financial reporting with an end-to-end NLG approach.

The template-based approach takes some financial data as input (ESG features: environment, social, and governance, with a score for each of these three pillars) and passes it to a function (i.e., the template function) that returns the text shown in the company's report.

For example:

  • score_governance = 40%
  • score_environment = 56%
  • Company : Toto
  • pct_females: 10%

The text could be: "At Toto, the governance score is below the average, which can be due to a limited percentage of females... The environment score is above the average..."

It's scripted: if pct_females is below a certain value, the text reads one way; otherwise it reads another way, and so on.
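To make the scripted logic concrete, here is a minimal sketch of what such a template function might look like. The thresholds, the function name `render_report`, and the exact wording are assumptions for illustration, not the actual template used in the project.

```python
# Hypothetical thresholds; the real template's cutoffs are not given in the post.
AVERAGE_SCORE = 50.0    # assumed benchmark average, in percent
LOW_FEMALE_PCT = 20.0   # assumed cutoff for "limited" representation


def render_report(company, score_governance, score_environment, pct_females):
    """Build report text from fixed if/else rules (template-based NLG)."""
    sentences = []

    # Governance sentence, with an optional explanation clause.
    if score_governance < AVERAGE_SCORE:
        sentence = f"At {company}, the governance score is below the average"
        if pct_females < LOW_FEMALE_PCT:
            sentence += ", which may be due to a limited percentage of females"
        sentences.append(sentence + ".")
    else:
        # Ties are treated as "above" in this sketch.
        sentences.append(f"At {company}, the governance score is above the average.")

    # Environment sentence.
    if score_environment < AVERAGE_SCORE:
        sentences.append("The environment score is below the average.")
    else:
        sentences.append("The environment score is above the average.")

    return " ".join(sentences)


text = render_report("Toto", 40, 56, 10)
print(text)
```

Every new wording or rule means another branch in this function, which is the maintenance burden described here.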

This solution is not ideal, because the text sounds robotic and we need to modify the template regularly.

So I did some research and found out about data-to-text NLG. This is exactly what I want to do: transform data (numerical, but also some categorical) into text.

Looking at this dataset: https://github.com/harvardnlp/boxscore-data, I understand that training such a model requires pairs of (data, gold text). Unfortunately, I only have the data.
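For illustration, a (data, gold text) pair for this use case could be stored as one JSON object per line. This is only a sketch: the field names `data` and `text` are an assumed convention, and the text shown is a placeholder taken from the example above, not real annotated data.

```python
import json

# One training example = structured input + a human-written reference text.
# The "data"/"text" field names are an assumption, not a standard format.
pairs = [
    {
        "data": {
            "company": "Toto",
            "score_governance": 40,
            "score_environment": 56,
            "pct_females": 10,
        },
        # Placeholder reference text; a real corpus needs human-written texts.
        "text": "At Toto, the governance score is below the average ...",
    },
]


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


def from_jsonl(blob):
    """Parse JSON Lines back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in blob.splitlines() if line.strip()]


jsonl = to_jsonl(pairs)
restored = from_jsonl(jsonl)
print(jsonl)
```

Keeping the structured record next to its reference text makes it easy to collect several texts from different annotators for the same record later.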

My ideas are:

  1. Label the data (i.e., produce a reference text for each input sample). I will ask different people to describe the data they see in their own writing style, then train an existing data-to-text implementation on my dataset.

  2. Use transfer learning. But I don't know whether it is possible for data-to-text NLG, and I haven't really found papers that explore it.
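For the transfer-learning idea, one common way to reuse a pretrained text-to-text model is to flatten each record into a plain string, so the task becomes ordinary text in, text out. Below is a minimal sketch of that flattening step only. The `linearize` name and the `key: value | key: value` format are assumptions, and no particular model or library is implied.

```python
def linearize(record):
    """Flatten a dict into 'key: value | key: value' with a stable key order."""
    # Sorting the keys keeps the input format identical across samples.
    return " | ".join(f"{key}: {record[key]}" for key in sorted(record))


record = {
    "company": "Toto",
    "score_governance": 40,
    "score_environment": 56,
    "pct_females": 10,
}

source = linearize(record)
print(source)
# company: Toto | pct_females: 10 | score_environment: 56 | score_governance: 40
```

The resulting string would be the model input, and the human-written report text would be the target.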

My question is:

If I label the data, how many samples do I need? I know it depends on my data, but if someone has already worked on a similar project (i.e., custom data-to-text training), what would be the minimum required?

Thanks in advance to anyone who helps. I am open to any suggestions, ideas, or papers.
