Predict indices of text using deep learning
I want to predict the start and end indices of the text spans where a particular propaganda technique is used, such as smears, name-calling, or loaded language. Some examples from the dataset, followed by their annotated spans, are:
['THERE ARE ONLY TWO GENDERS\n\nFEMALE \n\nMALE\n', 'This is not an accident!', "SO BERNIE BROS HAVEN'T COMMITTED VIOLENCE EH?\n\nPOWER COMES FROM THE BARREL OF A GUN, COMRADES.\n\nWHAT ABOUT THE ONE WHO SHOT CONGRESSMAN SCALISE OR THE DAYTON OHIO MASS SHOOTER?\n"]
[[[0, 41]], [], [[47, 83], [3, 14], [33, 41], [163, 175], [85, 93], [0, 176]]]
So [0, 41] means that the whole text of the first example, from index 0 to 41, falls under a certain category.
The second example has an empty list, so no part of it is annotated.
In the third example, [47, 83] marks 'POWER COMES FROM THE BARREL OF A GUN' as 'Slogan', and [3, 14] marks 'BERNIE BROS' as 'name calling'.
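To make the indexing concrete, here is a minimal Python sketch that slices the third example with its spans. It assumes the indices are character offsets with an exclusive end, which is consistent with the two fragments quoted above. The helper name extract_spans is only illustrative and not part of the dataset.

```python
# Third example from the dataset, split across literals for readability.
text = (
    "SO BERNIE BROS HAVEN'T COMMITTED VIOLENCE EH?\n\n"
    "POWER COMES FROM THE BARREL OF A GUN, COMRADES.\n\n"
    "WHAT ABOUT THE ONE WHO SHOT CONGRESSMAN SCALISE "
    "OR THE DAYTON OHIO MASS SHOOTER?\n"
)

# Annotated spans for that example, copied from the dataset.
spans = [[47, 83], [3, 14], [33, 41], [163, 175], [85, 93], [0, 176]]


def extract_spans(text, spans):
    """Return the substring for each [start, end) character span.

    Assumption: `end` is exclusive, as in ordinary Python slicing.
    """
    return [text[start:end] for start, end in spans]


fragments = extract_spans(text, spans)

# Print each span next to the fragment it selects.
for (start, end), fragment in zip(spans, fragments):
    print(start, end, repr(fragment))
```

Under that assumption, the first two spans select 'POWER COMES FROM THE BARREL OF A GUN' and 'BERNIE BROS', the same fragments described above.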
I have tried treating this as a regression problem with an LSTM model, but the results are very poor, as I expected. What is the right approach to this problem? Any help will be highly appreciated. Thanks!
Topic lstm multilabel-classification sequence
Category Data Science