Passing Dependency/Constituency trees to a Neural Machine Translator
I am working on a project on Neural Machine Translation in the English-Irish domain. I am not an expert; I have researched this entirely on my own for a technology exhibition, so apologies if my question is simple.
I am trying to parse all of my English corpus into constituency trees. The format of a sentence when using the Stanford Parser is something like this:
(ROOT (S (NP (VBG cohabiting) (NNS partners)) (VP (MD can) (VP (VB make) (NP (NP (NNS wills)) (SBAR (WHNP (WDT that)) (S (VP (VBP favour) (NP (DT each) (JJ other)))))))) (. .)))
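For reference, this is roughly how I generate those parses. I'm using NLTK's CoreNLPParser wrapper against a Stanford CoreNLP server running locally, and the corpus file names are just placeholders for my own files:

```python
# Rough sketch of my parsing step: assumes a Stanford CoreNLP server is already
# running on localhost:9000; the file names are placeholders.
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")

with open("corpus.en", encoding="utf-8") as fin, \
        open("corpus.parsed.en", "w", encoding="utf-8") as fout:
    for sentence in fin:
        tree = next(parser.raw_parse(sentence.strip()))
        # Collapse NLTK's pretty-printed tree onto a single line.
        fout.write(" ".join(str(tree).split()) + "\n")
```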
Of course, when dealing with simple sequences, each token is an actual word; there are no symbols like NP or NNS as there are in constituency trees.
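To make the contrast concrete, here is roughly what the two kinds of source sequences look like if the parse is just tokenized on whitespace and brackets (the tokenize_tree helper is my own sketch, not something from a library):

```python
# My own sketch of the two kinds of source sequences I am comparing.

def tokenize_tree(tree_str):
    """Split a bracketed Stanford-style parse into flat tokens: '(', ')', tags and words."""
    return tree_str.replace("(", " ( ").replace(")", " ) ").split()

plain = "cohabiting partners can make wills that favour each other .".split()

tree = ("(ROOT (S (NP (VBG cohabiting) (NNS partners)) (VP (MD can) (VP (VB make) "
        "(NP (NP (NNS wills)) (SBAR (WHNP (WDT that)) (S (VP (VBP favour) "
        "(NP (DT each) (JJ other)))))))) (. .)))")

print(plain)                # 10 word tokens
print(tokenize_tree(tree))  # far more tokens once brackets and tags are included
```

So the "tree" version of a sentence is just a much longer sequence over a bigger vocabulary (words plus tags plus brackets), which is really what my question is about.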
Right now I'm working with PyTorch and Fairseq to build all of my models, and I have a working seq2seq model. Can I simply pass my English input formatted as shown above to that model and expect it to train, or do I need to build a new model from scratch that can handle tree structures? I've tried hard to research this by reading papers and books and playing around with tools, but since I'm not taking a class on this and it isn't well documented, I'm finding it difficult to work out on my own.
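In case it clarifies what I mean by "simply pass my English input like shown above": I was planning to lay the data out for Fairseq like this, with one linearized parse per line on the source side and the plain Irish sentence on the target side (the Irish line below is only a placeholder, not real data):

```python
# Placeholder illustration of the file layout I had in mind for fairseq-preprocess.
english_parses = [
    "(ROOT (S (NP (VBG cohabiting) (NNS partners)) (VP (MD can) (VP (VB make) "
    "(NP (NP (NNS wills)) (SBAR (WHNP (WDT that)) (S (VP (VBP favour) "
    "(NP (DT each) (JJ other)))))))) (. .)))",
]
irish_targets = [
    "<Irish translation of the sentence goes here>",  # placeholder, not real data
]

with open("train.en", "w", encoding="utf-8") as src, \
        open("train.ga", "w", encoding="utf-8") as tgt:
    for parse, target in zip(english_parses, irish_targets):
        tokens = parse.replace("(", " ( ").replace(")", " ) ").split()
        src.write(" ".join(tokens) + "\n")  # brackets and tags become ordinary tokens
        tgt.write(target + "\n")

# After this I would binarize and train as usual, e.g.
#   fairseq-preprocess --source-lang en --target-lang ga --trainpref train --destdir data-bin
```

That is essentially the setup I'm asking about.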
Any help would be greatly appreciated.
Topic: pytorch, machine-translation, neural-network, nlp
Category: Data Science