I am about to put my project on GitHub, but the spaCy models are too big (6 GB). What is the best practice for handling spaCy models when pushing to a Git repository? I am very new to this and this is my first spaCy project. I'd appreciate any help at all, thank you.
Full disclosure, I am a Power BI n00b. I am working on a report for data generated by an external system. This system exports data into different tabs in an Excel sheet (lame, I know). I need a single filter that filters data from all 4 data sets. Let's say this column is "last name". I have tried to create these relationships, and it seems to work... however, I am convinced this is not the proper way to handle this. …
In machine translation, we often have bilingual datasets, e.g. for German-English and French-English we will have something that looks like this:

/en-de  train.de train.en dev.de dev.en test.de test.en
/en-fr  train.fr train.en dev.fr dev.en test.fr test.en

And then we have a third language pair, German-French, and we'll have:

/de-fr  train.fr train.de dev.fr dev.de test.fr test.de

But let's say we add Spanish-English and we'll get:

/en-es  train.es train.en dev.es dev.en test.es test.en
/de-es  train.es train.de dev.es dev.de test.es test.de
/fr-es  train.es train.fr …