Deployment in AzureML for NLP with fastText

I am new to Azure ML. I am working on sentiment analysis of a small tweet dataset using fastText embeddings (the file 'wiki-news-300d-1M.vec', around 2.3 GB, which I downloaded into my working folder). Everything runs fine in the Jupyter notebook, but when I try to deploy the model in Azure ML and start the experiment:

run = exp.start_logging()
run.log("Experiment start time", str(datetime.datetime.now()))

I am getting the error message:

While attempting to take snapshot of .
Your total snapshot size exceeds the limit of 300.0 MB.

The folder containing my Jupyter files is close to 2.5 GB. Is there any way around this problem, or is it possible to write the NLP program without downloading the fastText embeddings? Any suggestions?


It seems the recommended option is to store the trained embeddings in Azure Blob Storage and add them as a Dataset to the Azure ML workspace, see here.
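For that first option, a minimal sketch with the v1 Python SDK (azureml-core) might look like the following; this assumes a workspace config.json is available, and the datastore name, target path, and dataset name are placeholders, not anything from the question:

```python
# Sketch only: assumes azureml-core is installed and a workspace
# config.json is present; names and paths below are placeholders.
from azureml.core import Workspace, Dataset, Datastore

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")  # default blob datastore

# One-time upload of the embeddings file to blob storage
datastore.upload_files(
    files=["wiki-news-300d-1M.vec"],
    target_path="embeddings/",
    overwrite=False,
)

# Register it as a FileDataset so runs can mount or download it
# instead of shipping it inside the run snapshot
embeddings = Dataset.File.from_files(
    path=(datastore, "embeddings/wiki-news-300d-1M.vec")
)
embeddings.register(workspace=ws, name="fasttext-wiki-news-300d")
```

In the training script the registered dataset can then be mounted or downloaded (e.g. via `as_mount()` or `as_download()`) rather than read from the snapshot directory, so the 2.3 GB file never counts against the 300 MB snapshot limit.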

Another option is to keep the embeddings file out of the snapshot, see here:

To prevent unnecessary files from being included in the snapshot, make an ignore file (.gitignore or .amlignore) in the directory. Add the files and directories to exclude to this file. For more information on the syntax to use inside this file, see syntax and patterns for .gitignore. The .amlignore file uses the same syntax. If both files exist, the .amlignore file is used and the .gitignore file is unused.
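For example, an .amlignore placed in the notebook folder that excludes the embeddings file (the filename from the question) could look like this:

```
# .amlignore -- paths listed here are excluded from the run snapshot
wiki-news-300d-1M.vec
*.vec
```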
