How to predict the sentiment of the entities from the tweets?

I have a JSON file (tweets.json) that contains tweets (sentences) along with the name of the author.
Objective 1: Get the most frequent entities from the tweets.
Objective 2: Find out the sentiment/polarity of each author towards each of the entities.
Sample Input: Assume we have only 3 tweets:
Tweet1 by Author1: Pink Pearl Apples are tasty but Empire Apples are not.
Tweet2 by Author2: Empire Apples are very tasty.
Tweet3 by Author3: Pink Pearl Apples are not tasty.
Sample …
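A toy, rule-based sketch of both objectives on the three sample tweets. It assumes the entity names are already known (for example from an NER step) and uses a hypothetical one-word positive lexicon and a small negator list: entity mentions are counted, each tweet is split into clauses on "but" and sentence punctuation, and a clause scores -1 if it contains a negator, +1 if it contains a positive word, 0 otherwise. A real system would swap the lexicon for a trained (ideally aspect-based) sentiment model.

```python
import re
from collections import Counter, defaultdict

# Assumption: entities are known in advance; these come from the sample tweets
ENTITIES = ["Pink Pearl Apples", "Empire Apples"]
POSITIVE_WORDS = {"tasty"}       # hypothetical toy lexicon
NEGATORS = {"not", "never"}      # hypothetical toy negator list

tweets = [
    ("Author1", "Pink Pearl Apples are tasty but Empire Apples are not."),
    ("Author2", "Empire Apples are very tasty."),
    ("Author3", "Pink Pearl Apples are not tasty."),
]

def entity_frequencies(tweets):
    """Objective 1: how often each known entity is mentioned."""
    counts = Counter()
    for _, text in tweets:
        for ent in ENTITIES:
            counts[ent] += text.count(ent)
    return counts

def author_entity_sentiment(tweets):
    """Objective 2: mean clause polarity per (author, entity) pair."""
    scores = defaultdict(list)
    for author, text in tweets:
        # Split into clauses so "X is good but Y is not" scores X and Y separately
        for clause in re.split(r"\bbut\b|[.;]", text):
            words = set(re.findall(r"[a-z]+", clause.lower()))
            for ent in ENTITIES:
                if ent in clause:
                    if words & NEGATORS:
                        score = -1
                    elif words & POSITIVE_WORDS:
                        score = 1
                    else:
                        score = 0
                    scores[(author, ent)].append(score)
    return {key: sum(v) / len(v) for key, v in scores.items()}

print(entity_frequencies(tweets).most_common())
print(author_entity_sentiment(tweets))
```

Averaging per (author, entity) pair keeps the result meaningful when an author mentions the same entity in several tweets.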
Category: Data Science

Train a spaCy model for semantic similarity

I'm attempting to train a spaCy model for the purpose of computing semantic similarity, but I'm not getting the results I would anticipate. I have created two text files that contain many sentences using a new term, "PROJ123456", for example "PROJ123456 is on track." I've added each to a DocBin and saved them to disk as train.spacy and dev.spacy. I'm then running:
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
The config.cfg file contains:
[paths]
train …
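One likely explanation, offered as an assumption rather than a diagnosis: when a pipeline has static word vectors, spaCy's `Doc.similarity` compares averaged token vectors with cosine similarity, and `spacy train` fits pipeline components (NER, tagger, and so on) without learning new static vectors. A token such as PROJ123456 then has no vector of its own and adds nothing to the comparison. This toy numpy sketch, with hypothetical 3-dimensional vectors, mimics that averaging.

```python
import numpy as np

def doc_vector(tokens, vectors, dim):
    # Average of token vectors; a token missing from the table counts as zeros,
    # like an out-of-vocabulary token in a static-vector lookup
    rows = [vectors.get(tok, np.zeros(dim)) for tok in tokens]
    return np.mean(rows, axis=0)

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0  # no vector information to compare
    return float(np.dot(a, b) / (na * nb))

# Hypothetical vectors for the known words; "PROJ..." tokens have none
vectors = {
    "is": np.array([0.0, 0.0, 1.0]),
    "on": np.array([1.0, 0.0, 0.0]),
    "track": np.array([0.0, 1.0, 0.0]),
}
a = doc_vector(["PROJ123456", "is", "on", "track"], vectors, 3)
b = doc_vector(["PROJ999999", "is", "on", "track"], vectors, 3)
print(cosine(a, b))
```

Two sentences about different project IDs come out as (numerically) identical, because only the shared words carry vectors. A meaningful vector for the new term would have to come from vectors trained on your own corpus, or from a model that does not rely on static vectors.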
Category: Data Science

Does spaCy support multiple GPUs?

I was wondering whether spaCy supports multiple GPUs via mpi4py. I am currently using spaCy's nlp.pipe for Named Entity Recognition on a high-performance computing cluster that supports the MPI protocol and has many GPUs. It says here that I would need to specify the GPU to use with cupy, but with mpi4py I am not sure whether the following will work (should I import spacy after selecting the cupy device?):
from mpi4py import MPI
import cupy
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if …
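A minimal sketch of one common pattern, assuming one MPI rank per GPU and ranks numbered consecutively within each node (so `rank % gpus_per_node` picks a local GPU). Each rank selects its device with `spacy.require_gpu(gpu_id)`, which spaCy's docs say to call before loading a pipeline, then runs `nlp.pipe` on its own shard of the texts; rank 0 gathers the results. In this sketch the work is split by MPI, not by spaCy, and spacy is imported first with the device chosen through `require_gpu` rather than through cupy directly. The helper names `gpu_for_rank`, `shard`, `unshard` and `run` are hypothetical.

```python
def gpu_for_rank(rank, gpus_per_node):
    # Assumes ranks 0..gpus_per_node-1 sit on the first node, the next block on the second, etc.
    return rank % gpus_per_node

def shard(items, rank, size):
    # Round-robin split: rank r gets items r, r + size, r + 2*size, ...
    return items[rank::size]

def unshard(parts):
    # Inverse of shard(): interleave the per-rank lists back into input order
    size = len(parts)
    total = sum(len(p) for p in parts)
    return [parts[i % size][i // size] for i in range(total)]

def run(texts, model_name, gpus_per_node):
    # Imported here so the helpers above work without MPI or a GPU
    from mpi4py import MPI
    import spacy

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    spacy.require_gpu(gpu_for_rank(rank, gpus_per_node))  # before spacy.load
    nlp = spacy.load(model_name)
    local = [[(ent.text, ent.label_) for ent in doc.ents]
             for doc in nlp.pipe(shard(texts, rank, size))]
    parts = comm.gather(local, root=0)  # list of per-rank results on rank 0, None elsewhere
    return unshard(parts) if rank == 0 else None
```

A script calling `run(...)` would be launched with something like `mpirun -n 4 python your_script.py`; the round-robin split keeps shards balanced, and `unshard` restores the original text order on rank 0.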
Category: Data Science

spaCy custom POS tagging for medical concepts

We are a group of doctors trying to use the linguistic features of spaCy, especially part-of-speech tagging, to show relationships between medical concepts, as in 'Femoral artery pseudoaneurysm' ==> "femoral artery" ['Anatomical Location'] --> and "pseudoaneurysm" ['Pathology']. We are new to NLP and spaCy. Can someone with experience in NLP and spaCy explain whether this is a good approach for showing these relationships in medical documents? If not, what are the alternative methods? Many thanks!
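Part-of-speech tags describe grammatical roles (noun, adjective, and so on), not medical categories; labels such as 'Anatomical Location' and 'Pathology' are closer to named-entity recognition. A common first baseline is a dictionary (gazetteer) lookup, which spaCy offers through its `PhraseMatcher` and `EntityRuler` components. This plain-Python sketch shows the idea with a hypothetical two-entry lexicon built from the example in the question; a real lexicon would come from a medical terminology.

```python
import re

# Hypothetical lexicon: lowercased token tuples -> concept label
LEXICON = {
    ("femoral", "artery"): "Anatomical Location",
    ("pseudoaneurysm",): "Pathology",
}
MAX_LEN = max(len(key) for key in LEXICON)

def tag_concepts(text):
    """Greedy longest-match lookup of lexicon terms in the text."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = LEXICON.get(tuple(lowered[i:i + n]))
            if label:
                found.append((" ".join(tokens[i:i + n]), label))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found

print(tag_concepts("Femoral artery pseudoaneurysm"))
```

Relations between the tagged concepts (which pathology sits at which location) would then need a further step, such as dependency patterns or a trained relation classifier.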
Category: Data Science

Dealing with near duplicates using NLP

I have a dataframe like the one shown below:
ID,Name,year,output
1,Test Level,2021,1
2,Test Lvele,2022,1
2,dummy Inc,2022,1
2,dummy Pvt Inc,2022,1
3,dasho Ltd,2022,1
4,dasho PVT Ltd,2021,0
5,delphi Ltd,2021,1
6,delphi pvt ltd,2021,1
df = pd.read_clipboard(sep=',')
My objective is a) to replace near-duplicate strings with a common string. For example, take a couple of strings from the Name column: dummy Inc and dummy Pvt Inc. Both of these have to be replaced with dummy. I manually prepared a mapping df map_df like …
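A minimal standard-library sketch, assuming legal-form words (Inc, Pvt, Ltd, ...) can be dropped before matching: each name is normalized, then mapped to the first earlier normalized name that `difflib.get_close_matches` rates at or above a cutoff. The suffix list and the 0.8 cutoff are assumptions to tune on your data; `canonical_names` is a hypothetical helper.

```python
import difflib
import re

# Assumption: these tokens carry no identity information for matching
LEGAL_SUFFIXES = {"inc", "ltd", "pvt", "llc", "corp"}

def normalize(name):
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def canonical_names(names, cutoff=0.8):
    """Map each original name to a canonical (normalized) form."""
    canon, mapping = [], {}
    for name in names:
        key = normalize(name)
        match = difflib.get_close_matches(key, canon, n=1, cutoff=cutoff)
        if match:
            mapping[name] = match[0]  # close enough to an earlier name
        else:
            canon.append(key)         # first time we see this entity
            mapping[name] = key
    return mapping

names = ["Test Level", "Test Lvele", "dummy Inc", "dummy Pvt Inc",
         "dasho Ltd", "dasho PVT Ltd", "delphi Ltd", "delphi pvt ltd"]
print(canonical_names(names))
```

With pandas, `df["Name"] = df["Name"].map(canonical_names(df["Name"]))` applies the mapping. The greedy pass is order-dependent, so reviewing the result (or keeping a manual `map_df` for exceptions) is still worthwhile.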
Category: Data Science

How to train a spaCy language model from scratch?

I am still quite a beginner with spaCy (although I already enjoy it). I would like to create a language model for a language that is not yet supported, that is, from scratch. I do have comprehensive text corpora in this language. Where do I start, and how should I proceed? TIA.
Topic: spacy
Category: Data Science

Drawing conclusions after sentiment analysis

After performing some sentiment analysis, I have a dataset that looks like this: For different products, using online reviews, I have obtained some values for positive/negative sentiments. However, I am now unable to figure out how to draw conclusions from it. I had the idea of using correlation, but I need ideas on what features could be created and what comparisons could be made. The dataset includes different "Features" like webcam, screen, mousepad for different products (product name). id Date Website …
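One way to start drawing conclusions, sketched under the assumption that the data can be reshaped to one row per (product, feature) mention with a +1/-1 sentiment score. The column names and values here are hypothetical: aggregate per product and feature, then compare products feature by feature.

```python
import pandas as pd

# Hypothetical long-format data: one row per product/feature mention,
# with sentiment already scored as +1 (positive) or -1 (negative)
reviews = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B", "B"],
    "feature": ["webcam", "webcam", "screen", "webcam", "screen", "screen"],
    "sentiment": [1, -1, 1, -1, 1, 1],
})

def sentiment_summary(df):
    """Mention count and share of positive mentions per product and feature."""
    grouped = df.groupby(["product", "feature"])["sentiment"]
    return pd.DataFrame({
        "mentions": grouped.size(),
        "pct_positive": grouped.apply(lambda s: (s > 0).mean()),
    })

summary = sentiment_summary(reviews)
# Products as rows, features as columns: easy to compare side by side
print(summary["pct_positive"].unstack("feature"))
```

The mention counts matter as much as the percentages, since a feature praised in one review says little. Grouping the same way by `Date` or `Website` would show trends over time or differences between sources.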
Category: Data Science

AttributeError: 'English' object has no attribute 'predict'

I have trained an NER model with spaCy version 3.2 and am trying to predict on my text, but I get the following error: "AttributeError: 'English' object has no attribute 'predict'". Environment: Python 3.7, spaCy 3.2, MacBook Pro. Here is my code:
import pickle
cv_sections_model1 = pickle.load(open("ml_models/cv_sectionsv3.pkl", "rb"))
def predict_sections(self):
    global cv_sections_model1
    # remove strings with only special characters
    sections = [
        section for section in self.sections
        if len(re.sub(r"[^a-z0-9 ]", "", section.lower()).strip()) > 3
    ]
    predicted = cv_sections_model1.predict(sections)
    print(predicted)
    predicted_sections = [zipped for zipped …
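The error arises because a spaCy pipeline (`English` is a subclass of `Language`) has no `.predict()` method; that is the scikit-learn estimator convention. A spaCy pipeline is called directly on a text, or through `nlp.pipe()` for many texts, and NER results are read from `doc.ents` (or `doc.cats` for a text classifier). Here is a sketch of the function rewritten that way, keeping the question's filter. The model path in the comment is hypothetical, and saving with `nlp.to_disk()` and loading with `spacy.load()` is spaCy's documented alternative to pickling.

```python
import re

def filter_sections(sections):
    # Same filter as in the question: keep sections with more than three
    # letters, digits or spaces once other characters are removed
    return [s for s in sections
            if len(re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()) > 3]

def predict_sections(nlp, sections):
    """Return the (text, label) entities found in each kept section.

    A spaCy pipeline has no .predict(); call it on a text, or use nlp.pipe()
    to process many texts efficiently.
    """
    kept = filter_sections(sections)
    return [[(ent.text, ent.label_) for ent in doc.ents] for doc in nlp.pipe(kept)]

# Usage, assuming the pipeline was saved with nlp.to_disk() (path is hypothetical):
#   import spacy
#   nlp = spacy.load("ml_models/cv_sections_model")
#   print(predict_sections(nlp, ["Work experience: ...", "Education: ..."]))
```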
Topic: spacy
Category: Data Science

What approach should I take to extract a number entity from a dataset?

I have training, validation, and test datasets. The first column has store data and the second column has store numbers. I need to develop an entity extractor model that can extract store numbers from the first column. I tried searching for entity extraction tools like spaCy and the Stanford NER package but did not quite understand how to implement them in this scenario. As you can see above, a store number is not always numeric data, but what I found …
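Because the second column already holds the store number, training annotations for an NER model can be generated automatically by locating that value in the first column. A minimal sketch, assuming the store number appears verbatim in the text; the `STORE_NUMBER` label and the sample row are hypothetical. The output uses the (text, [(start, end, label)]) character-offset format commonly used for spaCy NER training data.

```python
import re

def to_ner_example(text, store_number, label="STORE_NUMBER"):
    """Turn one (text, store number) row into (text, [(start, end, label)])."""
    value = str(store_number)  # note: numeric columns may have lost leading zeros
    # Escape the value so characters like '#' or '-' match literally, and
    # forbid letters/digits on either side so "12" does not match inside "123"
    pattern = rf"(?<![A-Za-z0-9]){re.escape(value)}(?![A-Za-z0-9])"
    match = re.search(pattern, text)
    if match is None:
        return None  # value not found verbatim: inspect these rows by hand
    return text, [(match.start(), match.end(), label)]

print(to_ner_example("Acme Store 0457, Main St", "0457"))
```

Rows that return `None` are worth reviewing, since they often reveal formatting differences between the two columns that a model would otherwise learn inconsistently.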
Category: Data Science

How to identify/recognize that a sentence talks about the future?

Brief introduction: I have a report/paragraph containing sentences that refer to future plans/outlooks/expectations for a particular entity. For now, I want to extract all such sentences. Problem statement: How can I identify such future-looking statements (sentences where they refer to their plans), or how can I best separate future-looking sentences from the others? I'm looking for a traditional programming solution and/or a machine learning solution. Languages and packages preferred: Python, spaCy, scikit-learn, Keras (backend: TensorFlow) …
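A rule-based baseline, sketched under the assumption that future-looking sentences usually contain cue words such as "will", "plans", "expects" or "outlook". The cue list is hypothetical and should be extended from your own reports. Sentences flagged this way can also serve as a starting point for labeling data for a scikit-learn or Keras classifier. The sentence splitter is deliberately naive; spaCy's sentence segmentation (`doc.sents`) is a sturdier choice.

```python
import re

# Hypothetical cue list; extend it with phrasing from your own reports
FUTURE_CUES = re.compile(
    r"\b(will|shall|plans?|planned|intends?|expects?|expected|aims?|"
    r"going to|next (?:year|quarter|month)|outlook|forecasts?)\b",
    re.IGNORECASE,
)

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def future_sentences(text):
    return [s for s in split_sentences(text) if FUTURE_CUES.search(s)]

report = ("Revenue grew 5% last year. The company plans to open two stores. "
          "It expects margins to improve. Costs were flat.")
print(future_sentences(report))
```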
Category: Data Science

Training New Entities in a spaCy NER Model

I want to add new entities to the Python spaCy NER module, and I have a few doubts about this. Is it possible to remove some of the existing entities and add new entities to the remaining ones? While training new entities, I found that we have to provide training data in a particular format. For example:
data = [
    ("I love chicken", [(7, 14, "FOOD")]),
    ...
]
Instead of sentences like "I love chicken", is it possible to give data like data …
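This training format uses end-exclusive character offsets, so in "I love chicken" the word "chicken" spans characters 7 to 14 (`"I love chicken"[7:14] == "chicken"`). Offsets that cut through a word cannot be aligned to tokens; in spaCy v3, `Doc.char_span` returns `None` for them with its default strict alignment. A small pure-Python check, a sketch rather than spaCy's own validation, catches such spans before training.

```python
def check_annotations(data):
    """List problems in (text, [(start, end, label)]) NER training data.

    Flags spans that are out of range, overlap an earlier span, or begin/end
    inside a word (such spans cannot line up with token boundaries).
    """
    problems = []
    for text, spans in data:
        last_end = 0
        for start, end, label in sorted(spans):
            if not 0 <= start < end <= len(text):
                problems.append(f"{text!r}: ({start}, {end}, {label!r}) is out of range")
                continue
            piece = text[start:end]
            starts_mid_word = start > 0 and text[start - 1].isalnum() and text[start].isalnum()
            ends_mid_word = end < len(text) and text[end - 1].isalnum() and text[end].isalnum()
            if starts_mid_word or ends_mid_word:
                problems.append(f"{text!r}: {piece!r} ({label}) splits a word")
            if start < last_end:
                problems.append(f"{text!r}: {piece!r} ({label}) overlaps another span")
            last_end = max(last_end, end)
    return problems

print(check_annotations([("I love chicken", [(7, 14, "FOOD")])]))
print(check_annotations([("I love chicken", [(8, 13, "FOOD")])]))
```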
Topic: spacy python
Category: Data Science

Converting to lowercase while creating dataset for NER using spacy

I am trying to make a custom entity model for an NER application using spaCy. In several NLP projects, I have converted all the data to lowercase and applied several ML techniques. Should I also convert the data to lowercase for NER, and why would that be necessary? Is it mandatory, and will not converting to lowercase adversely affect the accuracy of the model?
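Capitalization is one of the strongest surface cues for named entities, and lowercasing removes it. spaCy's default CPU configurations embed a token's `SHAPE` alongside its `NORM`, `PREFIX` and `SUFFIX`, so cased input is usually kept for NER. A simplified word-shape function (not spaCy's exact implementation) shows what lowercasing throws away:

```python
def word_shape(token):
    """Map uppercase to X, lowercase to x and digits to d; keep other chars."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)

sentence = "Apple opened a store in Paris"
print([word_shape(t) for t in sentence.split()])
print([word_shape(t) for t in sentence.lower().split()])
```

After lowercasing, "Apple" (the company) and "apple" (the fruit) have the same shape, so the model loses that hint. spaCy's `NORM` feature is lowercased by default, which already lets the model relate casing variants without discarding the original casing.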
Category: Data Science

Comparing books using their category lists (NLP)

I have a database of books. Each book has a list of categories that describe the genre/topics of the book (I use Python models). The categories in the list most of the time are composed of 1 to 3 words. Examples of a book category list: ['Children', 'Flour mills', 'Jealousy', 'Nannies', 'Child labor', 'Conduct of life'], ["Children's stories", 'Christian life'], ['Children', 'Brothers and sisters', 'Conduct of life', 'Cheerfulness', 'Christian life'], ['Fugitive slaves', 'African Americans', 'Slavery', 'Plantation life', 'Slaves', 'Christian life', …
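A simple starting point, assuming exact category strings are meaningful: treat each book's categories as a set and compare books with Jaccard similarity (shared categories divided by all distinct categories). The book keys are hypothetical; the category lists are examples from the question. Exact matching misses related labels such as "Children" and "Children's stories", which is where word or sentence embeddings of the category strings would come in.

```python
def jaccard(a, b):
    """Shared categories divided by all distinct categories (case-insensitive)."""
    sa, sb = {c.lower() for c in a}, {c.lower() for c in b}
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def rank_similar(target, books):
    """Other books ordered from most to least similar to `target`."""
    others = [title for title in books if title != target]
    return sorted(others, key=lambda t: jaccard(books[target], books[t]), reverse=True)

# Hypothetical keys; category lists taken from the question
books = {
    "book1": ["Children", "Flour mills", "Jealousy", "Nannies", "Child labor", "Conduct of life"],
    "book2": ["Children's stories", "Christian life"],
    "book3": ["Children", "Brothers and sisters", "Conduct of life", "Cheerfulness", "Christian life"],
}
print(rank_similar("book3", books))
```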
Category: Data Science

Cannot install the spaCy package on Windows 10 via pip

I have the following environment:
OS: Windows 10
Python: Python 3.7.4
PIP: pip 19.3.1
I am trying to install spaCy on Windows 10, and it gives me the following error: ERROR: Command errored out with exit status 1: command: 'd:\rajesh\python\env1\scripts\python.exe' 'd:\rajesh\python\env1\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\rajesh.das\AppData\Local\Temp\pip-build-env-vna552d_\normal' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'thinc<7.4.0,>=7.3.0' 'cymem<2.1.0,>=2.0.2' 'preshed<3.1.0,>=3.0.2' wheel 'cython>=0.25' 'murmurhash<1.1.0,>=0.28.0' cwd: None Complete output (460 lines): Collecting thinc<7.4.0,>=7.3.0 Using cached https://files.pythonhosted.org/packages/d4/38/f79bb496ced36f8d69cdbdfe57a322205582ed9508bda5bd0227969d5a77/thinc-7.3.1.tar.gz Collecting cymem<2.1.0,>=2.0.2 Using cached https://files.pythonhosted.org/packages/ce/8d/d095bbb109a004351c85c83bc853782fc27692693b305dd7b170c36a1262/cymem-2.0.3.tar.gz Collecting preshed<3.1.0,>=3.0.2 Using cached https://files.pythonhosted.org/packages/5f/14/de231123ddbe0bf12bd9b1993122d67f22859643bee4dad3b6ce91986336/preshed-3.0.2.tar.gz …
Category: Data Science

Is NLP suitable for my legal contract parsing problem?

My company has a product that involves the extraction of a variety of fields from legal contract PDFs. The current approach is very time consuming and messy, and I am exploring if NLP is a suitable alternative. The PDFs that need to be parsed usually follow one of a number of "templates". Within a template, almost all of the documents are the same, except for 20 or so specific fields we are trying to extract. That being said, there are …
Category: Data Science

Character-level embeddings in python

I'm working on an NLP task that requires character-level embeddings, and I've been trying to use spaCy. However, spaCy seems to use word-level embeddings for its word vectors, and I need character-level embeddings. The only character-level embedding library I've been able to find is chars2vec, which does not seem well maintained. Is there a way to get character-level embeddings with either spaCy or a more popular package than chars2vec?
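Character-level embeddings are usually a lookup table indexed by character id, trained as the first layer of a model (for example a character CNN or LSTM in Keras or PyTorch). This sketch builds such a table with numpy; the vectors are random until trained, and the class name `CharEmbedding` is hypothetical. If pretrained vectors are needed, fastText, which composes word vectors from character n-grams, is a widely used alternative.

```python
import numpy as np

class CharEmbedding:
    """Character lookup table; rows are random until trained inside a model."""

    def __init__(self, corpus, dim=16, seed=0):
        chars = sorted(set("".join(corpus)))
        # id 0 is reserved for characters not seen in the corpus
        self.index = {c: i + 1 for i, c in enumerate(chars)}
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.1, size=(len(chars) + 1, dim))

    def encode(self, word):
        return [self.index.get(c, 0) for c in word]

    def embed(self, word):
        # shape: (len(word), dim), one row per character
        return self.table[self.encode(word)]

emb = CharEmbedding(["cat", "act", "tack"], dim=8)
print(emb.encode("tax"))
print(emb.embed("cat").shape)
```

The same character always maps to the same row, so a model stacked on top can learn spelling-level patterns that word-level vectors cannot see.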
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.