text-filter

NLP - Paraphrase extraction in Python

Naveen Reddy Marthala

January 11, 2022 09:28

I am trying to develop an NLP model that takes something like "you have high levels of cholesterol" (this will be a tag) as input and outputs something like "you have high levels of cholesterol, you need to have a low-salt diet that emphasizes fruits, vegetables and whole grains; limit the amount of animal fats and use good fats in moderation" (this will be the suggestion; it is an example suggestion from a doctor). So, now when I was researching …

Topic: text-filter text-generation text-mining nlp python

Category: Data Science

How to work with hundreds of CSVs with millions of rows in each?

rick458

June 2, 2021 22:45

So I'm doing a project on the COVID-19 Tweets dataset from IEEE DataPort, and I plan to analyse the tweets over the time period from March 2020 to date. The thing is, there are more than 300 CSVs of data, each having millions of rows. I need to hydrate all of these tweets before I can filter through them. Hydrating just one CSV took more than two hours today. I wanted to know if …

Topic: text-filter csv sentiment-analysis databases machine-learning

Category: Data Science
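
One way to keep memory use flat when working through many large CSVs is to stream them row by row and hand the IDs on in fixed-size batches. The sketch below assumes each file has no header row and stores the tweet ID in the first column; `iter_tweet_ids`, `batched`, and the batch size are hypothetical names and values chosen for illustration, and the hydration call itself is left out.

```python
import csv
from itertools import islice
from pathlib import Path


def iter_tweet_ids(csv_dir, id_column=0):
    """Lazily yield tweet IDs from every CSV in csv_dir, one row at a time."""
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.reader(handle):
                if row:  # skip empty lines
                    yield row[id_column]


def batched(iterable, size):
    """Group any iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


# Usage sketch (the directory name is an assumption):
# for id_batch in batched(iter_tweet_ids("covid_tweets"), 100):
#     ...  # pass id_batch to whatever hydration tool is in use
```

Because both functions are generators, only one batch is held in memory at a time, regardless of how many files the directory contains.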

Literature on query generation from a labelled document-term matrix

Bakaburg

May 18, 2021 17:31

I have a labelled dataset of relevant and non-relevant documents for which I built a boolean document-term matrix. I am trying to develop an algorithm which, given this input, would create a text-based boolean search rule that identifies a subset of the data, favouring first sensitivity and then specificity. I'd like to know of published literature on the topic. I did an initial search but couldn't find anything related. I'd be glad if you can point me to …

Topic: document-term-matrix text-filter nlp

Category: Data Science

How does GlobalMaxPooling work on the output of Conv1D?

MSKL

February 8, 2021 11:52

In the field of text classification, it is common to use Conv1D filters running over word embeddings and then get a single output value for each filter using GlobalMaxPooling1D. As I understand the process, the convolutional filter is a matrix whose size is $$\text{size of filter matrix} = \text{embedding dim}\cdot\text{width of the filter}$$ The filter matrix is then applied to the input embeddings (multiplied element by element), which produces a matrix of the same size …

Topic: text-filter cnn keras nlp

Category: Data Science
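
The mechanics asked about in this question can be illustrated with a small NumPy sketch. It assumes "valid" padding, stride 1, no bias and no activation, and the helper name `conv1d_global_max` is hypothetical; it is not the Keras implementation. The point it shows is that the element-wise product of a filter with a window of embeddings is summed to a single scalar per position, and global max pooling then keeps the largest of those scalars for each filter.

```python
import numpy as np


def conv1d_global_max(embeddings, filters):
    """embeddings: (seq_len, embed_dim); filters: (n_filters, width, embed_dim)."""
    seq_len, _ = embeddings.shape
    n_filters, width, _ = filters.shape
    n_positions = seq_len - width + 1  # "valid" padding, stride 1
    feature_map = np.empty((n_positions, n_filters))
    for pos in range(n_positions):
        window = embeddings[pos:pos + width]  # (width, embed_dim)
        # Element-wise product with every filter, summed to ONE scalar per filter.
        feature_map[pos] = (filters * window).sum(axis=(1, 2))
    # Global max pooling: the largest response over all positions, per filter.
    return feature_map, feature_map.max(axis=0)


# Toy input: 4 tokens with embedding dimension 2.
embeddings = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0]])
# Two filters, each of width 2 (so each filter is a 2 x 2 matrix).
filters = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, 0.0]],
])
feature_map, pooled = conv1d_global_max(embeddings, filters)
print(feature_map.shape)  # (3, 2): one scalar per position and filter
print(pooled)             # one value per filter
```

With these toy values the first filter responds with 3, 1, 3 across the three positions and the second with -1, 0, -3, so pooling keeps 3 and 0.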

Python library to detect a bank/financial institution name in a string

Quantum Dreamer

January 21, 2021 01:05

I would like to extract bank names, like Wells Fargo or Chase, from a given text. Is there a Python library for this? I know there are entity taggers in spaCy and flair, but they only identify the entity type (org/person).

Topic: text-filter text-mining

Category: Data Science
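
In the absence of a dedicated library, one simple option is a gazetteer lookup: match the text against a list of known institution names. The sketch below uses only the standard library; `BANK_NAMES` is a hypothetical two-entry list built from the names in the question, and a real list would have to come from an external source.

```python
import re

# Hypothetical, deliberately tiny gazetteer of institution names.
BANK_NAMES = ["Wells Fargo", "Chase"]

# Longer names first, so a multi-word name wins over a shorter overlapping one.
_alternatives = "|".join(
    re.escape(name) for name in sorted(BANK_NAMES, key=len, reverse=True)
)
BANK_PATTERN = re.compile(rf"\b({_alternatives})\b", re.IGNORECASE)


def find_banks(text):
    """Return every gazetteer name found in text, as written in the text."""
    return [match.group(0) for match in BANK_PATTERN.finditer(text)]


print(find_banks("I moved my account from wells fargo to Chase."))
```

A plain lookup cannot tell the bank "Chase" from the verb "chase", so combining it with an entity tagger's ORG predictions may reduce false positives.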

How to apply multiple filters in a DataFrame?

Vikas Ukani

September 16, 2020 11:37

How can I implement multiple filters to check whether a data cell is in a range? Suppose I have lists of numbers like range_1 = [70, 15, 5, 7, 3, 7, 8, 3, 2, 63] and range_2 = [50, 56, 80, 61, 83, 87, 13, 58, 43, 24, 84, 54, 64, 36, 48], and I want to check whether any column values exist within these two lists. Any suggestion would be appreciated.

Topic: data-science-model text-filter pandas feature-selection data-cleaning

Category: Data Science
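
One reading of this question is "keep the rows in which any cell appears in either list". Under that assumption, a minimal pandas sketch could look like the following. The second list is called `range_2` here for clarity, and the DataFrame `df` with columns `a` and `b` is hypothetical sample data.

```python
import pandas as pd

range_1 = [70, 15, 5, 7, 3, 7, 8, 3, 2, 63]
range_2 = [50, 56, 80, 61, 83, 87, 13, 58, 43, 24, 84, 54, 64, 36, 48]

# Hypothetical sample data; the question does not show the real frame.
df = pd.DataFrame({"a": [70, 1, 50, 9], "b": [4, 15, 6, 9]})

# DataFrame.isin returns a boolean frame of the same shape as df.
in_range_1 = df.isin(range_1)
in_range_2 = df.isin(range_2)

# Combine the two masks cell by cell, then keep rows with at least one hit.
rows_with_match = df[(in_range_1 | in_range_2).any(axis=1)]
print(rows_with_match)
```

Replacing `|` with `&` would require a cell to be in both lists, and `.all(axis=1)` would require every cell in the row to match.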

Is there a process flow to follow for text analytics?

Minu

August 7, 2020 07:24

I am trying to draw a process flow (like a template) to be followed on text analysis projects. So far, I've come up with this. Text Analytics Steps. Data Collection: acquire data; convert data into plain text; remove duplicate entries. Text Parsing and Extracting Features: tokenization; parsing; remove HTML characters; decode complex symbols to UTF-8; spell check; apostrophe look-up; remove punctuation marks; remove expressions / emojis; split attached words; slang look-up; remove URLs; lemmatization / stemming (normalization of tokens) …

Topic: text-filter text-mining nlp

Category: Data Science

Method to assess text credibility

lordy

January 29, 2020 10:00

I am searching for an automated method (ideally a Python package) that produces a score to assess the credibility of a given text (e.g. from a webpage). I am not searching for: text complexity assessments (i.e. how long sentences are and how many difficult words are used), such as Flesch reading ease, SMOG index, Flesch-Kincaid grade, Coleman-Liau index, automated readability index, Dale-Chall readability score, difficult words index, Linsear Write formula, or Gunning fog; text coherence (i.e. …

Topic: text-filter nlp python

Category: Data Science

Tokenize text with both American and British English words

user3259111

December 3, 2019 09:00

I need to tokenize a corpus of abstracts from an international conference. The abstracts are usually in American English but sometimes in British English. Consequently, I get two tokens for “organization” and “organisation” or “color” and “colour”. Examples: https://en.oxforddictionaries.com/spelling/british-and-spelling Do you know a (Python) library for converting “British English” to “American English” (or vice versa)? I would be happy to that ... (but I am French and my English is not so good) Thanks.

Topic: text-filter nltk text-mining

Category: Data Science
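
If no suitable library turns up, spelling variants can be merged at tokenization time with a lookup table. The sketch below is only an illustration: `BRITISH_TO_AMERICAN` is a hypothetical mapping holding just the two pairs mentioned in the question, and the regex tokenizer is a stand-in for whatever tokenizer the project already uses.

```python
import re

# Hypothetical mapping with only the two pairs from the question;
# a real table would need to be far larger.
BRITISH_TO_AMERICAN = {
    "organisation": "organization",
    "colour": "color",
}


def tokenize_normalized(text):
    """Lowercase, split into alphabetic tokens, and map British spellings."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [BRITISH_TO_AMERICAN.get(token, token) for token in tokens]


print(tokenize_normalized("The organisation chose a colour."))
```

Inverting the dictionary gives the American-to-British direction; either way, both spellings end up as a single token.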

Identifying specific words in text

william

June 3, 2019 17:37

Let's say I have the following text: "Is that another kitten playing in the shoes in the top right?" I would like my code to extract "kitten" from that text. Is there any list of animal names readily available?

Topic: text-filter text text-mining

Category: Data Science

Check similarity of table/csv of Product Names

Lisa Anna

February 12, 2017 11:27

We've got a list of approximately 18,000 product names (they're from 80-90 sources, so quite a few are similar but not duplicates; these were picked as DISTINCT from a table). Unfortunately, there are different ways of expressing these names. We have to try to normalize the dataset so we can present our users with more meaningful names. For example, a list like this: Canon EOS 5D Mark III Canon EOS 5D mk III Canon EOS 5DMK3 Canon EF 70-200mm …

Topic: text-filter fuzzy-logic similarity

Category: Data Science
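
A first pass at spotting near-duplicate names can be made with the standard library's `difflib`. The sketch below compares every pair of names after a light normalization and reports the pairs above a threshold; the helper names and the 0.9 threshold are assumptions for illustration, and only three of the names from the question are used.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase and collapse anything that is not a letter or digit to a space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def similar_pairs(names, threshold):
    """Return (name_a, name_b, score) for every pair scoring at least threshold."""
    return [
        (a, b, score)
        for a, b in combinations(names, 2)
        if (score := similarity(a, b)) >= threshold
    ]


names = ["Canon EOS 5D Mark III", "Canon EOS 5D mk III", "Canon EOS 5DMK3"]
for a, b, score in similar_pairs(names, threshold=0.9):
    print(f"{a!r} ~ {b!r}: {score:.2f}")
```

At this threshold only the "Mark III" / "mk III" pair is reported; the "5DMK3" variant needs a lower threshold or domain-specific rules (for example mapping "mk" to "mark" and Roman numerals to digits). Comparing all pairs is quadratic in the number of names, so a list of 18,000 names would likely need blocking, such as grouping by brand, before comparison.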
