I am considering the idea of stitching together a slide deck based on text input. E.g., given "An all-hands presentation with business updates, project timelines, and financial report charts", the output could be a deck with slides corresponding to Title, List, Calendar, Pie Chart, and Conclusion. I have preexisting slides that are mostly categorized by their "form", ranging from the very general, like List, to the more specific, like Decision Tree or Venn Diagram. Am I on the right track that this sounds …
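A minimal sketch of one way to do the matching: embed both the description fragments and the slide forms, and pick the nearest form by cosine similarity. This assumes the sentence-transformers package; the model name and the example fragment split are placeholder choices, not from the question.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    forms = ["Title", "List", "Calendar", "Pie Chart",
             "Decision Tree", "Venn Diagram", "Conclusion"]
    fragments = ["business updates", "project timelines",
                 "financial report charts"]

    # Cosine similarity between every fragment and every slide form
    scores = util.cos_sim(model.encode(fragments, convert_to_tensor=True),
                          model.encode(forms, convert_to_tensor=True))
    for frag, row in zip(fragments, scores):
        print(frag, "->", forms[int(row.argmax())])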
I have two databases with around 60,000 samples each. Both have the same features (the same column names), which represent attributes of particular things as free text or as categories (encoded as numbers). Each sample within a database is assumed to refer to a distinct thing, but some objects are represented in both databases, with somewhat different values in the same-named columns (e.g. different free-text descriptions, or classification under another category). The aim is to train a machine learning model …
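This is essentially a record-linkage setup. A sketch of the pairwise framing, where each training example is a pair of records (one per database) with simple similarity features and a same-entity label; the column names and the tiny toy pairs are hypothetical stand-ins:

    import numpy as np
    from difflib import SequenceMatcher
    from sklearn.ensemble import RandomForestClassifier

    def pair_features(rec_a, rec_b):
        """Similarity features for one candidate pair of records."""
        text_sim = SequenceMatcher(None, rec_a["description"],
                                   rec_b["description"]).ratio()
        same_cat = float(rec_a["category"] == rec_b["category"])
        return [text_sim, same_cat]

    # Toy labeled pairs: (record from db 1, record from db 2, same entity?)
    labeled_pairs = [
        ({"description": "red sports car", "category": 3},
         {"description": "red sport car", "category": 3}, 1),
        ({"description": "red sports car", "category": 3},
         {"description": "office chair", "category": 7}, 0),
    ]

    X = np.array([pair_features(a, b) for a, b, _ in labeled_pairs])
    y = np.array([label for _, _, label in labeled_pairs])

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)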
I'm trying to put together a script that classifies comments as either adequate or inadequate. I put a question up here earlier with all my code, but I think I've isolated the problem down to the setup of the model, so I deleted that one; hopefully this is more streamlined and easier to follow. The example I'm trying to follow is the classic IMDB one, where the comments are either positive or negative, but again, in my instance, adequate …
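For reference, a minimal model setup in the spirit of the Keras IMDB example; the vocabulary size and sequence length are placeholder choices. The key points for a binary task like adequate/inadequate are the single sigmoid output unit and the binary_crossentropy loss:

    import tensorflow as tf

    vocab_size, max_len = 10000, 200   # placeholder choices

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),
        tf.keras.layers.Embedding(vocab_size, 16),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # adequate vs. inadequate
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.summary()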
I am trying to find a model, or an approach, to classify text into a topic category together with positive or negative feedback. For example, we have three columns:

Review: Camera's not good, battery backup is not very good. Ok ok product, camera's not very good and battery backup is not very good.
Rating: 2
Topic: ['Camera (Neutral)', 'Battery (Neutral)']

My whole dataset is like the above, and Topic is not a standard one; the Topic value is based …
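One way to frame this is as multi-label classification, treating each (topic, polarity) tag as its own label. A sketch with scikit-learn; the toy rows are made up to mirror the question's format:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    reviews = [
        "Camera's not good, battery backup is not very good.",
        "Great battery, lasts two days.",
    ]
    topics = [["Camera (Neutral)", "Battery (Neutral)"], ["Battery (Positive)"]]

    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(topics)   # one binary column per (topic, polarity) tag

    clf = make_pipeline(TfidfVectorizer(),
                        OneVsRestClassifier(LogisticRegression(max_iter=1000)))
    clf.fit(reviews, Y)
    print(mlb.inverse_transform(clf.predict(["battery is great"])))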
I'm looking for something similar to this: https://scikit-learn.org/stable/auto_examples/text/plot_document_classification_20newsgroups.html#sphx-glr-auto-examples-text-plot-document-classification-20newsgroups-py. But instead of positive and negative examples, I have positive examples and a bunch of unlabeled data that will contain some positive examples but is mostly negative. I'm planning on using this in a pipeline that transforms the text data into vectors and then feeds them into a classifier, using https://pulearn.github.io/pulearn/doc/pulearn/. The issue is that I'm not sure of the best way to build the preprocessing stage where I transform the raw text data into …
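One plausible shape for that preprocessing stage: a TfidfVectorizer whose output is densified and handed to a PU classifier. This assumes pulearn's ElkanotoPuClassifier with unlabeled examples marked -1 and a base estimator that exposes predict_proba, as I read the pulearn documentation; the texts and labels below are placeholders.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from pulearn import ElkanotoPuClassifier

    texts = ["good product works great"] * 8 + ["assorted unlabeled text"] * 12
    y = np.array([1] * 8 + [-1] * 12)   # 1 = positive, -1 = unlabeled

    X = TfidfVectorizer().fit_transform(texts).toarray()   # dense, to be safe

    pu = ElkanotoPuClassifier(estimator=LogisticRegression(), hold_out_ratio=0.2)
    pu.fit(X, y)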
I have a dataset of sentences from news articles which I need to classify by sentiment. For that goal, I'm planning to use a model that was fine-tuned on datasets of a different kind, for example comments from forums, reviews, and tweets. However, news articles are presumably quite different from such data, as they are usually more neutral. I understand that the correct way to approach this issue would be to train a model on my own labeled dataset; however …
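Before committing to labeling, a quick way to sanity-check how badly the domain shift bites is to run an off-the-shelf fine-tuned checkpoint over a handful of news sentences and inspect the confidences. The model name below is just one publicly available sentiment checkpoint, not a recommendation from the question:

    from transformers import pipeline

    clf = pipeline("sentiment-analysis",
                   model="distilbert-base-uncased-finetuned-sst-2-english")

    news = ["The central bank left interest rates unchanged on Tuesday."]
    print(clf(news))   # check how often neutral wording gets a confident label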
I have created a dataset automatically and wanted to check my interpretation of the amount of noise using a confidence interval. I selected a random sample, manually annotated it, and found that 98% of the labels were correct. Based on these values, I then calculated a 99% confidence interval, which gave a lower bound of 0.9614 and an upper bound of 0.9949. Does this mean that the noise in the overall dataset is between the lower and upper …
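For reproducibility, this style of calculation is a binomial proportion confidence interval. A sketch with statsmodels; the sample size n=500 is hypothetical, since the question doesn't state it, so plug in the actual counts:

    from statsmodels.stats.proportion import proportion_confint

    n = 500                   # manually checked sample size (hypothetical)
    correct = int(0.98 * n)   # 98% of the checked labels were correct

    low, high = proportion_confint(correct, n, alpha=0.01, method="wilson")
    print(f"99% CI for label accuracy: [{low:.4f}, {high:.4f}]")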
I would like to create word embeddings that take context into account, so that the vector for the word Jaguar [animal] would differ from that for Jaguar [car brand]. As you know, word2vec gives only one representation per word, and I would like to take already pretrained embeddings and enrich them with context. So far I've tried a simple approach: taking the average of the vector for the word and the vector for a category word, for example like this. Now I would …
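Making the averaging idea explicit: the "contextual" vector is the mean of the word vector and a category-word vector. This assumes gensim's downloader; the GloVe model name is one convenient pretrained choice, not the only option:

    import numpy as np
    import gensim.downloader as api

    kv = api.load("glove-wiki-gigaword-100")   # pretrained word vectors

    def contextual_vector(word, category):
        """Average of the word vector and a category-word vector."""
        return (kv[word] + kv[category]) / 2.0

    jaguar_animal = contextual_vector("jaguar", "animal")
    jaguar_car = contextual_vector("jaguar", "car")
    # The two senses now get different vectors:
    print(np.linalg.norm(jaguar_animal - jaguar_car))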
I want to build a classification model to match customers and products. I have a description of each product, a description of each customer, and the label: customer *i* bought / did not buy product *j*. Each sample/row is a pair (customer, product), so Feature 1 is the customer's description, Feature 2 is the product's description, and the target variable is y = 1 if the customer bought the product, y = 0 otherwise. The goal is to predict for new arriving products …
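A sketch of the feature layout this implies: vectorize the two descriptions with separate vectorizers and concatenate them, so the model sees (customer features | product features) per pair. The toy data below is invented to illustrate the shape of the problem:

    from scipy.sparse import hstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    cust = ["young urban professional", "young urban professional",
            "retired gardener", "retired gardener"]
    prod = ["noise-cancelling headphones", "rose fertilizer",
            "rose fertilizer", "noise-cancelling headphones"]
    y = [1, 0, 1, 0]   # bought / did not buy

    v_c, v_p = TfidfVectorizer(), TfidfVectorizer()
    X = hstack([v_c.fit_transform(cust), v_p.fit_transform(prod)])

    clf = LogisticRegression().fit(X, y)

    # Scoring a newly arriving product against an existing customer:
    x_new = hstack([v_c.transform(["retired gardener"]),
                    v_p.transform(["garden shears"])])
    print(clf.predict_proba(x_new))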
I want to classify 500-character-long text samples as to whether they look like natural language, using a character-level RNN. I'm unsure as to the best way to feed the input to the RNN. Here are two approaches I've thought of:

1. Provide the whole 500 characters (one per time step) to the RNN, and predict a binary class, $\{0,1\}$.
2. Provide shorter overlapping segments (e.g. 10 characters) and predict the next (e.g. 11th) character. Convert this to classification by taking the …
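For concreteness, a minimal Keras sketch of the first approach: all 500 characters go in, one per time step, and a single binary label comes out. The alphabet size and layer widths are placeholder choices:

    import tensorflow as tf

    n_chars, seq_len = 128, 500   # e.g. ASCII alphabet, fixed-length samples

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,)),
        tf.keras.layers.Embedding(n_chars, 32),          # char id -> vector
        tf.keras.layers.LSTM(64),                        # reads the 500 steps
        tf.keras.layers.Dense(1, activation="sigmoid"),  # natural vs. not
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.summary()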
I want to use Transformer-XL for a text classification task, but I don't know what architecture to put on top of it for classification. I use dense layers with softmax activation on the output of the Transformer-XL model to get logits, but this doesn't seem right: when training, I see that accuracy is very low. Output of my model: My training step:
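One plausible head, sketched under the assumption of the Hugging Face transfo-xl-wt103 checkpoint (available in older transformers releases): mean-pool the hidden states and apply a single linear layer, leaving the softmax to the loss rather than putting it in the model:

    import torch
    from transformers import TransfoXLModel, TransfoXLTokenizer

    tok = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    backbone = TransfoXLModel.from_pretrained("transfo-xl-wt103")
    head = torch.nn.Linear(backbone.config.d_model, 2)   # 2 classes

    inputs = tok("an example sentence to classify", return_tensors="pt")
    hidden = backbone(**inputs).last_hidden_state        # (1, seq_len, d_model)
    logits = head(hidden.mean(dim=1))                    # mean-pool, then project

    # CrossEntropyLoss applies log-softmax itself, so the head emits raw logits.
    loss = torch.nn.CrossEntropyLoss()(logits, torch.tensor([1]))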
I have 200 unique *.txt files in each folder; each file is the initial text of a lawsuit, and the folders separate the lawsuits by legal area of public advocacy. I would like to create training data to predict the legal area of new lawsuits. Last year I tried PHP-ML, but it consumes too much memory, so I would like to migrate to Python. I started the code by loading each text file into a JSON-like structure, but I don't know the next steps: import …
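Possible next steps with scikit-learn: load_files reads a one-folder-per-class layout directly, so the JSON-like intermediate structure isn't needed. The path below is a placeholder for your folder of legal areas:

    from sklearn.datasets import load_files
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Each subfolder of "lawsuits/" is one legal area (one class).
    data = load_files("lawsuits/", encoding="utf-8", decode_error="ignore")

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(data.data, data.target)

    pred = clf.predict(["text of a new lawsuit..."])
    print(data.target_names[pred[0]])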
I have a set of 150 documents with their assigned binary class. I also have 1000 unlabeled documents. Each document is about the length of a journal paper. Each class has 15 associated keywords. I want to be able to predict the assigned class of the documents using this information. Does anyone have any ideas of how I could approach this problem?
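One option is scikit-learn's self-training wrapper, which uses the 150 labeled documents to bootstrap labels for the 1000 unlabeled ones (marked -1); the per-class keyword lists could additionally be appended to each document's text as a crude prior. The documents below are toy stand-ins:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    # Toy stand-ins: in your case, the 150 labeled and 1000 unlabeled docs.
    labeled_docs = ["gene expression study", "market earnings report"] * 5
    labels = [0, 1] * 5
    unlabeled_docs = ["quarterly revenue numbers", "protein folding results"] * 3

    docs = labeled_docs + unlabeled_docs
    y = np.array(labels + [-1] * len(unlabeled_docs))   # -1 marks unlabeled

    X = TfidfVectorizer().fit_transform(docs)
    clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
    clf.fit(X, y)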
I am very new to NLP. I am doing a text segmentation task, and to evaluate my model I need to calculate Pk and WindowDiff scores. My question is: what is the ideal value for the window size (k) for the Pk score, since different window sizes give different results? I am using the function nltk.metrics.segmentation.pk. Thanks.
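For what it's worth, nltk's pk already implements the conventional choice when k is omitted: half of the average segment length in the reference segmentation. A short example (boundary strings invented for illustration):

    from nltk.metrics.segmentation import pk, windowdiff

    ref = "0100100100"   # '1' marks a boundary after that position
    hyp = "0100010010"

    print(pk(ref, hyp))       # k defaults to half the mean reference segment length
    print(pk(ref, hyp, k=2))  # or fix k explicitly when comparing models
    print(windowdiff(ref, hyp, 2))   # WindowDiff requires k as an argument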
I was working on a text classification problem where I currently have around 40-45 different labels. The input is a text sentence with a keyword. For example, "This phone is the most durable in the market" is the input sentence, the output label is X, and all the inputs with label X have "durable" as their keyword. What would be a good model to fit this? I tried a basic SVM and Random Forest, but to no …
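One thing worth trying before changing the model family: give the classifier the keyword as its own feature channel alongside the sentence, rather than relying on the sentence text alone. A sketch with toy rows included only to make the snippet self-contained:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    df = pd.DataFrame({
        "sentence": ["This phone is the most durable in the market",
                     "Battery drains overnight"],
        "keyword": ["durable", "battery"],
        "label": ["X", "Y"],
    })

    features = ColumnTransformer([
        ("sent", TfidfVectorizer(), "sentence"),   # sentence channel
        ("kw", TfidfVectorizer(), "keyword"),      # keyword channel
    ])
    clf = make_pipeline(features, LinearSVC())
    clf.fit(df[["sentence", "keyword"]], df["label"])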
I am trying to use the pycld2 package to detect multiple languages in text. This package provides Python bindings for Compact Language Detect 2 (CLD2). This is the example I am testing out:

    import pycld2 as cld2

    text = '''The universal connection with an additional advantage: Push-in connection. Terminate solid and stranded (Class B 7 strands or less), as well as ferruled conductors, by simply pushing them in – no tools required. La connessione universale con un ulteriore vantaggio: …
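Completing the example (assuming text is defined as above): cld2.detect returns per-language details, and passing returnVectors=True additionally gives the byte spans assigned to each language, which is what mixed-language text needs:

    isReliable, textBytesFound, details, vectors = cld2.detect(
        text, returnVectors=True)

    print(details)   # up to three (languageName, languageCode, percent, score) tuples
    for offset, num_bytes, lang_name, lang_code in vectors:
        # Offsets index into the UTF-8 encoding of the text.
        chunk = text.encode("utf-8")[offset:offset + num_bytes]
        print(lang_name, chunk[:40])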
I'm trying to run a Multinomial Naive Bayes classifier on variously balanced data sets, comparing two different vectorizers: TfidfVectorizer and CountVectorizer. I have 3 classes: NEG, NEU and POS. I have 10,000 documents: NEG has 2,474, NEU 5,894 and POS 1,632. Out of these I have made 3 differently balanced data sets, like this (text counts):

                              NEU     NEG     POS     Total
        NEU-balanced dataset  5894    2474    1632    10000
        NEG-balanced dataset  2474    2474    1632     6580
        POS-balanced dataset  1632    1632    …
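A direct way to compare the two vectorizers on identical splits; macro-F1 is more informative than accuracy given the class imbalance. The docs and labels here are tiny stand-ins for one of the balanced datasets:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = ["great product", "terrible service", "it was okay",
            "loved it", "hated it", "average experience"]
    labels = ["POS", "NEG", "NEU", "POS", "NEG", "NEU"]

    for vec in (CountVectorizer(), TfidfVectorizer()):
        pipe = make_pipeline(vec, MultinomialNB())
        scores = cross_val_score(pipe, docs, labels, cv=2, scoring="f1_macro")
        print(type(vec).__name__, scores.mean())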
I want to classify the sentences in my dataset as declarative, interrogative, imperative or exclamative. Although they can be classified with respect to punctuation marks such as ?, ! and ., there are many cases and situations in which such rules fail. In NLP, is there any model or solution that can be applied to reach this goal?
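One model-based option that needs no punctuation rules and no labeled data: score each sentence against the four sentence types with a zero-shot NLI classifier. The checkpoint name below is one common publicly available choice, not the only one:

    from transformers import pipeline

    clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    labels = ["declarative", "interrogative", "imperative", "exclamative"]
    # No question mark, yet clearly interrogative:
    print(clf("Could you pass me the report", candidate_labels=labels))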
Given a collection of documents, each corresponding to some economic entity, I am looking to extract information and populate a table with predetermined headings. I have a small sample of this already done by humans, and I was wondering if there's an efficient way to automate it. Grateful for any suggestions.
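A lightweight starting point, sketched with off-the-shelf NER: map entity types to the table headings, then use the human-labeled sample to evaluate (and later fine-tune) the extractor. The headings below are hypothetical:

    import spacy

    nlp = spacy.load("en_core_web_sm")

    doc = nlp("Acme Corp reported revenue of $1.2 billion in 2023.")
    row = {"organisation": None, "amount": None, "year": None}
    for ent in doc.ents:
        if ent.label_ == "ORG":
            row["organisation"] = ent.text
        elif ent.label_ == "MONEY":
            row["amount"] = ent.text
        elif ent.label_ == "DATE":
            row["year"] = ent.text
    print(row)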
I have trained an XGBClassifier to classify text issues to the rightful assignee (a simple 50-way classification). The source from which I fetch the data also provides a datetime object giving the timestamp at which each issue was created. Logically, a person who has recently worked on an issue (say, 2 weeks ago) should be a better suggestion than another person who worked on a similar issue 2 years ago. That is, if there are two examples from …
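One way to encode recency without changing the model class: pass exponentially decayed sample weights to fit, so recent issues count more. The half-life is a tunable assumption, and the feature matrix here is a toy stand-in:

    import numpy as np
    import pandas as pd
    from xgboost import XGBClassifier

    # Toy stand-ins for the real issue data.
    df = pd.DataFrame({"created_at": pd.to_datetime(["2024-01-01", "2022-01-01"])})
    X = np.array([[0.1], [0.9]])
    y = np.array([0, 1])

    half_life_days = 90   # tunable assumption
    age_days = (pd.Timestamp.now() - df["created_at"]).dt.days
    # ~0.9 for a 2-week-old issue, ~0.004 for a 2-year-old one:
    weights = np.power(0.5, age_days / half_life_days)

    model = XGBClassifier()
    model.fit(X, y, sample_weight=weights)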