How do I work around a KMeans ValueError?
I am working on a social network analysis project. My data comes from Twitter. Before I run the analysis, I intend to apply clustering, specifically KMeans, to determine how to separate the tweets into categories.
I vectorized my data using the following code:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer3 = TfidfVectorizer(stop_words = stop_words, tokenizer = tokenize, max_features = 1000)
X3=vectorizer3.fit_transform(df_connections['text'].values.astype('str'))
word_features3 = vectorizer3.get_feature_names_out()
len(word_features3)
Next, I run the following code:
from sklearn.cluster import KMeans
clusters = [2, 3, 4, 5, 10, 15]
for i in clusters:
    kmeans = KMeans(n_clusters = i, n_init = 5, random_state = 42)
    kmeans.fit(i)
    print('Number of clusters:', i)
    common_words = kmeans.cluster_centers_.argsort()[:,-1:-11:-1]
    for num, centroid in enumerate(common_words):
        print(str(num) + ' : ' + ', '.join(word_features3[word] for word in centroid))
    new_col_name = str(i) + '_clusters'
    tweets[new_col_name] = kmeans.labels_
I receive the following error message:
ValueError: Expected 2D array, got scalar array instead:
array=2.0.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Thank you.