How do I work around a KMeans ValueError?

Question

Falah Amro

March 15, 2022, 22:42

I am working on a social network analysis project using data from Twitter. Before running the analysis, I intend to apply clustering, specifically KMeans, to determine how to separate the tweets into categories.

I vectorized my data using the following code:

from sklearn.feature_extraction.text import TfidfVectorizer

# stop_words and tokenize are defined elsewhere (not shown)
vectorizer3 = TfidfVectorizer(stop_words=stop_words, tokenizer=tokenize, max_features=1000)
X3 = vectorizer3.fit_transform(df_connections['text'].values.astype('str'))
word_features3 = vectorizer3.get_feature_names_out()
len(word_features3)

Next, I run the following code:

from sklearn.cluster import KMeans


clusters = [2, 3, 4, 5, 10, 15]

for i in clusters :
    kmeans = KMeans(n_clusters = i, n_init = 5, random_state = 42)
    kmeans.fit(i)
    print('Number of clusters:', i)
    common_words = kmeans.cluster_centers_.argsort()[:,-1:-11:-1]
    for num, centroid in enumerate(common_words):
        print(str(num) + ' : ' + ', '.join(word_features3[word] for word in centroid))
    new_col_name = str(i) + '_clusters'
    tweets[new_col_name] = kmeans.labels_

I receive the following error message:

ValueError: Expected 2D array, got scalar array instead:
array=2.0.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Thank you.

Topic context-vector twitter social-network-analysis k-means clustering

Category Data Science
