Social Media Analysis (Brandwatch, TalkWalker etc.)

I'm trying to build a simplified version of existing social media analysis software, such as Brandwatch. I've seen Mining the Social Web by Matthew Russell mentioned as a starting point, but this book (and generally what I've seen online) teaches how to get the data from each platform separately. Question: Are there any platforms, tools, or services that provide aggregated data through a web API? Note: I'm a complete beginner.
Category: Data Science

How do I work around a KMeans ValueError?

I am working on a social network analysis project. My data comes from Twitter. Before I run the analysis, I intend to apply clustering, specifically KMeans, to determine how to separate tweets into categories. I vectorized my data using the following code:

    from sklearn.feature_extraction.text import TfidfVectorizer
    vectorizer3 = TfidfVectorizer(stop_words=stop_words, tokenizer=tokenize, max_features=1000)
    X3 = vectorizer3.fit_transform(df_connections['text'].values.astype('str'))
    word_features3 = vectorizer3.get_feature_names_out()
    len(word_features3)

Next, I run the following code:

    from sklearn.cluster import KMeans
    clusters = [2, 3, 4, 5, 10, 15]
    for i in clusters …
Category: Data Science
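One common cause of a ValueError from scikit-learn's KMeans is asking for more clusters than there are samples (for example, when filtering leaves X3 with only a handful of rows). The excerpt does not show the actual error message, so this is an assumption. A minimal stdlib sketch that checks the candidate values of k against the number of rows before fitting:

```python
def usable_cluster_counts(clusters, n_samples):
    """Split candidate k values into those KMeans can fit and those it cannot.

    Assumption: the error comes from n_clusters > n_samples; KMeans needs
    at least as many samples as clusters.
    """
    usable = [k for k in clusters if 1 <= k <= n_samples]
    skipped = [k for k in clusters if k not in usable]
    return usable, skipped


clusters = [2, 3, 4, 5, 10, 15]
# In the original code this would be X3.shape[0]; 8 is a hypothetical value.
usable, skipped = usable_cluster_counts(clusters, n_samples=8)
print(usable, skipped)
```

Each usable k could then be passed to KMeans(n_clusters=k).fit(X3) inside the loop, while the skipped values are reported instead of crashing the run.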

measuring flip-flop behaviour across several topics

I'm trying to analyze a behavior called "sentiment flipping" among users in a dataset, but I'm stuck on how to proceed. Suppose I have two groups of users; call them good and bad users. My dataset contains N tweets classified into 6 topics, written by both the bad and the good users. The 6 topics are about general issues, but 3 of them are about organizations/individuals supported (A) by the "bad" users and the other …
Category: Data Science
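One way to make "sentiment flipping" measurable, assuming each tweet carries a user, a topic, a timestamp and a polarity label (+1 or -1), is to count how often a user's polarity on the same topic changes between consecutive tweets. The tuple layout and this definition of a flip are hypothetical choices, not taken from the question:

```python
from collections import defaultdict


def flip_rates(tweets):
    """Return {(user, topic): fraction of consecutive tweet pairs whose polarity changes}.

    Each tweet is a hypothetical (user, topic, timestamp, polarity) tuple,
    with polarity in {+1, -1}.
    """
    by_key = defaultdict(list)
    for user, topic, ts, polarity in tweets:
        by_key[(user, topic)].append((ts, polarity))

    rates = {}
    for key, items in by_key.items():
        items.sort()  # order each user's tweets on a topic by time
        pairs = len(items) - 1
        if pairs == 0:
            continue  # a single tweet cannot flip
        flips = sum(1 for (_, a), (_, b) in zip(items, items[1:]) if a != b)
        rates[key] = flips / pairs
    return rates


tweets = [
    ("u1", "A", 1, +1), ("u1", "A", 2, -1), ("u1", "A", 3, +1),
    ("u2", "A", 1, -1), ("u2", "A", 2, -1),
    ("u3", "B", 1, +1),
]
print(flip_rates(tweets))
```

Averaging these rates per user group (good vs. bad) and per topic type (the A topics vs. the rest) would give flip-flop scores that can be compared directly.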

Is there a way to combine both ties (nondirected edges) and wins/losses (directed edges) in a single social network?

I'm currently building social networks for small colonies of animals which I've observed, with the aim of comparing changes in social network structure in response to changes in certain environmental variables. Individuals in these colonies undergo dyadic dominance interactions in which one individual attempts to assert dominance over another. The result of a dominance interaction can be either win/loss (i.e. one individual successfully dominates another) or it can be a tie (neither individual successfully asserts dominance). I want the nodes …
Category: Data Science
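A minimal sketch, assuming interactions are recorded as (actor, target, outcome) with outcome "win" or "tie": keep directed win counts and undirected tie counts side by side, keyed so that each tie dyad appears once regardless of order. This is one possible representation, not a standard format; in a graph library the same idea maps to a multigraph or to separate edge attributes.

```python
from collections import Counter


def summarize_interactions(records):
    """Return (wins, ties).

    wins[(a, b)] counts how often a dominated b (directed).
    ties[frozenset({a, b})] counts ties between a and b (undirected).
    Each record is a hypothetical (actor, target, outcome) tuple.
    """
    wins = Counter()
    ties = Counter()
    for actor, target, outcome in records:
        if outcome == "win":
            wins[(actor, target)] += 1
        elif outcome == "tie":
            ties[frozenset((actor, target))] += 1
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
    return wins, ties


records = [
    ("a1", "a2", "win"),
    ("a2", "a1", "tie"),
    ("a1", "a2", "tie"),
    ("a3", "a1", "win"),
]
wins, ties = summarize_interactions(records)
print(wins[("a1", "a2")], ties[frozenset(("a1", "a2"))])
```

Keeping ties as a separate undirected layer means win/loss measures and tie-based measures can be computed on the same node set without one edge type overwriting the other.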

Is there a metric for "cliquiness" for social graphs?

Regarding social network graphs: say I am connected to 10 people, and each of them is connected to 10 people. At one extreme, this means I have 100 unique 2nd-degree connections. However, in a real social network it is highly likely that many of my first-degree connections follow me back, follow one another, and follow the same people outside my direct connections. At the other extreme, if …
Category: Data Science
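A standard measure for this is the local clustering coefficient: for a node with k neighbours, the fraction of the k(k-1)/2 possible neighbour pairs that are themselves connected. The sketch below treats the graph as undirected (an assumption, since follower graphs are directed) and also counts the distinct 2nd-degree connections the question describes:

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbours that are connected to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def second_degree(adj, node):
    """Distinct nodes exactly two hops away (excluding node and its neighbours)."""
    reach = set()
    for n in adj[node]:
        reach |= adj[n]
    return reach - adj[node] - {node}


# Hypothetical undirected graph stored as symmetric adjacency sets.
edges = [("me", "a"), ("me", "b"), ("me", "c"), ("a", "b"),
         ("a", "x"), ("c", "x"), ("c", "y")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(local_clustering(adj, "me"))  # 1 of the 3 neighbour pairs is linked
print(sorted(second_degree(adj, "me")))
```

A value near 1 means a node's contacts mostly know each other (very "cliquey"); near 0 means they are largely disconnected, which is also when the number of unique 2nd-degree connections approaches its maximum.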

How to do hidden variable learning in Bayesian Network with Python?

I learned how to use libpgm for Bayesian inference and learning in general, but I do not understand whether I can use it for learning with a hidden variable. More precisely, I am trying to implement the social network analysis approach from this paper: Modeling Relationship Strength in Online Social Networks. They suggest using the following architecture: [figure: model architecture]. Here S(ij) represents the vector of similarity between users i and j (observed), and z(ij) is a hidden variable, the relationship strength (Normal distribution regularised …
Category: Data Science

Sentiment Analysis model for Spanish

I barely know about data analysis tools and techniques, so bear with me if I'm asking something too trivial. I'm looking for a sentiment analysis tool to process comments in Spanish. I know some options for sentiment analysis, but they all work only for English. Is there a model/tool that already works with Spanish? I'm language agnostic, so it does not matter whether it's Java, Python, or even Go code.
Category: Data Science

Constructing a Weighted Random Graph

I want to create a weighted random graph (in contrast to the unweighted Erdős–Rényi model). I have a list of weights derived from a real-world network with a very skewed distribution: most weights are 1 and the rest are between 6 and 8. My plan is to first create an Erdős–Rényi graph and then randomly assign the weights from my list to its edges. Does this sound like the right approach, or has my new network essentially lost the key characteristics of a …
Category: Data Science
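A closely related variant of this plan uses the G(n, M) form of the Erdős–Rényi model with M equal to the number of observed weights, so that a random permutation of the weight list can be assigned one-to-one and the weight distribution is preserved exactly. A minimal sketch, assuming an undirected simple graph with nodes labelled 0..n-1 (listing all pairs is O(n²), so this is only meant for modest n):

```python
import random
from itertools import combinations


def weighted_gnm(n, weights, seed=None):
    """Random graph with len(weights) edges, each carrying one observed weight.

    Edges are a uniform sample of node pairs (Erdős–Rényi G(n, M));
    weights are a random permutation of the observed list.
    Returns {(u, v): weight} with u < v.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))  # O(n^2) memory
    if len(weights) > len(all_pairs):
        raise ValueError("more weights than possible edges")
    edges = rng.sample(all_pairs, len(weights))
    shuffled = list(weights)
    rng.shuffle(shuffled)
    return dict(zip(edges, shuffled))


observed = [1, 1, 1, 1, 1, 1, 6, 7, 8]  # hypothetical skewed weight list
g = weighted_gnm(10, observed, seed=42)
print(len(g), sorted(g.values()) == sorted(observed))
```

This keeps the weight distribution but deliberately removes any correlation between weights and topology (for example, heavy edges concentrating around hubs). That is usually the point of such a null model; whether it counts as losing "key characteristics" depends on which property is being compared.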

Data mining: Clique-based clustering to make comparison in social network analysis

I am a complete beginner in data mining. I want to work on the clique-based clustering method and compare it across various datasets for social network analysis or community detection. I need more than 3 datasets and source code (Python) to make the comparison. The datasets can be old or new; that is not a problem. But I want to work on at …
Category: Data Science
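Clique-based community methods generally start by enumerating maximal cliques. A minimal pure-Python sketch of the Bron–Kerbosch algorithm with pivoting, on a hypothetical adjacency-set graph (networkx offers `find_cliques` for the same task and `networkx.algorithms.community.k_clique_communities` for clique percolation):

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a set of nodes.

    adj maps each node to the set of its neighbours (must be symmetric).
    """
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended: it is maximal
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical graph: triangle a-b-c plus a pendant edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(sorted(sorted(c) for c in maximal_cliques(adj)))
```

The same function can be run on each dataset so that clique counts and sizes are compared on an equal footing before any clustering step is layered on top.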

Reduce size of a network graph for bipartite projection

I have a graph that I created from a pandas DataFrame. The graph has ~450k edges. When I try to run the weighted_projected_graph function, it runs for a long time (I have not seen it finish), presumably because of the size of this dataset. What is a good method for reducing the size of this dataset before creating the projected graph? I have tried narrowing it down by using the most connected components: trim1 …
Category: Data Science
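A projection is expensive mainly because every node on the other side with degree d contributes d(d-1)/2 pairs, so a few very high-degree nodes can dominate the run time. One common reduction, sketched below under the assumption that the bipartite graph is available as a list of (top, bottom) edges, is to drop bottom-side nodes whose degree exceeds a chosen cap before projecting. The cap is a hypothetical parameter, and dropping hubs does change the result. The edge weight here is the number of shared neighbours, which matches networkx's weighted_projected_graph default.

```python
from collections import Counter, defaultdict
from itertools import combinations


def weighted_projection(edges, max_degree=None):
    """Project a bipartite edge list onto the 'top' nodes.

    Returns Counter{(u, v): number of shared bottom nodes}, with u < v.
    Bottom nodes with more than max_degree neighbours are skipped.
    """
    members = defaultdict(set)
    for top, bottom in edges:
        members[bottom].add(top)

    weights = Counter()
    for bottom, tops in members.items():
        if max_degree is not None and len(tops) > max_degree:
            continue  # hub: would add len(tops) choose 2 pairs
        for u, v in combinations(sorted(tops), 2):
            weights[(u, v)] += 1
    return weights


edges = [("u1", "p1"), ("u2", "p1"), ("u1", "p2"), ("u2", "p2"), ("u3", "p2"),
         ("u1", "hub"), ("u2", "hub"), ("u3", "hub"), ("u4", "hub")]
print(weighted_projection(edges))
print(weighted_projection(edges, max_degree=3))
```

Checking the degree distribution of the bottom-side nodes first shows how many pairs each would generate, which helps choose a cap that removes the costly hubs without discarding most of the data.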

NLP: Getting the top 5 or top 10 predictions

I am working on a social networking application and need to improve its news feed. For example, if someone searches for 'suggest me some good books', it should return some book names. To begin with, I have used the InferSent algorithm so that my model can answer questions. I am getting only the single best output my model predicts, viz. 'Alchemist'. I want at least 4 or 5 other outputs; in other words, the …
Category: Data Science
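If the model scores every candidate answer, for example by cosine similarity between the query's sentence embedding and each candidate's embedding (an assumption about how the InferSent outputs are used here), returning several answers is a matter of ranking instead of taking the single maximum. A minimal sketch with hypothetical toy vectors and candidate names standing in for real embeddings:

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, candidates, k=5):
    """Return the k (score, name) pairs most similar to query_vec, best first."""
    scored = ((cosine(query_vec, vec), name) for name, vec in candidates.items())
    return heapq.nlargest(k, scored)


# Hypothetical 3-d "embeddings"; real ones would come from the sentence encoder.
candidates = {
    "Alchemist": [0.9, 0.1, 0.0],
    "book_b": [0.7, 0.3, 0.1],
    "book_c": [0.6, 0.2, 0.4],
    "recipe_d": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
for score, name in top_k(query, candidates, k=3):
    print(f"{name}: {score:.3f}")
```

`heapq.nlargest` avoids sorting the full candidate list, which matters once there are many candidates and only the top 5 or 10 are needed.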

Counting Number of Parameters in Neural Networks

Note: This is an academic problem. In a recent in-class quiz, we were asked: if we have an input layer of 20 nodes along with 2 hidden layers (one of size 10 and the other of size 5), what is the total number of parameters in this network? How can we compute this? Additionally, how do we know the shapes of the weights? How can we determine which activation functions are suitable for such a …
Category: Data Science
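For fully connected (dense) layers, a layer with n_in inputs and n_out units has an n_in × n_out weight matrix plus n_out biases. The excerpt does not state an output layer, so the sketch below counts only the two weight layers shown: (20·10 + 10) + (10·5 + 5) = 210 + 55 = 265. Appending an output size to the list extends the count.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network.

    layer_sizes lists the width of each layer, input first.
    Returns (total, shapes), where shapes are (n_in, n_out) weight-matrix shapes.
    """
    total = 0
    shapes = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        shapes.append((n_in, n_out))
        total += n_in * n_out + n_out  # weight matrix + one bias per unit
    return total, shapes


total, shapes = dense_param_count([20, 10, 5])
print(total, shapes)  # 265 [(20, 10), (10, 5)]
```

The shape orientation is a library convention: Keras stores a Dense kernel as (input_dim, units), while PyTorch's nn.Linear stores its weight as (out_features, in_features). The parameter count is the same either way, and activation functions add no parameters.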

LinkedIn web scraping

I recently discovered a new R package for connecting to the LinkedIn API. Unfortunately the LinkedIn API seems pretty limited to begin with; for example, you can only get basic data on companies, and this is detached from data on individuals. I'd like to get data on all employees of a given company, which you can do manually on the site but is not possible through the API. import.io would be perfect if it recognised the LinkedIn pagination (see end …
Category: Data Science

Extract data from Facebook

I am learning about social media analysis. I am aware that we can extract data from Twitter using hashtags and the API. For example, if I use #covid19, I will get all tweets that contain this hashtag for the duration I specify. Similarly, let's say I visit a public Facebook page https://www.facebook.com/A123. Will I be able to extract only the posts that contain the term covid in their text (from all of their posts) using my API credentials? Is it …
Category: Data Science

Biasing SVM algorithm towards particular subset of data

I'm training an SVM model for sentiment analysis on social media data, e.g. tweets. The model will be trained on a small selection of a particular company's tweets in order to classify new ones. However, since that training set is too small to produce an accurate model, I will combine the company's data with a much larger general tweets dataset to train the model. Because it is specific to one company, the content of the company's data is slightly different …
Category: Data Science
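One way to bias the model towards the company's tweets is to give those samples a larger weight in the hinge loss, so misclassifying them costs more than misclassifying general tweets; scikit-learn exposes this through the `sample_weight` argument of `SVC.fit`. To show the mechanism without dependencies, the sketch below is a small linear SVM trained by full-batch subgradient descent with per-sample weights. It is a simplified stand-in, not what scikit-learn does internally, and the weights, data and hyperparameters are hypothetical.

```python
def train_weighted_linear_svm(X, y, sample_weight, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(c_i * hinge(y_i * (w.x_i + b))).

    X: list of feature vectors, y: labels in {+1, -1},
    sample_weight: per-sample costs c_i (larger = more influence).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, sample_weight):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge is active: push this sample outward
                for j in range(d):
                    grad_w[j] -= ci * yi * xi[j] / n
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
weights = [3.0, 3.0, 1.0, 1.0]  # hypothetical: first two are company tweets
w, b = train_weighted_linear_svm(X, y, weights)
print([predict(w, b, x) for x in X])
```

With real data, the company-to-general weight ratio is a tuning knob: it can be chosen by validating on held-out company tweets only, since those are the ones the model is meant to serve.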

Need some advice on an approach for selecting only the informative emojis from the dataset?

I have a giant dataset from a local election, which contains hashtags, emojis, and comments. I wanted to do a network analysis using only emojis. So far I have a network graph made in R: [figure: emoji network graph]. Basically, my goal is to see what people are talking about as a whole group. Currently there are a lot of nodes that don't really say anything concrete …
Category: Data Science
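One simple way to prune uninformative nodes, by analogy with stop-word removal in text, is to drop emojis that appear in almost every comment or in very few, and then build co-occurrence edges only among the remaining ones. A minimal sketch, assuming each comment's emojis have already been extracted into a list (text aliases stand in for the actual characters); the thresholds are hypothetical and would need tuning on the real data:

```python
from collections import Counter
from itertools import combinations


def informative_emoji_edges(comments, min_count=2, max_share=0.5):
    """Return (kept_emojis, Counter{(e1, e2): co-occurrence count}).

    comments: list of emoji lists, one per comment.
    Keeps emojis present in at least min_count comments and in at most
    max_share of all comments.
    """
    doc_freq = Counter()
    for emojis in comments:
        doc_freq.update(set(emojis))  # count each emoji once per comment

    n = len(comments)
    keep = {e for e, c in doc_freq.items() if c >= min_count and c / n <= max_share}

    edges = Counter()
    for emojis in comments:
        kept = sorted(set(emojis) & keep)
        for pair in combinations(kept, 2):
            edges[pair] += 1
    return keep, edges


comments = [
    [":joy:", ":fire:", ":flag:"],
    [":joy:", ":fire:"],
    [":joy:", ":flag:", ":vote:"],
    [":joy:", ":vote:", ":flag:"],
    [":joy:", ":heart:"],
]
keep, edges = informative_emoji_edges(comments, min_count=2, max_share=0.8)
print(sorted(keep))
print(edges)
```

Here ":joy:" is dropped because it appears in every comment and ":heart:" because it appears only once; the resulting edge counts can be exported as an edge list and loaded into the existing R network workflow.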

How to cluster a Twitter dataset?

I have a Twitter dataset and I want to extract its related topics. So I decided to group my tweets into clusters using an unsupervised machine learning algorithm like k-means. I made this choice because training supervised approaches is time-consuming. As a first step, after cleaning my tweets, I will extract features (e.g. hashtags) from them and enrich them with side information from knowledge bases (e.g. Wikipedia). Secondly, they will be represented in a …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.