Are social network analysis and graph analytics the same thing? If not, what are the differences? Is social network analysis perhaps a subset of graph analytics? Are they just modern extensions of graph theory which have become relevant due to modern types of data and data analysis? What role does software like NetworkX and neo4j play?
I have a set of bibliometric data (references). I want to extract the author names, title, and the name of the conference/journal from it. Since the referencing style used by different papers varies, I would like to know whether there are any pre-existing tools to do this. I am happy to provide examples if needed :)
I'm trying to build a simplified version of existing software that does social media analysis, such as Brandwatch. I've seen Mining the Social Web by Matthew Russell mentioned as a starting point. But this book (and generally what I've seen online) teaches how to get the data from various platforms. Question: Are there any platforms/tools/services that provide aggregated data in the form of a Web API? Note: complete beginner
I am working on a social network analysis project. My data comes from Twitter. Before I run the analysis, I intend to apply clustering, specifically k-means, to determine how to separate tweets into categories. I vectorized my data using the following code: vectorizer3 = TfidfVectorizer(stop_words = stop_words, tokenizer = tokenize, max_features = 1000) X3=vectorizer3.fit_transform(df_connections['text'].values.astype('str')) word_features3 = vectorizer3.get_feature_names_out() len(word_features3) Next, I run the following code: from sklearn.cluster import KMeans clusters = [2, 3, 4, 5, 10, 15] for i in clusters …
I have a use case that requires calculating the betweenness centrality of nodes. I have tried GraphX with spark-betweenness, but the job runs for a very long time. Has anyone successfully calculated betweenness centrality of a large network with around 10 million vertices and 100 million edges?
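One common way to cut the cost of betweenness centrality is to run the shortest-path accumulation from a random sample of source nodes ("pivots") instead of from every node. The sketch below is a minimal single-machine illustration of that idea for an unweighted, undirected graph, not a Spark or GraphX solution; the function name `approx_betweenness`, the adjacency-dict input, and the `num_pivots` parameter are assumptions made for this example.

```python
from collections import deque
import random


def approx_betweenness(adj, num_pivots, seed=0):
    """Estimate betweenness on an unweighted, undirected graph.

    adj: dict mapping each node to an iterable of its neighbours.
    num_pivots: number of source nodes to sample (hypothetical parameter).
    Using every node as a pivot gives the exact (unnormalised) values.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    pivots = rng.sample(nodes, min(num_pivots, len(nodes)))
    bc = {v: 0.0 for v in nodes}

    for s in pivots:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]

    # Rescale for the sampled fraction; halve because each undirected
    # pair is seen from both endpoints.
    scale = len(nodes) / len(pivots)
    return {v: b * scale / 2.0 for v, b in bc.items()}


# Tiny check on a path a - b - c: only b lies between two other nodes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
exact = approx_betweenness(path, num_pivots=3)
```

With fewer pivots than nodes the result is only an estimate, so the number of pivots trades accuracy for running time.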
I'm trying to analyze a behavior called "sentiment flipping" of users in a dataset, but I'm not able to make progress. Suppose that I have two groups of users; call them good and bad users. My dataset contains N tweets that are classified into 6 topics. The tweets were created by the bad and good users. The 6 topics are about general issues, but 3 of these topics are about organizations/individuals supported (A) by the "bad" users and the other …
I'm currently building social networks for small colonies of animals which I've observed, with the aim of comparing changes in social network structure in response to changes in certain environmental variables. Individuals in these colonies undergo dyadic dominance interactions in which one individual attempts to assert dominance over another. The result of a dominance interaction can be either win/loss (i.e. one individual successfully dominates another) or it can be a tie (neither individual successfully asserts dominance). I want the nodes …
Regarding social network graphs, let us say that I am connected to 10 people, and that each of them is connected to 10 people. At one extreme this means that I have 100 unique $2^{nd}$ degree connections. However, it is highly likely that in a real social network many of the connections of my first degree connections are following me back, following one another, and following the same people outside of my direct connections. At the other extreme, if …
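The counting argument above can be checked on small graphs. The following is a minimal sketch, assuming the network is stored as a plain adjacency dictionary; the helper `second_degree` and both example graphs are hypothetical. In the first graph each friend has 10 connections besides "me", which is the reading under which the count of 100 holds.

```python
def second_degree(adj, me):
    """Return nodes reachable in exactly two hops from `me`.

    adj: dict mapping a node to an iterable of its connections.
    """
    first = set(adj[me])
    reached = set()
    for friend in first:
        reached.update(adj.get(friend, ()))
    # Drop myself and my direct connections: they are not 2nd degree.
    return reached - first - {me}


# No overlap: 10 friends, each with 10 connections nobody else shares.
tree = {"me": [f"f{i}" for i in range(10)]}
for i in range(10):
    tree[f"f{i}"] = ["me"] + [f"f{i}_{j}" for j in range(10)]

# Full overlap (hypothetical): my 10 friends and I all know one another.
names = ["me"] + [f"f{i}" for i in range(10)]
clique = {u: [v for v in names if v != u] for u in names}

no_overlap_count = len(second_degree(tree, "me"))
full_overlap_count = len(second_degree(clique, "me"))
```

Real networks fall between these two cases, and the same function can be applied to an observed adjacency list to measure where.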
I learned how to use libpgm in general for Bayesian inference and learning, but I do not understand whether I can use it for learning with hidden variables. More precisely, I am trying to implement the approach to social network analysis from this paper: Modeling Relationship Strength in Online Social Networks. They suggest using the following architecture. Here S(ij) represents the vector of similarity between users i and j (observed); z(ij) is a hidden variable, the relationship strength (Normal distribution regularised …
I know very little about data analysis tools and techniques, so bear with me if I'm asking something too trivial. I'm looking for a sentiment analysis tool to process comments in Spanish. I do know some options for sentiment analysis, but those all work for English. Is there a model/tool that already works with Spanish? I'm language agnostic, so it does not matter whether it's Java, Python, or even Go code.
I want to create a weighted random graph (in contrast to the unweighted Erdős–Rényi model). I have a list of weights (derived from a real-world network, with a very skewed distribution in which most weights are 1 and the rest are between 6 and 8). My plan is to first create an Erdős–Rényi model and randomly reassign the weights according to my list of weights. Does this sound like the right approach, or has my new network essentially lost the key characteristics of a …
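The plan described here can be sketched in a few lines. This is only an illustration of the asker's proposal, not a recommendation: it assumes an undirected G(n, p) graph and draws each edge weight with replacement from the empirical list. The function name `weighted_gnp` and the example weight pool are made up for the sketch.

```python
import random
from itertools import combinations


def weighted_gnp(n, p, weight_pool, seed=None):
    """Undirected G(n, p) graph whose edge weights are drawn at random
    (with replacement) from `weight_pool`.

    Returns a dict mapping (u, v) with u < v to the sampled weight.
    """
    rng = random.Random(seed)
    edges = {}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # keep each possible edge with probability p
            edges[(u, v)] = rng.choice(weight_pool)
    return edges


# Hypothetical skewed pool: mostly 1, a few values between 6 and 8.
pool = [1] * 90 + [6] * 4 + [7] * 3 + [8] * 3
graph = weighted_gnp(50, 0.1, pool, seed=42)
```

By construction the weights are independent of the topology, so any relationship between an edge's weight and its position in the real network is not reproduced.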
I am a complete beginner in data mining. I want to work on a clique-based clustering method. I want to compare various datasets for social network analysis or community detection in social networks. I need more than 3 datasets and source code (Python code) to make the comparison in terms of social network analysis. The datasets can be old or new; that would not be a problem. But I want to work on at …
I have a graph that I created from a pandas data frame. The graph has ~450k edges. When I try to run the weighted_projected_graph function, it runs for a long time (I have not seen it finish), presumably because of the size of this data set. What is a good method for reducing the size of this data set before creating the bipartite graph? I have tried narrowing it down by using the most connected components: trim1 …
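One possible reduction is to prune by degree before projecting. A node on the side being projected away that has a single neighbour creates no projected edge, while one with d neighbours creates d(d-1)/2 of them, so a few hubs can dominate the running time. The sketch below is a plain-Python illustration that counts shared neighbours directly rather than calling NetworkX; `prune_and_project`, the `max_bottom_degree` cutoff, and the toy edge list are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations


def prune_and_project(edges, max_bottom_degree):
    """Project a bipartite edge list onto its "top" nodes.

    edges: iterable of (top, bottom) pairs.
    max_bottom_degree: bottom nodes with more neighbours than this are
        skipped (hypothetical cutoff; skipping hubs changes the result).
    Returns a dict mapping (u, v) to the number of shared bottom nodes.
    """
    tops_of = defaultdict(set)
    for top, bottom in edges:
        tops_of[bottom].add(top)

    weights = defaultdict(int)
    for tops in tops_of.values():
        # Degree-1 bottom nodes add no projected edge, so skipping them
        # is lossless; skipping hubs is an approximation.
        if len(tops) < 2 or len(tops) > max_bottom_degree:
            continue
        for u, v in combinations(sorted(tops), 2):
            weights[(u, v)] += 1
    return dict(weights)


toy_edges = [("a", "x"), ("b", "x"), ("a", "y"),
             ("b", "y"), ("c", "y"), ("c", "z")]
full = prune_and_project(toy_edges, max_bottom_degree=3)
trimmed = prune_and_project(toy_edges, max_bottom_degree=2)
```

Here `full` keeps every shared neighbour, while `trimmed` ignores the bottom node `y` because it links three top nodes.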
I am working on a social networking application and I have to make its news feed better. For example: If someone searches for 'suggest me some good books', it should yield some names. I have used the InferSent algorithm (to begin with) so that my model can answer questions. I am getting only the best output that my model could predict, viz. 'Alchemist'. I want at least 4 or 5 other outputs, in other words, the …
Note: This is an academic problem. In a recent in-class quiz, we were asked: if we have an input layer consisting of 20 nodes along with 2 hidden layers (one of size 10 and the other of size 5), what will the total number of parameters in this network be? How can we compute this? Additionally, how do we know what shapes the weights have? How can we determine which activation functions are suitable for such a …
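For a fully connected network, each layer contributes a weight matrix of shape (inputs, outputs) plus one bias per output. The sketch below applies that rule to the sizes given in the question, assuming dense layers with biases; the helper names are made up, and since the excerpt does not state an output layer, the single-unit output in the last line is purely hypothetical.

```python
def dense_layer_shapes(layer_sizes):
    """Return (weight_shape, bias_shape) for each pair of consecutive
    layers in a fully connected network."""
    return [((n_in, n_out), (n_out,))
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def count_parameters(layer_sizes):
    """Total weights plus biases, assuming every layer has a bias."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


shapes = dense_layer_shapes([20, 10, 5])
# [((20, 10), (10,)), ((10, 5), (5,))]

hidden_only = count_parameters([20, 10, 5])
# 20*10 + 10 = 210, then 10*5 + 5 = 55, giving 265

with_one_output = count_parameters([20, 10, 5, 1])
# hypothetical 1-unit output layer adds 5*1 + 1 = 6, giving 271
```

Without biases the same sizes give 200 + 50 = 250 weights for the two hidden layers.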
I recently discovered a new R package for connecting to the LinkedIn API. Unfortunately the LinkedIn API seems pretty limited to begin with; for example, you can only get basic data on companies, and this is detached from data on individuals. I'd like to get data on all employees of a given company, which you can do manually on the site but is not possible through the API. import.io would be perfect if it recognised the LinkedIn pagination (see end …
I am learning about social media analysis. I am aware that we can extract data from Twitter using hashtags and the API. For example, if I use #covid19, I will get all tweets that contain this hashtag for the duration that I specify. Similarly, let's say I visit a public Facebook page https://www.facebook.com/A123 Will I be able to extract only the posts which contain the term covid in their text (from all of their posts) using my API credentials? Is it …
I'm training an SVM model for sentiment analysis, based on social media data, e.g. tweets. The model will be trained using a small selection of a particular company's tweets in order to classify new ones. However, since the training set is too small to get an accurate model, I will be combining the company's data with a much larger general tweets dataset to train the model. Being specialised to one company, the content of the respective data is slightly different …
I have a giant data set from a local election, which contains hashtags, emojis, and comments. I wanted to make a network analysis using only emojis. So far I have a network analysis graph made in R which looks like this: Sorry, you may have to zoom in to see the nodes. So, basically my goal is to see what people are talking about as a whole group. Currently there are a lot of nodes which don't really say anything concrete …
I have a Twitter dataset and I want to extract its related topics. So, I decided to group my tweets into clusters using an unsupervised machine learning algorithm like k-means. I made this choice because the training process in supervised approaches is time-consuming. So, as a first step after cleaning my tweets, I will extract features (e.g. hashtags...) from them, and enrich them with side information from knowledge bases (e.g. Wikipedia). Secondly, they will be represented in a …