Correlation between Wikipedia translated pages and number of inbound links looks weird (scatterplot)?

I'm trying to measure the correlation between the number of Wikipedia pages an entity (an article) has been translated into and the number of links that point to that page (both measures can indicate a page's popularity).

For instance, I have:

Work, links, wikipediaTranslatedPages
The name of the rose, 500, 53

I plotted the two measures in a scatterplot, but it looks weird. Is it wrong?

Topic wikipedia data-science-model correlation dataset python

Category Data Science


I can't say whether your scatterplot is correct, because I don't know your dataset. I suspect that the point with total = 1,800 and numWikipediaLanguages = 53 is an outlier, so you can try deleting it and replotting the graph.

Another test you could try is to add a feature called "subject" and split your data by it (e.g. subject -> "history", "math", "science", and so on). Following your example:

Work, links, wikipediaTranslatedPages, Subject
The name of the rose, 500, 53, Literature

This way you can see whether a particular class of items (subject) stands out from the others. That said, I don't know your data or your problem, or whether you are able to add such a feature.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.