Suggestions for improvement? Time series of variation in relative frequency of emotion-related words in academic psychology over time

This is my first time plotting and interpreting time series data, and I have used a line plot for ease of use. I am aware this is incredibly basic, but any input/recommendations would be much appreciated (e.g., is anything unclear?).

  • My main concern is whether I have adequately displayed the data and whether I can do anything useful to improve (e.g., moving average)?
  • Additionally, whether I have interpreted this time series data appropriately:

The relative frequency of affect-related tokens (counts per 10,000 tokens in psychology abstracts) increased from 3.51% in 1980 to 4.87% in 2017, an overall relative increase of 39%. The relative frequency of affect-related tokens shows an increase at a rate of approximately 0.037 units per year (over 37 years). Overall, this displays a rapid growth trend in academics' use of emotion-related terms in psychology abstracts over time.



I am aware this is incredibly basic, but any input/recommendations would be much appreciated (e.g., is anything unclear?).

In general, it is strongly recommended to communicate knowledge in the simplest way possible, as long as it's accurate. Yes, a line plot is simple, but there's nothing wrong with that; in this case I can't think of a better way to convey the observation accurately.

My main concern is whether I have adequately displayed the data and whether I can do anything useful to improve (e.g., moving average)?

It perfectly shows the increasing trend which is your main point:

  • A moving average would not really help in this case: it's useful only when there's too much noise/variation in the data and the general trend is hard to see. Here you apparently have enough data and/or the data is stable enough, so there's simply no need.
  • You took care of making the plotted value a proportion per 10,000 tokens, and this is also a good idea since an increase in absolute value could be biased: if the total number of abstracts increases, the raw count of target terms would naturally increase as well, even though this says nothing about how often the terms are actually used.
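For reference, here is a minimal sketch of the two ideas in the list above: normalizing raw counts to a rate per 10,000 tokens, and smoothing with a moving average if the series ever turns out to be noisy. The counts are made up for illustration (they are not the data from the question), and `per_10k` and `moving_average` are hypothetical helper names.

```python
def per_10k(target_count, total_tokens):
    """Relative frequency: target tokens per 10,000 tokens."""
    return 10_000 * target_count / total_tokens


def moving_average(values, window=3):
    """Simple moving average; the result has len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Made-up yearly counts, for illustration only.
target_counts = [35, 80, 130, 200]                   # affect-related tokens
total_tokens = [100_000, 200_000, 300_000, 400_000]  # all tokens that year

# Raw counts grow almost sixfold, but the normalized rate grows far less.
rates = [per_10k(c, t) for c, t in zip(target_counts, total_tokens)]
smoothed = moving_average(rates, window=3)

print(rates)
print(smoothed)
```

Note that the smoothed series is shorter than the original, which is one more reason not to smooth when the raw line already shows the trend clearly.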

Overall, this looks very good to me. Just a small note: in your description you could mention that the rate per year is an average: "approximately 0.037 units per year on average".
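As a quick check, the figures in the description can be reproduced from the two endpoint values alone. This is only a sketch that assumes the rate was computed from the first and last year; a slope fitted over all years (e.g., by least squares) could differ somewhat.

```python
# Endpoint values taken from the description in the question.
start_year, end_year = 1980, 2017
start_value, end_value = 3.51, 4.87

years = end_year - start_year
relative_increase = (end_value - start_value) / start_value
average_change_per_year = (end_value - start_value) / years

print(years)                               # 37
print(f"{relative_increase:.1%}")          # 38.7%
print(f"{average_change_per_year:.3f}")    # 0.037
```

Because this is the total change divided by the number of years, it says nothing about whether the increase was steady; that is why "on average" is the accurate wording.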
