How is Google Trends data normalized?

I have a daily series from Google Trends, pulled with the range "today 3-m", but the last day is missing from this query. For example, today is March 24th and the last day returned is March 22nd, while I expected March 23rd. If I pull the series with the range "now 7-d", the data comes hourly and March 23rd is included. I would like to aggregate the hourly data and put it on the same scale as the first series. For that, I need to know how the series is normalized.

My understanding was that each time you pull a series (one region and one word), Google simply divides each point by the largest value in the range and multiplies by 100, so the maximum of the series is 100. Under this hypothesis, if I sum the hourly indexes of the second series by day, every day shares the same denominator (different from the first series'), so the growth between adjacent days should be identical in the two series. But that is not what happens, so I did not understand the normalization. Could anyone help me, please?
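To make the hypothesis concrete, here is a minimal sketch of the normalization I assumed (the raw counts are invented):

```python
import numpy as np

# My hypothesis: each point is divided by the largest value in the
# range and multiplied by 100, so the series peaks at 100.
def normalize(series):
    series = np.asarray(series, dtype=float)
    return 100 * series / series.max()

raw = np.array([120, 300, 240, 180])   # invented raw search counts
scaled = normalize(raw)                 # [40., 100., 80., 60.]

# Growth between adjacent days survives the scaling, because the
# common denominator cancels:
print(raw[2] / raw[1])         # 0.8
print(scaled[2] / scaled[1])   # 0.8 -- same ratio
```

This is exactly why I expected the daily growth rates of the two downloads to match.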

First of all, are you pulling data for one query at a time? If you include more than one query, the data for each is normalised relative to the other queries in the request. So if you pair something very specific, say "delirium" (the Latin name of an illness), with something very popular like "UEFA Champions League", the results for the unpopular query will be negligible.
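As an illustration, with the unofficial pytrends library (the keywords and parameters here are examples, not a prescribed setup), a multi-keyword payload is scaled jointly, while separate payloads are scaled independently:

```python
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=0)

# One payload, two keywords: both share a single 0-100 scale anchored
# to the peak of the more popular term, so the niche term may flatten
# to near zero.
pytrends.build_payload(['delirium', 'UEFA Champions League'],
                       timeframe='today 3-m')
together = pytrends.interest_over_time()

# Queried alone, the niche term gets its own 0-100 scale.
pytrends.build_payload(['delirium'], timeframe='today 3-m')
alone = pytrends.interest_over_time()
```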

If you are querying one search term at a time, then the data is normalised only over the period you are querying, so different timeframes will always produce different results. If your highest interest was 31 days ago and you query for the last 30 days, a new point becomes the 100, one that might have sat in the 80-90 range in the 31-day query.
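A quick numeric sketch (invented counts) of that rescaling effect:

```python
import numpy as np

def normalize(series):
    series = np.asarray(series, dtype=float)
    return 100 * series / series.max()

# 31 days of invented raw counts; the peak falls on the oldest day.
raw = np.array([500] + [400] * 30)

last_31_days = normalize(raw)       # peak day -> 100, all others -> 80
last_30_days = normalize(raw[1:])   # peak excluded, every day -> 100

# The same calendar day scores 80 in the 31-day window and 100 in the
# 30-day window, purely because the denominator changed.
```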

Furthermore, Google Trends works by taking a random sample of all searches every day and normalising the results based on that sample. If you are investigating an unpopular term, or even a semi-popular term in a country with a small population or low internet usage, you will see large variation between pulls, because the sample changes and there is not enough data to form a representative sample every day. In other words, the total data pool is already scarce, so a random slice of it, say 10%, is scarcer still.
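A toy simulation (invented numbers, not Google's actual pipeline) shows why a 10% sample of a small search pool yields a much noisier index than the same sample rate on a large pool:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_share(pool_size, true_share, rate=0.10, trials=1000):
    # Draw the term's hit count from a 10% random sample of the pool
    # and return the estimated share for each trial.
    hits = rng.binomial(int(pool_size * rate), true_share, size=trials)
    return hits / (pool_size * rate)

small = sampled_share(pool_size=1_000, true_share=0.02)
big = sampled_share(pool_size=1_000_000, true_share=0.02)

# Relative spread of the estimate (std / mean): the small pool is far
# noisier, so its index can jump between otherwise identical pulls.
print(small.std() / small.mean())
print(big.std() / big.mean())
```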
