Could you generate search queries to poison data analysis by a search engine?

A basic problem with search engines is that you have to trust them not to build a profile from the search queries you submit. (Without Tor or, e.g., homomorphic encryption, that is.)

Suppose we put together a search engine server whose use policy permits paying customers to send a constant stream of queries.

The search engine's client transmits generated search queries at some frequency (e.g. Markov-chain output, ML-generated text, random dictionary words, terms sourced from news; up to you) in order to intentionally obscure the customer's real search queries. In other words, it pretends to be a thousand contradictory personalities, nationalities, genders, races, hobbies, etc.
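As a minimal sketch of the "random dictionary words" variant described above: the generator below samples words from a small hard-coded word list (a real client would load a full dictionary or news corpus; the word list, function names, and query rate here are all illustrative assumptions) and emits fake queries at a fixed frequency.

```python
import random
import time

# Illustrative word list; a real client would load a dictionary file
# or scrape trending news terms instead.
WORDLIST = ["privacy", "recipe", "football", "guitar", "weather",
            "python", "holiday", "battery", "novel", "garden"]

def decoy_query(min_words=1, max_words=3, rng=random):
    """Build one fake query from a few random dictionary words."""
    n = rng.randint(min_words, max_words)
    return " ".join(rng.sample(WORDLIST, n))

def decoy_stream(queries_per_minute=6, rng=random):
    """Yield decoy queries forever at the chosen frequency,
    to be submitted interleaved with the user's real queries."""
    while True:
        yield decoy_query(rng=rng)
        time.sleep(60 / queries_per_minute)
```

Note that purely random word salads are statistically distinguishable from human queries, which is why the question also suggests Markov or ML-generated text.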

How difficult would it be to generate enough queries to hide yourself in the data?

Topic adversarial-ml search-engine

Category Data Science


I am not sure how many queries you'd need to perform to drown out your actual search queries, but there already is a browser addon which does this. This addon is called TrackMeNot and is available for both Google Chrome and Firefox. More in-depth information on how it works can be found on their website and in the whitepaper (section 3), but in short it creates a dynamic list of queries based on popular search terms.
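To illustrate the "dynamic list of queries" idea (this is a sketch of the general approach, not TrackMeNot's actual code): start from a seed list of popular search terms and repeatedly mutate it by swapping words between queries, so the fake query stream drifts over time the way a real user's interests do.

```python
import random

# Hypothetical seed list of popular search terms; a real client would
# refresh this from RSS feeds or trending-topic pages.
SEED_TERMS = ["world cup schedule", "cheap flights", "pasta recipe",
              "stock market today", "best laptop deals"]

def evolve(query_list, rng=random):
    """Return a new query list where one query is mutated by
    substituting in a word borrowed from another query."""
    new_list = list(query_list)
    i = rng.randrange(len(new_list))
    words = new_list[i].split()
    donor = rng.choice(new_list).split()
    words[rng.randrange(len(words))] = rng.choice(donor)
    new_list[i] = " ".join(words)
    return new_list
```

Each call to `evolve` produces the next generation of decoy queries, so the list stays plausible (rooted in real popular terms) while diverging from any fixed, fingerprintable set.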
