Methodology for parallelising linked data?
Suppose I have a data set in which any element can be linked to any other element in the set, and I want to split the work across parallel workers, either to reduce computation time or to shrink the size of the piece of data each worker handles at once. Is there a methodology for splitting such data into chunks without invalidating the results?
For example, assume I have a grid of crime incidents across a whole country. I want to treat this grid as a heat map of crime and therefore "smear" the heat from each crime into nearby grid points.
However, computing this across the whole country at once takes too long, so I want to split the grid into manageable chunks.
But if I split the grid naively, heat from high-crime areas near a chunk's edges will not be smeared into the neighbouring chunks, so the results along those boundaries will be wrong. I don't want to lose that data validity.
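To make the boundary problem concrete, here is a minimal sketch (my own toy model, not anything from a specific library): the "smear" is stood in for by a simple box-kernel convolution, and a single high-crime cell is placed right next to a naive chunk cut. The function and grid names are illustrative assumptions.

```python
import numpy as np

def smear(grid, radius=1):
    """Smear each cell's value into neighbours within `radius`
    (a simple box-kernel convolution, standing in for the real heat map).
    Zero-padding means anything beyond the grid contributes nothing."""
    h, w = grid.shape
    padded = np.pad(grid, radius)
    out = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += padded[radius + dy : radius + dy + h,
                          radius + dx : radius + dx + w]
    return out

# A toy 4x8 "country" with one high-crime cell just left of a cut at column 4.
grid = np.zeros((4, 8))
grid[2, 3] = 10.0

full = smear(grid)

# Naive split into two 4x4 chunks: heat cannot cross the cut,
# so the right chunk's first column never receives the smear.
left, right = grid[:, :4], grid[:, 4:]
naive = np.hstack([smear(left), smear(right)])

print(np.array_equal(full, naive))  # False: columns at the cut lost heat
```

Running this, the full-grid result puts heat in column 4 (the right chunk's edge) while the naive chunked result leaves it at zero, which is exactly the validity loss described above.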
Is there a standard methodology for handling these cross-chunk links when parallelising?
Topic: methodology, optimization, parallel
Category: Data Science