Time Complexity notation in Big Data platforms
I am redesigning some classical algorithms for the Hadoop/MapReduce framework. I was wondering if there is any established approach for denoting Big-O-style expressions to measure time complexity?
For example, hypothetically, a simple average calculation over n (= 1 billion) numbers is an O(n) + C operation using a simple for loop (the division could arguably be O(log n), but I am assuming it to be a constant-time operation for the sake of simplicity). If I break this massively parallelizable algorithm up for MapReduce by dividing the data over k nodes, my time complexity would simply become O(n/k) + C + C'. Here, C' can be taken as the job-planning overhead. Note that no shuffling was involved, and the reducer's job was nearly trivial.
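To make the example concrete, here is a minimal sketch of that split in plain Python (the k slices stand in for mapper nodes, and the final combine is the trivial reducer; in a real Hadoop job these would of course run as separate tasks):

```python
# Sketch: averaging n numbers split over k "nodes" (simulated mappers).
# Each mapper does O(n/k) work; the reducer combines k partial results in O(k).

def map_partial_sum(chunk):
    # O(len(chunk)) work on one node
    return sum(chunk), len(chunk)

def reduce_average(partials):
    # Trivial reducer: O(k) additions plus one (assumed constant-time) division
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

def mapreduce_average(data, k):
    chunks = [data[i::k] for i in range(k)]          # divide data over k nodes
    partials = [map_partial_sum(c) for c in chunks]  # would run in parallel
    return reduce_average(partials)

print(mapreduce_average(list(range(1, 101)), k=4))   # 50.5
```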
I am interested in a more complete analysis of algorithms that make iterative passes over the data and involve heavy shuffling and reducer operations. If possible, I also want to incorporate I/O operations and network transfers of data into the analysis.
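For instance, one ad-hoc way I could imagine making those terms explicit (this is my own sketch, not an established notation; all parameter names are hypothetical) is a per-round cost model where map, shuffle, and reduce each contribute a term, and an iterative algorithm pays that cost once per round:

```python
# Hypothetical per-round cost model for one MapReduce iteration.
# None of these parameter names are standard; they are assumptions for illustration.

def round_cost(n, k, t_map, shuffle_bytes, bandwidth, t_reduce, c_setup):
    map_time = (n / k) * t_map                # O(n/k): parallel map work
    shuffle_time = shuffle_bytes / bandwidth  # network transfer of intermediates
    reduce_time = t_reduce                    # reducer work after the shuffle
    return c_setup + map_time + shuffle_time + reduce_time

# An algorithm with r iterations would then cost roughly r * round_cost(...),
# i.e. something like O(r * (n/k + shuffle_bytes/bandwidth)) plus constants.
```

Is there an accepted framework along these lines, or a standard way to fold the I/O and network terms into the asymptotic notation itself?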
Topic: map-reduce, algorithms, bigdata
Category: Data Science