Show HN: I Parallelized RNN Training from O(T) to O(log T) Using CUDA

DoctorOetker

Is it not much simpler to parallelize by having different "readers" (sharing the same model parameters/weights) process different parts of the corpus in parallel? Reader A reads book A while reader B reads book B, and so on.

Is there a deeper reason why the more complicated parallelization used in the OP (or in the article it references) is more desirable?
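
For reference, this is the kind of O(log T) trick I take the OP to mean — a minimal single-block CUDA sketch (not the OP's code), assuming the scalar linear recurrence h_t = a_t * h_{t-1} + b_t that these methods typically target. Each step is an affine map, affine maps compose associatively, so a parallel scan over the composed maps replaces the serial loop:

    // Each step h_t = a_t * h_{t-1} + b_t is an affine map (a_t, b_t).
    // Maps compose associatively: (a2,b2) o (a1,b1) = (a2*a1, a2*b1 + b2),
    // so a Hillis-Steele scan computes all T prefixes in O(log T) rounds.
    #include <cstdio>

    #define T 1024  // sequence length; one block, power of two for simplicity

    __global__ void affine_scan(const float* a, const float* b, float* h, float h0) {
        __shared__ float sa[T], sb[T];
        int t = threadIdx.x;
        sa[t] = a[t];  // start with the map for the single step t
        sb[t] = b[t];
        __syncthreads();
        for (int offset = 1; offset < T; offset <<= 1) {
            // Read the neighbor's prefix before anyone overwrites it.
            float pa = 1.0f, pb = 0.0f;  // identity map
            if (t >= offset) { pa = sa[t - offset]; pb = sb[t - offset]; }
            __syncthreads();
            // Compose: (current map) o (earlier prefix).
            float na = sa[t] * pa;
            float nb = sa[t] * pb + sb[t];
            sa[t] = na;
            sb[t] = nb;
            __syncthreads();
        }
        h[t] = sa[t] * h0 + sb[t];  // apply prefix map for steps [0..t] to h0
    }

    int main() {
        static float ha[T], hb[T], hh[T];
        for (int t = 0; t < T; ++t) { ha[t] = 0.9f; hb[t] = 0.1f; }
        float *da, *db, *dh;
        cudaMalloc(&da, T * sizeof(float));
        cudaMalloc(&db, T * sizeof(float));
        cudaMalloc(&dh, T * sizeof(float));
        cudaMemcpy(da, ha, T * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, T * sizeof(float), cudaMemcpyHostToDevice);
        affine_scan<<<1, T>>>(da, db, dh, 0.0f);
        cudaMemcpy(hh, dh, T * sizeof(float), cudaMemcpyDeviceToHost);
        // h should approach the fixed point b/(1-a) = 1.0 for large t
        printf("h[0]=%f  h[%d]=%f\n", hh[0], T - 1, hh[T - 1]);
        return 0;
    }

As I understand it, this is orthogonal to batching: each "reader" still takes O(T) serial steps through its own book, whereas the scan reduces the serial depth within a single sequence, and you could run one such scan per sequence in parallel on top of that.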