In the Paddle v1 implementation, I see that https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pserver/ParameterServer2.cpp#L1228
contains operations that merge the cost sent from every trainer. I'm not sure why this is needed, and do we also have to implement it in the current Fluid distributed training framework?
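To make the question concrete, here is a minimal sketch (not the actual Paddle code) of what I understand "merging the cost" to mean: each trainer reports its local scalar cost, and the pserver aggregates them into one global value, presumably for logging or convergence checks. The variable names and the averaging scheme are my own assumptions for illustration.

```cpp
// Illustrative sketch only -- not taken from ParameterServer2.cpp.
// Assumption: each trainer sends its local average cost for the current
// pass, and the pserver merges them into a single global cost.
#include <iostream>
#include <numeric>
#include <vector>

int main() {
  // Hypothetical per-trainer costs received in one pass/barrier.
  std::vector<double> trainerCosts = {0.91, 0.87, 0.95, 0.89};

  // Merge: sum the reported costs and average over the number of trainers,
  // yielding one global cost value.
  double totalCost =
      std::accumulate(trainerCosts.begin(), trainerCosts.end(), 0.0);
  double globalCost = totalCost / static_cast<double>(trainerCosts.size());

  std::cout << "merged global cost = " << globalCost << std::endl;
  return 0;
}
```

If this is roughly what the v1 pserver does, my question is whether the Fluid pserver also needs an equivalent aggregation step, or whether cost reporting can stay entirely on the trainer side.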