On Sun, Sep 17, 2017 at 07:16:15PM +0300, Tariq Toukan wrote: > > It's nice to have the option to dynamically play with the parameter. > But maybe we should also think of changing the default fraction guaranteed > to the PCP, so that unaware admins of networking servers would also benefit. I collected some performance data with will-it-scale/page_fault1 process mode on different machines with different pcp->batch sizes, starting from the default 31(calculated by zone_batchsize(), 31 is the standard value for any zone that has more than 1/2MiB memory), then incremented by 31 upwards till 527. PCP's upper limit is 6*batch. An image is plotted and attached: batch_full.png(full here means the number of process started equals to CPU number). >From the image: - For EX machines, they all see throughput increase with increased batch size and peaked at around batch_size=310, then fall; - For EP machines, Haswell-EP and Broadwell-EP also see throughput increase with increased batch size and peaked at batch_size=279, then fall, batch_size=310 also delivers pretty good result. Skylake-EP is quite different in that it doesn't see any obvious throughput increase after batch_size=93, though the trend is still increasing, but in a very small way and finally peaked at batch_size=403, then fall. Ivybridge EP behaves much like desktop ones. - For Desktop machines, they do not see any obvious changes with increased batch_size. So the default batch size(31) doesn't deliver good enough result, we probbaly should change the default value.