autoNUMA web workload regression

* autoNUMA web workload regression
@ 2015-05-06 10:35 Artem Bityutskiy
  2015-05-06 10:37 ` Bityutskiy, Artem
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Artem Bityutskiy @ 2015-05-06 10:35 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

Hi Rik,

we observe a tremendous regression between kernel version 3.16 and 3.17
(and up), and I've bisected it to this commit:

a43455a sched/numa: Ensure task_numa_migrate() checks the preferred node

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a43455a1d572daf7b730fe12eb747d1e17411365

We run a Web server (nginx) on a 2-socket Haswell server and we emulate
an e-Commerce Web-site. Clients send requests to the server and measure
the response time. Clients load the server quite heavily - CPU
utilization is more than 90% as measured with turbostat. We use Fedora
20.

If I take 3.17 and revert this patch, I observe 600% or more average
response time improvement comparing to vanilla 3.17.

If I take 4.1-rc1 and revert this patch, I observe 300% or more average
response time improvement comparing to vanilla 3.17.

I asked Fengguang Wu to run LKP workloads on multiple 4 and 8 socket
machines for v4.1-rc1 with and without this patch, and there seem to be
no difference - all the micro-benchmarks performed similarly and the
difference were withing the error range.

IOW, it looks like this patch has bad effect on Web server QoS (slower
response time). What do you think?

Thank you!

^ permalink raw reply	[flat|nested] 21+ messages in thread