rq lock contention due to commit af7f588d8f73

* rq lock contention due to commit af7f588d8f73
@ 2023-03-27  8:05 Aaron Lu
  2023-03-27  9:09 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Aaron Lu @ 2023-03-27  8:05 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Peter Zijlstra, linux-kernel

Hi Mathieu,

I was doing some optimization work[1] for kernel scheduler using a
database workload: sysbench+postgres and before I submit my work, I
rebased my patch on top of latest v6.3-rc kernels to see if everything
still works expected and then I found rq's lock became very heavily
contended as compared to v6.2 based kernels.

Using the above mentioned workload, before commit af7f588d8f73("sched:
Introduce per-memory-map concurrency ID"), the profile looked like:

     7.30%     0.71%  [kernel.vmlinux]            [k] __schedule
     0.03%     0.03%  [kernel.vmlinux]            [k] native_queued_spin_lock_slowpath

After that commit:

    49.01%     0.87%  [kernel.vmlinux]            [k] __schedule
    43.20%    43.18%  [kernel.vmlinux]            [k] native_queued_spin_lock_slowpath

The above profile was captured with sysbench's nr_threads set to 56; if
I used more thread number, the contention would be more severe on that
2sockets/112core/224cpu Intel Sapphire Rapids server.

The docker image I used to do optimization work is not available outside
but I managed to reproduce this problem using only publicaly available
stuffs, here it goes:
1 docker pull postgres
2 sudo docker run --rm --name postgres-instance -e POSTGRES_PASSWORD=mypass -e POSTGRES_USER=sbtest -d postgres -c shared_buffers=80MB -c max_connections=250
3 go inside the container
  sudo docker exec -it $the_just_started_container_id bash
4 install sysbench inside container
  sudo apt update and sudo apt install sysbench
5 prepare
  root@container:/# sysbench --db-driver=pgsql --pgsql-user=sbtest --pgsql_password=mypass --pgsql-db=sbtest --pgsql-port=5432 --tables=16 --table-size=10000 --threads=56 --time=60 --report-interval=2 /usr/share/sysbench/oltp_read_only.lua prepare
6 run
  root@container:/# sysbench --db-driver=pgsql --pgsql-user=sbtest --pgsql_password=mypass --pgsql-db=sbtest --pgsql-port=5432 --tables=16 --table-size=10000 --threads=56 --time=60 --report-interval=2 /usr/share/sysbench/oltp_read_only.lua run

Let it warm up a little bit and after 10-20s you can do profile and see
the increased rq lock contention. You may need a machine that has at
least 56 cpus to see this, I didn't try on other machines.

Feel free to let me know if you need any other info.

[1]: https://lore.kernel.org/lkml/20230327053955.GA570404@ziqianlu-desk2/

Best wishes,
Aaron

^ permalink raw reply	[flat|nested] 14+ messages in thread