* [PATCH v3 0/4] kernel/smp.c: add more CSD lock debugging
@ 2021-03-02  6:28 Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 1/4] kernel/smp: add boot parameter for controlling " Juergen Gross
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Juergen Gross @ 2021-03-02  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-doc
  Cc: paulmck, mhocko, peterz, Juergen Gross, Jonathan Corbet

This patch series was created to help catch a rather long-standing
problem with smp_call_function_any() and friends.

Very rarely a remote cpu seems not to execute a queued function, and
the cpu that queued the function request will then wait forever for
the CSD lock to be released by the remote cpu.

This problem has been observed primarily when running as a guest on
top of KVM or Xen, but there are reports of the same pattern on bare
metal, too. It seems to have existed for about 2 years now, and there
is not much data available.

What is known so far is that resending an IPI to the remote cpu
helps.

The patches add more debug data to be printed in a hang situation when
using a kernel with CONFIG_CSD_LOCK_WAIT_DEBUG configured.
Additionally, the debug code can be controlled via a new parameter in
order to make it easier to use such a kernel in a production
environment without too much negative performance impact. By default
the debugging additions are switched off; they can be activated via
the new boot parameter:

csdlock_debug=1 switches on the basic debugging and the IPI resend.
csdlock_debug=ext additionally prints extended data in a hang
  situation, but this option has a larger impact on performance.

I hope that the "ext" setting will help find the root cause of the
problem.
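
For reference, enabling the extended debugging on a test machine just
means appending the parameter to the kernel command line, e.g. via a
GRUB-based setup like the following (illustrative only; the exact file
and the command for regenerating the boot configuration depend on the
distribution):

  # /etc/default/grub
  GRUB_CMDLINE_LINUX="... csdlock_debug=ext"
  # regenerate grub.cfg (e.g. grub2-mkconfig -o /boot/grub2/grub.cfg), then reboot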

Juergen Gross (4):
  kernel/smp: add boot parameter for controlling CSD lock debugging
  kernel/smp: prepare more CSD lock debugging
  kernel/smp: add more data to CSD lock debugging
  kernel/smp: fix flush_smp_call_function_queue() cpu offline detection

 .../admin-guide/kernel-parameters.txt         |  10 +
 kernel/smp.c                                  | 280 +++++++++++++++++-
 2 files changed, 277 insertions(+), 13 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/4] kernel/smp: add boot parameter for controlling CSD lock debugging
  2021-03-02  6:28 [PATCH v3 0/4] kernel/smp.c: add more CSD lock debugging Juergen Gross
@ 2021-03-02  6:28 ` Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 2/4] kernel/smp: prepare more " Juergen Gross
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2021-03-02  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-doc
  Cc: paulmck, mhocko, peterz, Juergen Gross, Jonathan Corbet

Currently CSD lock debugging can be switched on and off via a kernel
config option only. Unfortunately there is at least one problem with
CSD lock handling that has been pending for about 2 years now and has
been seen in different environments (mostly when running virtualized
under KVM or Xen, at least once on bare metal). Multiple attempts to
catch this issue have finally led to the introduction of the CSD lock
debug code, but this code is not in use in most distros as it has some
impact on performance.

In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG
enabled even for production use, add a boot parameter for switching
the debug functionality on. This reduces the performance impact of
the debug code to a bare minimum when it is not being used.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 .../admin-guide/kernel-parameters.txt         |  6 +++
 kernel/smp.c                                  | 38 +++++++++++++++++--
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..31dbf7b2f0e8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -784,6 +784,12 @@
 	cs89x0_media=	[HW,NET]
 			Format: { rj45 | aui | bnc }
 
+	csdlock_debug=	[KNL] Enable debug add-ons of cross-cpu function call
+			handling. When switched on additional debug data is
+			printed to the console in case a hanging cpu is
+			detected and that cpu is pinged again in order to try
+			to resolve the hang situation.
+
 	dasd=		[HW,NET]
 			See header of drivers/s390/block/dasd_devmap.c.
 
diff --git a/kernel/smp.c b/kernel/smp.c
index aeb0adfa0606..d5f0b21ab55e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,6 +24,7 @@
 #include <linux/sched/clock.h>
 #include <linux/nmi.h>
 #include <linux/sched/debug.h>
+#include <linux/jump_label.h>
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -102,6 +103,20 @@ void __init call_function_init(void)
 
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
+static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
+
+static int __init csdlock_debug(char *str)
+{
+	unsigned int val = 0;
+
+	get_option(&str, &val);
+	if (val)
+		static_branch_enable(&csdlock_debug_enabled);
+
+	return 0;
+}
+early_param("csdlock_debug", csdlock_debug);
+
 static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
 static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
 static DEFINE_PER_CPU(void *, cur_csd_info);
@@ -110,7 +125,7 @@ static DEFINE_PER_CPU(void *, cur_csd_info);
 static atomic_t csd_bug_count = ATOMIC_INIT(0);
 
 /* Record current CSD work for current CPU, NULL to erase. */
-static void csd_lock_record(call_single_data_t *csd)
+static void __csd_lock_record(call_single_data_t *csd)
 {
 	if (!csd) {
 		smp_mb(); /* NULL cur_csd after unlock. */
@@ -125,7 +140,13 @@ static void csd_lock_record(call_single_data_t *csd)
 		  /* Or before unlock, as the case may be. */
 }
 
-static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
+static __always_inline void csd_lock_record(call_single_data_t *csd)
+{
+	if (static_branch_unlikely(&csdlock_debug_enabled))
+		__csd_lock_record(csd);
+}
+
+static int csd_lock_wait_getcpu(call_single_data_t *csd)
 {
 	unsigned int csd_type;
 
@@ -140,7 +161,7 @@ static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
 {
 	int cpu = -1;
 	int cpux;
@@ -204,7 +225,7 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static void __csd_lock_wait(call_single_data_t *csd)
 {
 	int bug_id = 0;
 	u64 ts0, ts1;
@@ -218,6 +239,15 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd)
 	smp_acquire__after_ctrl_dep();
 }
 
+static __always_inline void csd_lock_wait(call_single_data_t *csd)
+{
+	if (static_branch_unlikely(&csdlock_debug_enabled)) {
+		__csd_lock_wait(csd);
+		return;
+	}
+
+	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
+}
 #else
 static void csd_lock_record(call_single_data_t *csd)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/4] kernel/smp: prepare more CSD lock debugging
  2021-03-02  6:28 [PATCH v3 0/4] kernel/smp.c: add more CSD lock debugging Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 1/4] kernel/smp: add boot parameter for controlling " Juergen Gross
@ 2021-03-02  6:28 ` Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 3/4] kernel/smp: add more data to " Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 4/4] kernel/smp: fix flush_smp_call_function_queue() cpu offline detection Juergen Gross
  3 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2021-03-02  6:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: paulmck, mhocko, peterz, Juergen Gross

In order to be able to easily add more CSD lock debugging data to
struct call_function_data->csd, move the call_single_data_t element
into a sub-structure.
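
The resulting layout is sketched below (the full change is in the diff
that follows); later patches can then add per-cpu debug fields next to
the csd without touching struct call_function_data again:

	struct cfd_percpu {
		call_single_data_t	csd;
		/* per-cpu debug fields get appended here by a later patch */
	};

	struct call_function_data {
		struct cfd_percpu	__percpu *pcpu;
		cpumask_var_t		cpumask;
		cpumask_var_t		cpumask_ipi;
	};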

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 kernel/smp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index d5f0b21ab55e..6d7e6dbe33dc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -31,8 +31,12 @@
 
 #define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
+struct cfd_percpu {
+	call_single_data_t	csd;
+};
+
 struct call_function_data {
-	call_single_data_t	__percpu *csd;
+	struct cfd_percpu	__percpu *pcpu;
 	cpumask_var_t		cpumask;
 	cpumask_var_t		cpumask_ipi;
 };
@@ -55,8 +59,8 @@ int smpcfd_prepare_cpu(unsigned int cpu)
 		free_cpumask_var(cfd->cpumask);
 		return -ENOMEM;
 	}
-	cfd->csd = alloc_percpu(call_single_data_t);
-	if (!cfd->csd) {
+	cfd->pcpu = alloc_percpu(struct cfd_percpu);
+	if (!cfd->pcpu) {
 		free_cpumask_var(cfd->cpumask);
 		free_cpumask_var(cfd->cpumask_ipi);
 		return -ENOMEM;
@@ -71,7 +75,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
 
 	free_cpumask_var(cfd->cpumask);
 	free_cpumask_var(cfd->cpumask_ipi);
-	free_percpu(cfd->csd);
+	free_percpu(cfd->pcpu);
 	return 0;
 }
 
@@ -694,7 +698,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	cpumask_clear(cfd->cpumask_ipi);
 	for_each_cpu(cpu, cfd->cpumask) {
-		call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
+		call_single_data_t *csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
 
 		if (cond_func && !cond_func(cpu, info))
 			continue;
@@ -719,7 +723,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		for_each_cpu(cpu, cfd->cpumask) {
 			call_single_data_t *csd;
 
-			csd = per_cpu_ptr(cfd->csd, cpu);
+			csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
 			csd_lock_wait(csd);
 		}
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-03-02  6:28 [PATCH v3 0/4] kernel/smp.c: add more CSD lock debugging Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 1/4] kernel/smp: add boot parameter for controlling " Juergen Gross
  2021-03-02  6:28 ` [PATCH v3 2/4] kernel/smp: prepare more " Juergen Gross
@ 2021-03-02  6:28 ` Juergen Gross
  2021-03-24 10:18   ` Jürgen Groß
  2021-03-02  6:28 ` [PATCH v3 4/4] kernel/smp: fix flush_smp_call_function_queue() cpu offline detection Juergen Gross
  3 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2021-03-02  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-doc
  Cc: paulmck, mhocko, peterz, Juergen Gross, Jonathan Corbet

In order to help identify problems with IPI handling and remote
function execution, add some more data to the IPI debugging code.

There have been multiple reports of cpus looping long times (many
seconds) in smp_call_function_many() waiting for another cpu executing
a function like tlb flushing. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of that happening under VMWare, too, and even on
bare metal).

Finding the root cause hasn't been successful yet, even after more than
2 years of chasing this bug by different developers.

Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
diagnostics") tried to address this by adding some debug code and by
issuing another IPI when a hang was detected. This helped mitigate
the problem (the repeated IPI unlocks the hang), but the root cause is
still unknown.

Currently available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target cpu
executing the queued function (due to the IPI not reaching the cpu,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is incremented when doing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
cpu, target cpu, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event-related percpu location and the global data is updated with
the new event's data. This allows tracking two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).
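
A minimal sketch of this bookkeeping, with the field encoding and the
counter increment left out for brevity (the real implementation is
cfd_seq_inc() and cfd_seq_store() in the diff below; cfd_seq_record()
and encode() are illustrative names only):

	static u64 cfd_seq;	/* most recent event, globally */

	/* Publish a new event, hand the previous one back to the caller. */
	static u64 cfd_seq_record(u64 new_event)
	{
		u64 old;

		do {
			old = READ_ONCE(cfd_seq);
		} while (cmpxchg(&cfd_seq, old, new_event) != old);

		return old;
	}

	/*
	 * Typical use: the per-cpu slot names the current event by its
	 * location, while its value is the event seen just before it.
	 */
	pcpu->seq_ipi = cfd_seq_record(encode(this_cpu, cpu, CFD_SEQ_IPI));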

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
        csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
        csd: cnt(00008cd): ffff->0006 idle
        csd: cnt(0003668): 0001->0006 queue
        csd: cnt(0003669): 0001->0006 ipi
        csd: cnt(0003e0f): 0007->000a queue
        csd: cnt(0003e10): 0001->ffff ping
        csd: cnt(0003e71): 0003->0000 ping
        csd: cnt(0003e72): ffff->0006 gotipi
        csd: cnt(0003e73): ffff->0006 handle
        csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
        csd: cnt(0003e7f): 0004->0006 ping
        csd: cnt(0003e80): 0001->ffff pinged
        csd: cnt(0003eb2): 0005->0001 noipi
        csd: cnt(0003eb3): 0001->0006 queue
        csd: cnt(0003eb4): 0001->0006 noipi
        csd: cnt now: 0003f00

This example (an artificial one, produced with a previous version of
this patch without the "hdlend" event) shows that cpu#6 started to
handle an IPI (cnt 3e72-3e74), but didn't start to handle another IPI
(sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
from cpu#4 in the queue already).

The idea is to print only relevant entries. Those are all events which
are associated with the hang (so sender-side events for the source cpu
of the hanging request, and receiver-side events for the target cpu),
and the related events just before those (for adding data needed to
identify a possible race). Printing all available data would be
possible, but this would result in large amounts of data being printed
on larger configurations.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
V2:
- add automatic data deciphering and sorting of entries
- add new trace point for leaving flush_smp_call_function_queue()
- add information when finding an empty call_single_queue
V3:
- move new code to generic_exec_single() (Peter Zijlstra)
---
 .../admin-guide/kernel-parameters.txt         |   4 +
 kernel/smp.c                                  | 228 +++++++++++++++++-
 2 files changed, 228 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31dbf7b2f0e8..80c72f8e780d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -789,6 +789,10 @@
 			printed to the console in case a hanging cpu is
 			detected and that cpu is pinged again in order to try
 			to resolve the hang situation.
+			0: disable csdlock debugging (default)
+			1: enable basic csdlock debugging (minor impact)
+			ext: enable extended csdlock debugging (more impact,
+			     but more data)
 
 	dasd=		[HW,NET]
 			See header of drivers/s390/block/dasd_devmap.c.
diff --git a/kernel/smp.c b/kernel/smp.c
index 6d7e6dbe33dc..1a96691dbf7f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -31,8 +31,59 @@
 
 #define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+union cfd_seq_cnt {
+	u64		val;
+	struct {
+		u64	src:16;
+		u64	dst:16;
+#define CFD_SEQ_NOCPU	0xffff
+		u64	type:4;
+#define CFD_SEQ_QUEUE	0
+#define CFD_SEQ_IPI	1
+#define CFD_SEQ_NOIPI	2
+#define CFD_SEQ_PING	3
+#define CFD_SEQ_PINGED	4
+#define CFD_SEQ_HANDLE	5
+#define CFD_SEQ_DEQUEUE	6
+#define CFD_SEQ_IDLE	7
+#define CFD_SEQ_GOTIPI	8
+#define CFD_SEQ_HDLEND	9
+		u64	cnt:28;
+	}		u;
+};
+
+static char *seq_type[] = {
+	[CFD_SEQ_QUEUE]		= "queue",
+	[CFD_SEQ_IPI]		= "ipi",
+	[CFD_SEQ_NOIPI]		= "noipi",
+	[CFD_SEQ_PING]		= "ping",
+	[CFD_SEQ_PINGED]	= "pinged",
+	[CFD_SEQ_HANDLE]	= "handle",
+	[CFD_SEQ_DEQUEUE]	= "dequeue (src cpu 0 == empty)",
+	[CFD_SEQ_IDLE]		= "idle",
+	[CFD_SEQ_GOTIPI]	= "gotipi",
+	[CFD_SEQ_HDLEND]	= "hdlend (src cpu 0 == early)",
+};
+
+struct cfd_seq_local {
+	u64	ping;
+	u64	pinged;
+	u64	handle;
+	u64	dequeue;
+	u64	idle;
+	u64	gotipi;
+	u64	hdlend;
+};
+#endif
+
 struct cfd_percpu {
 	call_single_data_t	csd;
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	u64	seq_queue;
+	u64	seq_ipi;
+	u64	seq_noipi;
+#endif
 };
 
 struct call_function_data {
@@ -108,12 +159,18 @@ void __init call_function_init(void)
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
+static DEFINE_STATIC_KEY_FALSE(csdlock_debug_extended);
 
 static int __init csdlock_debug(char *str)
 {
 	unsigned int val = 0;
 
-	get_option(&str, &val);
+	if (str && !strcmp(str, "ext")) {
+		val = 1;
+		static_branch_enable(&csdlock_debug_extended);
+	} else
+		get_option(&str, &val);
+
 	if (val)
 		static_branch_enable(&csdlock_debug_enabled);
 
@@ -124,9 +181,34 @@ early_param("csdlock_debug", csdlock_debug);
 static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
 static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
 static DEFINE_PER_CPU(void *, cur_csd_info);
+static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
 
 #define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
 static atomic_t csd_bug_count = ATOMIC_INIT(0);
+static u64 cfd_seq;
+
+#define CFD_SEQ(s, d, t, c)	\
+	(union cfd_seq_cnt){ .u.src = s, .u.dst = d, .u.type = t, .u.cnt = c }
+
+static u64 cfd_seq_inc(unsigned int src, unsigned int dst, unsigned int type)
+{
+	union cfd_seq_cnt new, old;
+
+	new = CFD_SEQ(src, dst, type, 0);
+
+	do {
+		old.val = READ_ONCE(cfd_seq);
+		new.u.cnt = old.u.cnt + 1;
+	} while (cmpxchg(&cfd_seq, old.val, new.val) != old.val);
+
+	return old.val;
+}
+
+#define cfd_seq_store(var, src, dst, type)				\
+	do {								\
+		if (static_branch_unlikely(&csdlock_debug_extended))	\
+			var = cfd_seq_inc(src, dst, type);		\
+	} while (0)
 
 /* Record current CSD work for current CPU, NULL to erase. */
 static void __csd_lock_record(call_single_data_t *csd)
@@ -160,6 +242,88 @@ static int csd_lock_wait_getcpu(call_single_data_t *csd)
 	return -1;
 }
 
+static void cfd_seq_data_add(u64 val, unsigned int src, unsigned int dst,
+			     unsigned int type, union cfd_seq_cnt *data,
+			     unsigned int *n_data, unsigned int now)
+{
+	union cfd_seq_cnt new[2];
+	unsigned int i, j, k;
+
+	new[0].val = val;
+	new[1] = CFD_SEQ(src, dst, type, new[0].u.cnt + 1);
+
+	for (i = 0; i < 2; i++) {
+		if (new[i].u.cnt <= now)
+			new[i].u.cnt |= 0x80000000U;
+		for (j = 0; j < *n_data; j++) {
+			if (new[i].u.cnt == data[j].u.cnt) {
+				/* Direct read value trumps generated one. */
+				if (i == 0)
+					data[j].val = new[i].val;
+				break;
+			}
+			if (new[i].u.cnt < data[j].u.cnt) {
+				for (k = *n_data; k > j; k--)
+					data[k].val = data[k - 1].val;
+				data[j].val = new[i].val;
+				(*n_data)++;
+				break;
+			}
+		}
+		if (j == *n_data) {
+			data[j].val = new[i].val;
+			(*n_data)++;
+		}
+	}
+}
+
+static const char *csd_lock_get_type(unsigned int type)
+{
+	return (type >= ARRAY_SIZE(seq_type)) ? "?" : seq_type[type];
+}
+
+static void csd_lock_print_extended(call_single_data_t *csd, int cpu)
+{
+	struct cfd_seq_local *seq = &per_cpu(cfd_seq_local, cpu);
+	unsigned int srccpu = csd->node.src;
+	struct call_function_data *cfd = per_cpu_ptr(&cfd_data, srccpu);
+	struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+	unsigned int now;
+	union cfd_seq_cnt data[2 * ARRAY_SIZE(seq_type)];
+	unsigned int n_data = 0, i;
+
+	data[0].val = READ_ONCE(cfd_seq);
+	now = data[0].u.cnt;
+
+	cfd_seq_data_add(pcpu->seq_queue, srccpu, cpu,
+			 CFD_SEQ_QUEUE, data, &n_data, now);
+	cfd_seq_data_add(pcpu->seq_ipi, srccpu, cpu,
+			 CFD_SEQ_IPI, data, &n_data, now);
+	cfd_seq_data_add(pcpu->seq_noipi, srccpu, cpu,
+			 CFD_SEQ_NOIPI, data, &n_data, now);
+	cfd_seq_data_add(per_cpu(cfd_seq_local.ping, srccpu), srccpu,
+			 CFD_SEQ_NOCPU, CFD_SEQ_PING, data, &n_data, now);
+	cfd_seq_data_add(per_cpu(cfd_seq_local.pinged, srccpu), srccpu,
+			 CFD_SEQ_NOCPU, CFD_SEQ_PINGED, data, &n_data, now);
+	cfd_seq_data_add(seq->idle, CFD_SEQ_NOCPU, cpu,
+			 CFD_SEQ_IDLE, data, &n_data, now);
+	cfd_seq_data_add(seq->gotipi, CFD_SEQ_NOCPU, cpu,
+			 CFD_SEQ_GOTIPI, data, &n_data, now);
+	cfd_seq_data_add(seq->handle, CFD_SEQ_NOCPU, cpu,
+			 CFD_SEQ_HANDLE, data, &n_data, now);
+	cfd_seq_data_add(seq->dequeue, CFD_SEQ_NOCPU, cpu,
+			 CFD_SEQ_DEQUEUE, data, &n_data, now);
+	cfd_seq_data_add(seq->hdlend, CFD_SEQ_NOCPU, cpu,
+			 CFD_SEQ_HDLEND, data, &n_data, now);
+
+	for (i = 0; i < n_data; i++) {
+		pr_alert("\tcsd: cnt(%07x): %04x->%04x %s\n",
+			 data[i].u.cnt & ~0x80000000U, data[i].u.src,
+			 data[i].u.dst, csd_lock_get_type(data[i].u.type));
+	}
+	pr_alert("\tcsd: cnt now: %07x\n", now);
+}
+
 /*
  * Complain if too much time spent waiting.  Note that only
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
@@ -209,6 +373,8 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
 	}
 	if (cpu >= 0) {
+		if (static_branch_unlikely(&csdlock_debug_extended))
+			csd_lock_print_extended(csd, cpu);
 		if (!trigger_single_cpu_backtrace(cpu))
 			dump_cpu_task(cpu);
 		if (!cpu_cur_csd) {
@@ -252,7 +418,27 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd)
 
 	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
 }
+
+static void __smp_call_single_queue_debug(int cpu, struct llist_node *node)
+{
+	unsigned int this_cpu = smp_processor_id();
+	struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local);
+	struct call_function_data *cfd = this_cpu_ptr(&cfd_data);
+	struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+
+	cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
+	if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
+		cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
+		cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
+		send_call_function_single_ipi(cpu);
+		cfd_seq_store(seq->pinged, this_cpu, cpu, CFD_SEQ_PINGED);
+	} else {
+		cfd_seq_store(pcpu->seq_noipi, this_cpu, cpu, CFD_SEQ_NOIPI);
+	}
+}
 #else
+#define cfd_seq_store(var, src, dst, type)
+
 static void csd_lock_record(call_single_data_t *csd)
 {
 }
@@ -335,6 +521,13 @@ static int generic_exec_single(int cpu, call_single_data_t *csd)
 		return -ENXIO;
 	}
 
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	if (static_branch_unlikely(&csdlock_debug_extended)) {
+		__smp_call_single_queue_debug(cpu, &csd->node.llist);
+		return 0;
+	}
+#endif
+
 	__smp_call_single_queue(cpu, &csd->node.llist);
 
 	return 0;
@@ -348,6 +541,8 @@ static int generic_exec_single(int cpu, call_single_data_t *csd)
  */
 void generic_smp_call_function_single_interrupt(void)
 {
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->gotipi, CFD_SEQ_NOCPU,
+		      smp_processor_id(), CFD_SEQ_GOTIPI);
 	flush_smp_call_function_queue(true);
 }
 
@@ -375,7 +570,13 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 	lockdep_assert_irqs_disabled();
 
 	head = this_cpu_ptr(&call_single_queue);
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->handle, CFD_SEQ_NOCPU,
+		      smp_processor_id(), CFD_SEQ_HANDLE);
 	entry = llist_del_all(head);
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->dequeue,
+		      /* Special meaning of source cpu: 0 == queue empty */
+		      entry ? CFD_SEQ_NOCPU : 0,
+		      smp_processor_id(), CFD_SEQ_DEQUEUE);
 	entry = llist_reverse_order(entry);
 
 	/* There shouldn't be any pending callbacks on an offline CPU. */
@@ -434,8 +635,12 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 		}
 	}
 
-	if (!entry)
+	if (!entry) {
+		cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->hdlend,
+			      0, smp_processor_id(),
+			      CFD_SEQ_HDLEND);
 		return;
+	}
 
 	/*
 	 * Second; run all !SYNC callbacks.
@@ -473,6 +678,9 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 	 */
 	if (entry)
 		sched_ttwu_pending(entry);
+
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->hdlend, CFD_SEQ_NOCPU,
+		      smp_processor_id(), CFD_SEQ_HDLEND);
 }
 
 void flush_smp_call_function_from_idle(void)
@@ -482,6 +690,8 @@ void flush_smp_call_function_from_idle(void)
 	if (llist_empty(this_cpu_ptr(&call_single_queue)))
 		return;
 
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU,
+		      smp_processor_id(), CFD_SEQ_IDLE);
 	local_irq_save(flags);
 	flush_smp_call_function_queue(true);
 	if (local_softirq_pending())
@@ -698,7 +908,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	cpumask_clear(cfd->cpumask_ipi);
 	for_each_cpu(cpu, cfd->cpumask) {
-		call_single_data_t *csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
+		struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
+		call_single_data_t *csd = &pcpu->csd;
 
 		if (cond_func && !cond_func(cpu, info))
 			continue;
@@ -712,12 +923,21 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		csd->node.src = smp_processor_id();
 		csd->node.dst = cpu;
 #endif
-		if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
+		cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
+		if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
 			__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
+			cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
+		} else {
+			cfd_seq_store(pcpu->seq_noipi, this_cpu, cpu, CFD_SEQ_NOIPI);
+		}
 	}
 
 	/* Send a message to all CPUs in the map */
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->ping, this_cpu,
+		      CFD_SEQ_NOCPU, CFD_SEQ_PING);
 	arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
+	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu,
+		      CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
 
 	if (wait) {
 		for_each_cpu(cpu, cfd->cpumask) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/4] kernel/smp: fix flush_smp_call_function_queue() cpu offline detection
  2021-03-02  6:28 [PATCH v3 0/4] kernel/smp.c: add more CSD lock debugging Juergen Gross
                   ` (2 preceding siblings ...)
  2021-03-02  6:28 ` [PATCH v3 3/4] kernel/smp: add more data to " Juergen Gross
@ 2021-03-02  6:28 ` Juergen Gross
  3 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2021-03-02  6:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: paulmck, mhocko, peterz, Juergen Gross

The warning for flushing a logically offline cpu's call_single_queue
is gated by a wrong if statement: it should trigger when there were
requests pending before dequeueing them, not afterwards. After
llist_del_all() the queue head is normally empty again, so testing
llist_empty(head) at that point misses exactly the requests that were
just dequeued; test the dequeued list (entry) instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- new patch
---
 kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 1a96691dbf7f..b3077c327b0a 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -581,7 +581,7 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 
 	/* There shouldn't be any pending callbacks on an offline CPU. */
 	if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
-		     !warned && !llist_empty(head))) {
+		     !warned && entry)) {
 		warned = true;
 		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-03-02  6:28 ` [PATCH v3 3/4] kernel/smp: add more data to " Juergen Gross
@ 2021-03-24 10:18   ` Jürgen Groß
  2021-03-30 17:33     ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Jürgen Groß @ 2021-03-24 10:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: paulmck, mhocko, peterz


On 02.03.21 07:28, Juergen Gross wrote:
> In order to help identifying problems with IPI handling and remote
> function execution add some more data to IPI debugging code.
> 
> There have been multiple reports of cpus looping long times (many
> seconds) in smp_call_function_many() waiting for another cpu executing
> a function like tlb flushing. Most of these reports have been for
> cases where the kernel was running as a guest on top of KVM or Xen
> (there are rumours of that happening under VMWare, too, and even on
> bare metal).
> 
> Finding the root cause hasn't been successful yet, even after more than
> 2 years of chasing this bug by different developers.
> 
> Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
> diagnostics") tried to address this by adding some debug code and by
> issuing another IPI when a hang was detected. This helped mitigating
> the problem (the repeated IPI unlocks the hang), but the root cause is
> still unknown.
> 
> Current available data suggests that either an IPI wasn't sent when it
> should have been, or that the IPI didn't result in the target cpu
> executing the queued function (due to the IPI not reaching the cpu,
> the IPI handler not being called, or the handler not seeing the queued
> request).
> 
> Try to add more diagnostic data by introducing a global atomic counter
> which is being incremented when doing critical operations (before and
> after queueing a new request, when sending an IPI, and when dequeueing
> a request). The counter value is stored in percpu variables which can
> be printed out when a hang is detected.
> 
> The data of the last event (consisting of sequence counter, source
> cpu, target cpu, and event type) is stored in a global variable. When
> a new event is to be traced, the data of the last event is stored in
> the event related percpu location and the global data is updated with
> the new event's data. This allows to track two events in one data
> location: one by the value of the event data (the event before the
> current one), and one by the location itself (the current event).
> 
> A typical printout with a detected hang will look like this:
> 
> csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
> 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
>          csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
>          csd: cnt(00008cd): ffff->0006 idle
>          csd: cnt(0003668): 0001->0006 queue
>          csd: cnt(0003669): 0001->0006 ipi
>          csd: cnt(0003e0f): 0007->000a queue
>          csd: cnt(0003e10): 0001->ffff ping
>          csd: cnt(0003e71): 0003->0000 ping
>          csd: cnt(0003e72): ffff->0006 gotipi
>          csd: cnt(0003e73): ffff->0006 handle
>          csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
>          csd: cnt(0003e7f): 0004->0006 ping
>          csd: cnt(0003e80): 0001->ffff pinged
>          csd: cnt(0003eb2): 0005->0001 noipi
>          csd: cnt(0003eb3): 0001->0006 queue
>          csd: cnt(0003eb4): 0001->0006 noipi
>          csd: cnt now: 0003f00
> 
> This example (being an artificial one, produced with a previous version
> of this patch without the "hdlend" event), shows that cpu#6 started to
> handle an IPI (cnt 3e72-3e74), bit didn't start to handle another IPI
> (sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
> queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
> from cpu#4 in the queue already).
> 
> The idea is to print only relevant entries. Those are all events which
> are associated with the hang (so sender side events for the source cpu
> of the hanging request, and receiver side events for the target cpu),
> and the related events just before those (for adding data needed to
> identify a possible race). Printing all available data would be
> possible, but this would add large amounts of data printed on larger
> configurations.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>

Just an update regarding current status with debugging the underlying
issue:

On a customer's machine with a backport of this patch applied we've
seen another case of the hang. In the logs we've found:

smp: csd: Detected non-responsive CSD lock (#1) on CPU#18, waiting 
5000000046 ns for CPU#06 do_flush_tlb_all+0x0/0x30(          (null)).
smp: 	csd: CSD lock (#1) unresponsive.
smp: 	csd: cnt(0000000): 0000->0000 queue
smp: 	csd: cnt(0000001): ffff->0006 idle
smp: 	csd: cnt(0025dba): 0012->0006 queue
smp: 	csd: cnt(0025dbb): 0012->0006 noipi
smp: 	csd: cnt(01d1333): 001a->0006 pinged
smp: 	csd: cnt(01d1334): ffff->0006 gotipi
smp: 	csd: cnt(01d1335): ffff->0006 handle
smp: 	csd: cnt(01d1336): ffff->0006 dequeue (src cpu 0 == empty)
smp: 	csd: cnt(01d1337): ffff->0006 hdlend (src cpu 0 == early)
smp: 	csd: cnt(01d16cb): 0012->0005 ipi
smp: 	csd: cnt(01d16cc): 0012->0006 queue
smp: 	csd: cnt(01d16cd): 0012->0006 ipi
smp: 	csd: cnt(01d16f3): 0012->001a ipi
smp: 	csd: cnt(01d16f4): 0012->ffff ping
smp: 	csd: cnt(01d1750): ffff->0018 hdlend (src cpu 0 == early)
smp: 	csd: cnt(01d1751): 0012->ffff pinged
smp: 	csd: cnt now: 01d1769

So we see that cpu#18 (0012 hex) is waiting for cpu#06 (first line of 
the data).

The next 4 lines of the csd actions are not really interesting, as they 
are rather old.

Then we see that cpu 0006 did handle a request rather recently (cnt 
01d1333 - 01d1337): cpu 001a pinged it via an IPI and it got the IPI, 
entered the handler, dequeued a request, and handled it.

Nearly all of the rest shows the critical request: cpu 0012 did a loop 
over probably all other cpus and queued the requests and marked them to 
be IPI-ed (including cpu 0006, cnt 01d16cd). Then the cpus marked to 
receive an IPI were pinged (cnt 01d16f4 and cnt 01d1751).

The entry cnt 01d1750 is not of interest here.

This data confirms that on the sending side everything seems to be okay 
at the level above the actual IPI sending. On the receiver side no IPI 
seems to have been seen, but there is no visible reason for a race either.

It seems as if we need more debugging in the deeper layers: is the IPI
really sent out, and is something being received on the destination cpu?
I'll have another try with even more debugging code, probably in private
on the customer machine first.


Juergen


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-03-24 10:18   ` Jürgen Groß
@ 2021-03-30 17:33     ` Paul E. McKenney
  2021-04-02 15:46       ` Juergen Gross
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2021-03-30 17:33 UTC (permalink / raw)
  To: Jürgen Groß; +Cc: linux-kernel, mhocko, peterz

On Wed, Mar 24, 2021 at 11:18:03AM +0100, Jürgen Groß wrote:
> On 02.03.21 07:28, Juergen Gross wrote:
> > In order to help identifying problems with IPI handling and remote
> > function execution add some more data to IPI debugging code.
> > 
> > There have been multiple reports of cpus looping long times (many
> > seconds) in smp_call_function_many() waiting for another cpu executing
> > a function like tlb flushing. Most of these reports have been for
> > cases where the kernel was running as a guest on top of KVM or Xen
> > (there are rumours of that happening under VMWare, too, and even on
> > bare metal).
> > 
> > Finding the root cause hasn't been successful yet, even after more than
> > 2 years of chasing this bug by different developers.
> > 
> > Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
> > diagnostics") tried to address this by adding some debug code and by
> > issuing another IPI when a hang was detected. This helped mitigating
> > the problem (the repeated IPI unlocks the hang), but the root cause is
> > still unknown.
> > 
> > Current available data suggests that either an IPI wasn't sent when it
> > should have been, or that the IPI didn't result in the target cpu
> > executing the queued function (due to the IPI not reaching the cpu,
> > the IPI handler not being called, or the handler not seeing the queued
> > request).
> > 
> > Try to add more diagnostic data by introducing a global atomic counter
> > which is being incremented when doing critical operations (before and
> > after queueing a new request, when sending an IPI, and when dequeueing
> > a request). The counter value is stored in percpu variables which can
> > be printed out when a hang is detected.
> > 
> > The data of the last event (consisting of sequence counter, source
> > cpu, target cpu, and event type) is stored in a global variable. When
> > a new event is to be traced, the data of the last event is stored in
> > the event related percpu location and the global data is updated with
> > the new event's data. This allows to track two events in one data
> > location: one by the value of the event data (the event before the
> > current one), and one by the location itself (the current event).
> > 
> > A typical printout with a detected hang will look like this:
> > 
> > csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
> > 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
> >          csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
> >          csd: cnt(00008cd): ffff->0006 idle
> >          csd: cnt(0003668): 0001->0006 queue
> >          csd: cnt(0003669): 0001->0006 ipi
> >          csd: cnt(0003e0f): 0007->000a queue
> >          csd: cnt(0003e10): 0001->ffff ping
> >          csd: cnt(0003e71): 0003->0000 ping
> >          csd: cnt(0003e72): ffff->0006 gotipi
> >          csd: cnt(0003e73): ffff->0006 handle
> >          csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
> >          csd: cnt(0003e7f): 0004->0006 ping
> >          csd: cnt(0003e80): 0001->ffff pinged
> >          csd: cnt(0003eb2): 0005->0001 noipi
> >          csd: cnt(0003eb3): 0001->0006 queue
> >          csd: cnt(0003eb4): 0001->0006 noipi
> >          csd: cnt now: 0003f00
> > 
> > This example (being an artificial one, produced with a previous version
> > of this patch without the "hdlend" event), shows that cpu#6 started to
> > handle an IPI (cnt 3e72-3e74), bit didn't start to handle another IPI
> > (sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
> > queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
> > from cpu#4 in the queue already).
> > 
> > The idea is to print only relevant entries. Those are all events which
> > are associated with the hang (so sender side events for the source cpu
> > of the hanging request, and receiver side events for the target cpu),
> > and the related events just before those (for adding data needed to
> > identify a possible race). Printing all available data would be
> > possible, but this would add large amounts of data printed on larger
> > configurations.
> > 
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Just an update regarding current status with debugging the underlying
> issue:
> 
> On a customer's machine with a backport of this patch applied we've
> seen another case of the hang. In the logs we've found:
> 
> smp: csd: Detected non-responsive CSD lock (#1) on CPU#18, waiting
> 5000000046 ns for CPU#06 do_flush_tlb_all+0x0/0x30(          (null)).
> smp: 	csd: CSD lock (#1) unresponsive.
> smp: 	csd: cnt(0000000): 0000->0000 queue
> smp: 	csd: cnt(0000001): ffff->0006 idle
> smp: 	csd: cnt(0025dba): 0012->0006 queue
> smp: 	csd: cnt(0025dbb): 0012->0006 noipi
> smp: 	csd: cnt(01d1333): 001a->0006 pinged
> smp: 	csd: cnt(01d1334): ffff->0006 gotipi
> smp: 	csd: cnt(01d1335): ffff->0006 handle
> smp: 	csd: cnt(01d1336): ffff->0006 dequeue (src cpu 0 == empty)
> smp: 	csd: cnt(01d1337): ffff->0006 hdlend (src cpu 0 == early)
> smp: 	csd: cnt(01d16cb): 0012->0005 ipi
> smp: 	csd: cnt(01d16cc): 0012->0006 queue
> smp: 	csd: cnt(01d16cd): 0012->0006 ipi
> smp: 	csd: cnt(01d16f3): 0012->001a ipi
> smp: 	csd: cnt(01d16f4): 0012->ffff ping
> smp: 	csd: cnt(01d1750): ffff->0018 hdlend (src cpu 0 == early)
> smp: 	csd: cnt(01d1751): 0012->ffff pinged
> smp: 	csd: cnt now: 01d1769
> 
> So we see that cpu#18 (0012 hex) is waiting for cpu#06 (first line of the
> data).
> 
> The next 4 lines of the csd actions are not really interesting, as they are
> rather old.
> 
> Then we see that cpu 0006 did handle a request rather recently (cnt 01d1333
> - 01d1337): cpu 001a pinged it via an IPI and it got the IPI, entered the
> handler, dequeued a request, and handled it.
> 
> Nearly all of the rest shows the critical request: cpu 0012 did a loop over
> probably all other cpus and queued the requests and marked them to be IPI-ed
> (including cpu 0006, cnt 01d16cd). Then the cpus marked to receive an IPI
> were pinged (cnt 01d16f4 and cnt 01d1751).
> 
> The entry cnt 01d1750 is not of interest here.
> 
> This data confirms that on sending side everything seems to be okay at the
> level above the actual IPI sending. On receiver side there seems no IPI to
> be seen, but there is no visible reason for a race either.
> 
> It seems as if we need more debugging in the deeper layers: is the IPI
> really sent out, and is something being received on the destination cpu?
> I'll have another try with even more debugging code, probably in private
> on the customer machine first.

Apologies for the late reply, was out last week.

Excellent news, and thank you!

For my part, I have put together a rough prototype script that allows
me to run scftorture on larger groups of systems and started running it,
though I am hoping that 1,000 is far more than will be required.
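
(For anyone wanting to try to reproduce this locally: a single-system
scftorture run can be started with the in-tree torture harness roughly
as sketched below; the flags and paths are illustrative and should be
checked against the kernel tree being used.)

  tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --duration 120
  # or, on a running kernel with scftorture built as a module, simply:
  modprobe scftorture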

Your diagnosis of a lost IPI matches what we have been able to glean
from the occasional occurrences in the wild on our systems, for whatever
that might be worth.  The hope is to get something that reproduces more
quickly, which would allow deeper debugging at this end as well.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-03-30 17:33     ` Paul E. McKenney
@ 2021-04-02 15:46       ` Juergen Gross
  2021-04-02 16:11         ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2021-04-02 15:46 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mhocko, peterz


On 30.03.21 19:33, Paul E. McKenney wrote:
> On Wed, Mar 24, 2021 at 11:18:03AM +0100, Jürgen Groß wrote:
>> On 02.03.21 07:28, Juergen Gross wrote:
>>> In order to help identifying problems with IPI handling and remote
>>> function execution add some more data to IPI debugging code.
>>>
>>> There have been multiple reports of cpus looping long times (many
>>> seconds) in smp_call_function_many() waiting for another cpu executing
>>> a function like tlb flushing. Most of these reports have been for
>>> cases where the kernel was running as a guest on top of KVM or Xen
>>> (there are rumours of that happening under VMWare, too, and even on
>>> bare metal).
>>>
>>> Finding the root cause hasn't been successful yet, even after more than
>>> 2 years of chasing this bug by different developers.
>>>
>>> Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
>>> diagnostics") tried to address this by adding some debug code and by
>>> issuing another IPI when a hang was detected. This helped mitigating
>>> the problem (the repeated IPI unlocks the hang), but the root cause is
>>> still unknown.
>>>
>>> Current available data suggests that either an IPI wasn't sent when it
>>> should have been, or that the IPI didn't result in the target cpu
>>> executing the queued function (due to the IPI not reaching the cpu,
>>> the IPI handler not being called, or the handler not seeing the queued
>>> request).
>>>
>>> Try to add more diagnostic data by introducing a global atomic counter
>>> which is being incremented when doing critical operations (before and
>>> after queueing a new request, when sending an IPI, and when dequeueing
>>> a request). The counter value is stored in percpu variables which can
>>> be printed out when a hang is detected.
>>>
>>> The data of the last event (consisting of sequence counter, source
>>> cpu, target cpu, and event type) is stored in a global variable. When
>>> a new event is to be traced, the data of the last event is stored in
>>> the event related percpu location and the global data is updated with
>>> the new event's data. This allows to track two events in one data
>>> location: one by the value of the event data (the event before the
>>> current one), and one by the location itself (the current event).
>>>
>>> A typical printout with a detected hang will look like this:
>>>
>>> csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
>>> 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
>>>           csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
>>>           csd: cnt(00008cd): ffff->0006 idle
>>>           csd: cnt(0003668): 0001->0006 queue
>>>           csd: cnt(0003669): 0001->0006 ipi
>>>           csd: cnt(0003e0f): 0007->000a queue
>>>           csd: cnt(0003e10): 0001->ffff ping
>>>           csd: cnt(0003e71): 0003->0000 ping
>>>           csd: cnt(0003e72): ffff->0006 gotipi
>>>           csd: cnt(0003e73): ffff->0006 handle
>>>           csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
>>>           csd: cnt(0003e7f): 0004->0006 ping
>>>           csd: cnt(0003e80): 0001->ffff pinged
>>>           csd: cnt(0003eb2): 0005->0001 noipi
>>>           csd: cnt(0003eb3): 0001->0006 queue
>>>           csd: cnt(0003eb4): 0001->0006 noipi
>>>           csd: cnt now: 0003f00
>>>
>>> This example (being an artificial one, produced with a previous version
>>> of this patch without the "hdlend" event), shows that cpu#6 started to
>>> handle an IPI (cnt 3e72-3e74), bit didn't start to handle another IPI
>>> (sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
>>> queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
>>> from cpu#4 in the queue already).
>>>
>>> The idea is to print only relevant entries. Those are all events which
>>> are associated with the hang (so sender side events for the source cpu
>>> of the hanging request, and receiver side events for the target cpu),
>>> and the related events just before those (for adding data needed to
>>> identify a possible race). Printing all available data would be
>>> possible, but this would add large amounts of data printed on larger
>>> configurations.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> Tested-by: Paul E. McKenney <paulmck@kernel.org>
>>
>> Just an update regarding current status with debugging the underlying
>> issue:
>>
>> On a customer's machine with a backport of this patch applied we've
>> seen another case of the hang. In the logs we've found:
>>
>> smp: csd: Detected non-responsive CSD lock (#1) on CPU#18, waiting
>> 5000000046 ns for CPU#06 do_flush_tlb_all+0x0/0x30(          (null)).
>> smp: 	csd: CSD lock (#1) unresponsive.
>> smp: 	csd: cnt(0000000): 0000->0000 queue
>> smp: 	csd: cnt(0000001): ffff->0006 idle
>> smp: 	csd: cnt(0025dba): 0012->0006 queue
>> smp: 	csd: cnt(0025dbb): 0012->0006 noipi
>> smp: 	csd: cnt(01d1333): 001a->0006 pinged
>> smp: 	csd: cnt(01d1334): ffff->0006 gotipi
>> smp: 	csd: cnt(01d1335): ffff->0006 handle
>> smp: 	csd: cnt(01d1336): ffff->0006 dequeue (src cpu 0 == empty)
>> smp: 	csd: cnt(01d1337): ffff->0006 hdlend (src cpu 0 == early)
>> smp: 	csd: cnt(01d16cb): 0012->0005 ipi
>> smp: 	csd: cnt(01d16cc): 0012->0006 queue
>> smp: 	csd: cnt(01d16cd): 0012->0006 ipi
>> smp: 	csd: cnt(01d16f3): 0012->001a ipi
>> smp: 	csd: cnt(01d16f4): 0012->ffff ping
>> smp: 	csd: cnt(01d1750): ffff->0018 hdlend (src cpu 0 == early)
>> smp: 	csd: cnt(01d1751): 0012->ffff pinged
>> smp: 	csd: cnt now: 01d1769
>>
>> So we see that cpu#18 (0012 hex) is waiting for cpu#06 (first line of the
>> data).
>>
>> The next 4 lines of the csd actions are not really interesting, as they are
>> rather old.
>>
>> Then we see that cpu 0006 did handle a request rather recently (cnt 01d1333
>> - 01d1337): cpu 001a pinged it via an IPI and it got the IPI, entered the
>> handler, dequeued a request, and handled it.
>>
>> Nearly all of the rest shows the critical request: cpu 0012 did a loop over
>> probably all other cpus and queued the requests and marked them to be IPI-ed
>> (including cpu 0006, cnt 01d16cd). Then the cpus marked to receive an IPI
>> were pinged (cnt 01d16f4 and cnt 01d1751).
>>
>> The entry cnt 01d1750 is not of interest here.
>>
>> This data confirms that on sending side everything seems to be okay at the
>> level above the actual IPI sending. On receiver side there seems no IPI to
>> be seen, but there is no visible reason for a race either.
>>
>> It seems as if we need more debugging in the deeper layers: is the IPI
>> really sent out, and is something being received on the destination cpu?
>> I'll have another try with even more debugging code, probably in private
>> on the customer machine first.
> 
> Apologies for the late reply, was out last week.
> 
> Excellent news, and thank you!
> 
> For my part, I have put together a rough prototype script that allows
> me to run scftorture on larger groups of systems and started running it,
> though I am hoping that 1,000 is far more than will be required.
> 
> Your diagnosis of a lost IPI matches what we have been able to glean
> from the occasional occurrences in the wild on our systems, for whatever
> that might be worth.  The hope is to get something that reproduces more
> quickly, which would allow deeper debugging at this end as well.

Sometimes one is lucky.

I've found a reproducer while hunting another bug. The test on that
machine will trigger the csd_lock timeout about once a day.

I've used my new debug kernel and found that the IPI is really sent
out (more precise: the hypervisor has been requested to do so, and
it didn't report an error). On the target cpu there was no interrupt
received after that, so the IPI has not been swallowed on the target
cpu by the Linux kernel.

Will now try to instrument the hypervisor to get more data.


Juergen


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-04-02 15:46       ` Juergen Gross
@ 2021-04-02 16:11         ` Paul E. McKenney
  2021-04-05  7:37           ` Juergen Gross
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2021-04-02 16:11 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, mhocko, peterz

On Fri, Apr 02, 2021 at 05:46:52PM +0200, Juergen Gross wrote:
> On 30.03.21 19:33, Paul E. McKenney wrote:
> > On Wed, Mar 24, 2021 at 11:18:03AM +0100, Jürgen Groß wrote:
> > > On 02.03.21 07:28, Juergen Gross wrote:
> > > > In order to help identifying problems with IPI handling and remote
> > > > function execution add some more data to IPI debugging code.
> > > > 
> > > > There have been multiple reports of cpus looping long times (many
> > > > seconds) in smp_call_function_many() waiting for another cpu executing
> > > > a function like tlb flushing. Most of these reports have been for
> > > > cases where the kernel was running as a guest on top of KVM or Xen
> > > > (there are rumours of that happening under VMWare, too, and even on
> > > > bare metal).
> > > > 
> > > > Finding the root cause hasn't been successful yet, even after more than
> > > > 2 years of chasing this bug by different developers.
> > > > 
> > > > Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
> > > > diagnostics") tried to address this by adding some debug code and by
> > > > issuing another IPI when a hang was detected. This helped mitigating
> > > > the problem (the repeated IPI unlocks the hang), but the root cause is
> > > > still unknown.
> > > > 
> > > > Current available data suggests that either an IPI wasn't sent when it
> > > > should have been, or that the IPI didn't result in the target cpu
> > > > executing the queued function (due to the IPI not reaching the cpu,
> > > > the IPI handler not being called, or the handler not seeing the queued
> > > > request).
> > > > 
> > > > Try to add more diagnostic data by introducing a global atomic counter
> > > > which is being incremented when doing critical operations (before and
> > > > after queueing a new request, when sending an IPI, and when dequeueing
> > > > a request). The counter value is stored in percpu variables which can
> > > > be printed out when a hang is detected.
> > > > 
> > > > The data of the last event (consisting of sequence counter, source
> > > > cpu, target cpu, and event type) is stored in a global variable. When
> > > > a new event is to be traced, the data of the last event is stored in
> > > > the event related percpu location and the global data is updated with
> > > > the new event's data. This allows to track two events in one data
> > > > location: one by the value of the event data (the event before the
> > > > current one), and one by the location itself (the current event).
> > > > 
> > > > A typical printout with a detected hang will look like this:
> > > > 
> > > > csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
> > > > 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
> > > >           csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
> > > >           csd: cnt(00008cd): ffff->0006 idle
> > > >           csd: cnt(0003668): 0001->0006 queue
> > > >           csd: cnt(0003669): 0001->0006 ipi
> > > >           csd: cnt(0003e0f): 0007->000a queue
> > > >           csd: cnt(0003e10): 0001->ffff ping
> > > >           csd: cnt(0003e71): 0003->0000 ping
> > > >           csd: cnt(0003e72): ffff->0006 gotipi
> > > >           csd: cnt(0003e73): ffff->0006 handle
> > > >           csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
> > > >           csd: cnt(0003e7f): 0004->0006 ping
> > > >           csd: cnt(0003e80): 0001->ffff pinged
> > > >           csd: cnt(0003eb2): 0005->0001 noipi
> > > >           csd: cnt(0003eb3): 0001->0006 queue
> > > >           csd: cnt(0003eb4): 0001->0006 noipi
> > > >           csd: cnt now: 0003f00
> > > > 
> > > > This example (being an artificial one, produced with a previous version
> > > > of this patch without the "hdlend" event), shows that cpu#6 started to
> > > > handle an IPI (cnt 3e72-3e74), bit didn't start to handle another IPI
> > > > (sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
> > > > queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
> > > > from cpu#4 in the queue already).
> > > > 
> > > > The idea is to print only relevant entries. Those are all events which
> > > > are associated with the hang (so sender side events for the source cpu
> > > > of the hanging request, and receiver side events for the target cpu),
> > > > and the related events just before those (for adding data needed to
> > > > identify a possible race). Printing all available data would be
> > > > possible, but this would add large amounts of data printed on larger
> > > > configurations.
> > > > 
> > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > > 
> > > Just an update regarding current status with debugging the underlying
> > > issue:
> > > 
> > > On a customer's machine with a backport of this patch applied we've
> > > seen another case of the hang. In the logs we've found:
> > > 
> > > smp: csd: Detected non-responsive CSD lock (#1) on CPU#18, waiting
> > > 5000000046 ns for CPU#06 do_flush_tlb_all+0x0/0x30(          (null)).
> > > smp: 	csd: CSD lock (#1) unresponsive.
> > > smp: 	csd: cnt(0000000): 0000->0000 queue
> > > smp: 	csd: cnt(0000001): ffff->0006 idle
> > > smp: 	csd: cnt(0025dba): 0012->0006 queue
> > > smp: 	csd: cnt(0025dbb): 0012->0006 noipi
> > > smp: 	csd: cnt(01d1333): 001a->0006 pinged
> > > smp: 	csd: cnt(01d1334): ffff->0006 gotipi
> > > smp: 	csd: cnt(01d1335): ffff->0006 handle
> > > smp: 	csd: cnt(01d1336): ffff->0006 dequeue (src cpu 0 == empty)
> > > smp: 	csd: cnt(01d1337): ffff->0006 hdlend (src cpu 0 == early)
> > > smp: 	csd: cnt(01d16cb): 0012->0005 ipi
> > > smp: 	csd: cnt(01d16cc): 0012->0006 queue
> > > smp: 	csd: cnt(01d16cd): 0012->0006 ipi
> > > smp: 	csd: cnt(01d16f3): 0012->001a ipi
> > > smp: 	csd: cnt(01d16f4): 0012->ffff ping
> > > smp: 	csd: cnt(01d1750): ffff->0018 hdlend (src cpu 0 == early)
> > > smp: 	csd: cnt(01d1751): 0012->ffff pinged
> > > smp: 	csd: cnt now: 01d1769
> > > 
> > > So we see that cpu#18 (0012 hex) is waiting for cpu#06 (first line of the
> > > data).
> > > 
> > > The next 4 lines of the csd actions are not really interesting, as they are
> > > rather old.
> > > 
> > > Then we see that cpu 0006 did handle a request rather recently (cnt 01d1333
> > > - 01d1337): cpu 001a pinged it via an IPI and it got the IPI, entered the
> > > handler, dequeued a request, and handled it.
> > > 
> > > Nearly all of the rest shows the critical request: cpu 0012 did a loop over
> > > probably all other cpus and queued the requests and marked them to be IPI-ed
> > > (including cpu 0006, cnt 01d16cd). Then the cpus marked to receive an IPI
> > > were pinged (cnt 01d16f4 and cnt 01d1751).
> > > 
> > > The entry cnt 01d1750 is not of interest here.
> > > 
> > > This data confirms that on the sending side everything seems to be okay at
> > > the level above the actual IPI sending. On the receiver side no IPI appears
> > > to have been received, but there is no visible reason for a race either.
> > > 
> > > It seems as if we need more debugging in the deeper layers: is the IPI
> > > really sent out, and is something being received on the destination cpu?
> > > I'll have another try with even more debugging code, probably in private
> > > on the customer machine first.
> > 
> > Apologies for the late reply, was out last week.
> > 
> > Excellent news, and thank you!
> > 
> > For my part, I have put together a rough prototype script that allows
> > me to run scftorture on larger groups of systems and started running it,
> > though I am hoping that 1,000 is far more than will be required.
> > 
> > Your diagnosis of a lost IPI matches what we have been able to glean
> > from the occasional occurrences in the wild on our systems, for whatever
> > that might be worth.  The hope is to get something that reproduces more
> > quickly, which would allow deeper debugging at this end as well.
> 
> Sometimes one is lucky.
> 
> I've found a reproducer while hunting another bug. The test on that
> machine will trigger the csd_lock timeout about once a day.

Nice!!!  You are way ahead of me!

> I've used my new debug kernel and found that the IPI is really sent
> out (more precisely: the hypervisor has been requested to do so, and
> it didn't report an error). On the target cpu there was no interrupt
> received after that, so the IPI has not been swallowed on the target
> cpu by the Linux kernel.
> 
> Will now try to instrument the hypervisor to get more data.

I am increasing the number and types of systems and the test duration.
I just started running three different systems with IPI workloads in both
guests and on the host over the weekend.
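
(For reference, these scftorture runs use the in-tree rcutorture scripting.
A typical invocation looks roughly like the line below; it is meant only as
an illustration, and the exact flags depend on the tree in use.)

    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture scf --duration 600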

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-04-02 16:11         ` Paul E. McKenney
@ 2021-04-05  7:37           ` Juergen Gross
  2021-04-05 15:29             ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2021-04-05  7:37 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mhocko, peterz


[-- Attachment #1.1.1: Type: text/plain, Size: 9393 bytes --]

On 02.04.21 18:11, Paul E. McKenney wrote:
> On Fri, Apr 02, 2021 at 05:46:52PM +0200, Juergen Gross wrote:
>> On 30.03.21 19:33, Paul E. McKenney wrote:
>>> On Wed, Mar 24, 2021 at 11:18:03AM +0100, Jürgen Groß wrote:
>>>> On 02.03.21 07:28, Juergen Gross wrote:
>>>>> In order to help identify problems with IPI handling and remote
>>>>> function execution add some more data to IPI debugging code.
>>>>>
>>>>> There have been multiple reports of cpus looping long times (many
>>>>> seconds) in smp_call_function_many() waiting for another cpu executing
>>>>> a function like tlb flushing. Most of these reports have been for
>>>>> cases where the kernel was running as a guest on top of KVM or Xen
>>>>> (there are rumours of that happening under VMWare, too, and even on
>>>>> bare metal).
>>>>>
>>>>> Finding the root cause hasn't been successful yet, even after more than
>>>>> 2 years of chasing this bug by different developers.
>>>>>
>>>>> Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
>>>>> diagnostics") tried to address this by adding some debug code and by
>>>>> issuing another IPI when a hang was detected. This helped mitigate
>>>>> the problem (the repeated IPI unlocks the hang), but the root cause is
>>>>> still unknown.
>>>>>
>>>>> Current available data suggests that either an IPI wasn't sent when it
>>>>> should have been, or that the IPI didn't result in the target cpu
>>>>> executing the queued function (due to the IPI not reaching the cpu,
>>>>> the IPI handler not being called, or the handler not seeing the queued
>>>>> request).
>>>>>
>>>>> Try to add more diagnostic data by introducing a global atomic counter
>>>>> which is being incremented when doing critical operations (before and
>>>>> after queueing a new request, when sending an IPI, and when dequeueing
>>>>> a request). The counter value is stored in percpu variables which can
>>>>> be printed out when a hang is detected.
>>>>>
>>>>> The data of the last event (consisting of sequence counter, source
>>>>> cpu, target cpu, and event type) is stored in a global variable. When
>>>>> a new event is to be traced, the data of the last event is stored in
>>>>> the event related percpu location and the global data is updated with
>>>>> the new event's data. This allows tracking two events in one data
>>>>> location: one by the value of the event data (the event before the
>>>>> current one), and one by the location itself (the current event).
>>>>>
>>>>> A typical printout with a detected hang will look like this:
>>>>>
>>>>> csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
>>>>> 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
>>>>>            csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
>>>>>            csd: cnt(00008cd): ffff->0006 idle
>>>>>            csd: cnt(0003668): 0001->0006 queue
>>>>>            csd: cnt(0003669): 0001->0006 ipi
>>>>>            csd: cnt(0003e0f): 0007->000a queue
>>>>>            csd: cnt(0003e10): 0001->ffff ping
>>>>>            csd: cnt(0003e71): 0003->0000 ping
>>>>>            csd: cnt(0003e72): ffff->0006 gotipi
>>>>>            csd: cnt(0003e73): ffff->0006 handle
>>>>>            csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
>>>>>            csd: cnt(0003e7f): 0004->0006 ping
>>>>>            csd: cnt(0003e80): 0001->ffff pinged
>>>>>            csd: cnt(0003eb2): 0005->0001 noipi
>>>>>            csd: cnt(0003eb3): 0001->0006 queue
>>>>>            csd: cnt(0003eb4): 0001->0006 noipi
>>>>>            csd: cnt now: 0003f00
>>>>>
>>>>> This example (being an artificial one, produced with a previous version
>>>>> of this patch without the "hdlend" event), shows that cpu#6 started to
>>>>> handle an IPI (cnt 3e72-3e74), but didn't start to handle another IPI
>>>>> (sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
>>>>> queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
>>>>> from cpu#4 in the queue already).
>>>>>
>>>>> The idea is to print only relevant entries. Those are all events which
>>>>> are associated with the hang (so sender side events for the source cpu
>>>>> of the hanging request, and receiver side events for the target cpu),
>>>>> and the related events just before those (for adding data needed to
>>>>> identify a possible race). Printing all available data would be
>>>>> possible, but this would add large amounts of data printed on larger
>>>>> configurations.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>> Tested-by: Paul E. McKenney <paulmck@kernel.org>
>>>>
>>>> Just an update regarding current status with debugging the underlying
>>>> issue:
>>>>
>>>> On a customer's machine with a backport of this patch applied we've
>>>> seen another case of the hang. In the logs we've found:
>>>>
>>>> smp: csd: Detected non-responsive CSD lock (#1) on CPU#18, waiting
>>>> 5000000046 ns for CPU#06 do_flush_tlb_all+0x0/0x30(          (null)).
>>>> smp: 	csd: CSD lock (#1) unresponsive.
>>>> smp: 	csd: cnt(0000000): 0000->0000 queue
>>>> smp: 	csd: cnt(0000001): ffff->0006 idle
>>>> smp: 	csd: cnt(0025dba): 0012->0006 queue
>>>> smp: 	csd: cnt(0025dbb): 0012->0006 noipi
>>>> smp: 	csd: cnt(01d1333): 001a->0006 pinged
>>>> smp: 	csd: cnt(01d1334): ffff->0006 gotipi
>>>> smp: 	csd: cnt(01d1335): ffff->0006 handle
>>>> smp: 	csd: cnt(01d1336): ffff->0006 dequeue (src cpu 0 == empty)
>>>> smp: 	csd: cnt(01d1337): ffff->0006 hdlend (src cpu 0 == early)
>>>> smp: 	csd: cnt(01d16cb): 0012->0005 ipi
>>>> smp: 	csd: cnt(01d16cc): 0012->0006 queue
>>>> smp: 	csd: cnt(01d16cd): 0012->0006 ipi
>>>> smp: 	csd: cnt(01d16f3): 0012->001a ipi
>>>> smp: 	csd: cnt(01d16f4): 0012->ffff ping
>>>> smp: 	csd: cnt(01d1750): ffff->0018 hdlend (src cpu 0 == early)
>>>> smp: 	csd: cnt(01d1751): 0012->ffff pinged
>>>> smp: 	csd: cnt now: 01d1769
>>>>
>>>> So we see that cpu#18 (0012 hex) is waiting for cpu#06 (first line of the
>>>> data).
>>>>
>>>> The next 4 lines of the csd actions are not really interesting, as they are
>>>> rather old.
>>>>
>>>> Then we see that cpu 0006 did handle a request rather recently (cnt 01d1333
>>>> - 01d1337): cpu 001a pinged it via an IPI and it got the IPI, entered the
>>>> handler, dequeued a request, and handled it.
>>>>
>>>> Nearly all of the rest shows the critical request: cpu 0012 did a loop over
>>>> probably all other cpus and queued the requests and marked them to be IPI-ed
>>>> (including cpu 0006, cnt 01d16cd). Then the cpus marked to receive an IPI
>>>> were pinged (cnt 01d16f4 and cnt 01d1751).
>>>>
>>>> The entry cnt 01d1750 is not of interest here.
>>>>
>>>> This data confirms that on the sending side everything seems to be okay at
>>>> the level above the actual IPI sending. On the receiver side no IPI appears
>>>> to have been received, but there is no visible reason for a race either.
>>>>
>>>> It seems as if we need more debugging in the deeper layers: is the IPI
>>>> really sent out, and is something being received on the destination cpu?
>>>> I'll have another try with even more debugging code, probably in private
>>>> on the customer machine first.
>>>
>>> Apologies for the late reply, was out last week.
>>>
>>> Excellent news, and thank you!
>>>
>>> For my part, I have put together a rough prototype script that allows
>>> me to run scftorture on larger groups of systems and started running it,
>>> though I am hoping that 1,000 is far more than will be required.
>>>
>>> Your diagnosis of a lost IPI matches what we have been able to glean
>>> from the occasional occurrences in the wild on our systems, for whatever
>>> that might be worth.  The hope is to get something that reproduces more
>>> quickly, which would allow deeper debugging at this end as well.
>>
>> Sometimes one is lucky.
>>
>> I've found a reproducer while hunting another bug. The test on that
>> machine will trigger the csd_lock timeout about once a day.
> 
> Nice!!!  You are way ahead of me!
> 
>> I've used my new debug kernel and found that the IPI is really sent
>> out (more precisely: the hypervisor has been requested to do so, and
>> it didn't report an error). On the target cpu there was no interrupt
>> received after that, so the IPI has not been swallowed on the target
>> cpu by the Linux kernel.
>>
>> Will now try to instrument the hypervisor to get more data.
> 
> I am increasing the number and types of systems and the test duration.
> I just started running three different systems with IPI workloads in both
> guests and on the host over the weekend.

Maybe you can try my kind of workload:

I have a guest with 16 vcpus and 8 GB of memory running 8 instances of

sysbench --test=fileio --file-test-mode=rndrw --rand-seed=0 
--max-time=300 --max-requests=0 run

on disjoint NFS mounts. Those have been created with:

mount -t nfs -o 
rw,proto=tcp,nolock,nfsvers=3,rsize=65536,wsize=65536,nosharetransport 
server:/share[1-8] /mount[1-8]

with the server running on the host system of the guest and the shares
located in a ramdisk.

The host has 72 cpus and 48 GB of RAM.

A csd lock timeout happens about once per day on the host.
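
In case it is useful, here is a rough, untested sketch of driving that setup
from a single script; the server name, share paths, and mount points are
placeholders for the real environment:

#!/bin/sh
# Mount the 8 NFS shares exported by the host (ramdisk-backed on the server).
for i in $(seq 1 8); do
    mount -t nfs -o rw,proto=tcp,nolock,nfsvers=3,rsize=65536,wsize=65536,nosharetransport \
        server:/share$i /mount$i
done

# Run one sysbench fileio instance per mount point, all in parallel.
for i in $(seq 1 8); do
    (
        cd /mount$i || exit 1
        # fileio needs a prepare step to create its test files first
        sysbench --test=fileio prepare > /dev/null
        sysbench --test=fileio --file-test-mode=rndrw --rand-seed=0 \
            --max-time=300 --max-requests=0 run
    ) &
done
wait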


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2021-04-05  7:37           ` Juergen Gross
@ 2021-04-05 15:29             ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2021-04-05 15:29 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, mhocko, peterz

On Mon, Apr 05, 2021 at 09:37:40AM +0200, Juergen Gross wrote:
> On 02.04.21 18:11, Paul E. McKenney wrote:
> > On Fri, Apr 02, 2021 at 05:46:52PM +0200, Juergen Gross wrote:
> > > On 30.03.21 19:33, Paul E. McKenney wrote:
> > > > On Wed, Mar 24, 2021 at 11:18:03AM +0100, Jürgen Groß wrote:
> > > > > On 02.03.21 07:28, Juergen Gross wrote:
> > > > > > In order to help identify problems with IPI handling and remote
> > > > > > function execution add some more data to IPI debugging code.
> > > > > > 
> > > > > > There have been multiple reports of cpus looping long times (many
> > > > > > seconds) in smp_call_function_many() waiting for another cpu executing
> > > > > > a function like tlb flushing. Most of these reports have been for
> > > > > > cases where the kernel was running as a guest on top of KVM or Xen
> > > > > > (there are rumours of that happening under VMWare, too, and even on
> > > > > > bare metal).
> > > > > > 
> > > > > > Finding the root cause hasn't been successful yet, even after more than
> > > > > > 2 years of chasing this bug by different developers.
> > > > > > 
> > > > > > Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
> > > > > > diagnostics") tried to address this by adding some debug code and by
> > > > > > issuing another IPI when a hang was detected. This helped mitigate
> > > > > > the problem (the repeated IPI unlocks the hang), but the root cause is
> > > > > > still unknown.
> > > > > > 
> > > > > > Current available data suggests that either an IPI wasn't sent when it
> > > > > > should have been, or that the IPI didn't result in the target cpu
> > > > > > executing the queued function (due to the IPI not reaching the cpu,
> > > > > > the IPI handler not being called, or the handler not seeing the queued
> > > > > > request).
> > > > > > 
> > > > > > Try to add more diagnostic data by introducing a global atomic counter
> > > > > > which is being incremented when doing critical operations (before and
> > > > > > after queueing a new request, when sending an IPI, and when dequeueing
> > > > > > a request). The counter value is stored in percpu variables which can
> > > > > > be printed out when a hang is detected.
> > > > > > 
> > > > > > The data of the last event (consisting of sequence counter, source
> > > > > > cpu, target cpu, and event type) is stored in a global variable. When
> > > > > > a new event is to be traced, the data of the last event is stored in
> > > > > > the event related percpu location and the global data is updated with
> > > > > > the new event's data. This allows tracking two events in one data
> > > > > > location: one by the value of the event data (the event before the
> > > > > > current one), and one by the location itself (the current event).
> > > > > > 
> > > > > > A typical printout with a detected hang will look like this:
> > > > > > 
> > > > > > csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
> > > > > > 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
> > > > > >            csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
> > > > > >            csd: cnt(00008cd): ffff->0006 idle
> > > > > >            csd: cnt(0003668): 0001->0006 queue
> > > > > >            csd: cnt(0003669): 0001->0006 ipi
> > > > > >            csd: cnt(0003e0f): 0007->000a queue
> > > > > >            csd: cnt(0003e10): 0001->ffff ping
> > > > > >            csd: cnt(0003e71): 0003->0000 ping
> > > > > >            csd: cnt(0003e72): ffff->0006 gotipi
> > > > > >            csd: cnt(0003e73): ffff->0006 handle
> > > > > >            csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
> > > > > >            csd: cnt(0003e7f): 0004->0006 ping
> > > > > >            csd: cnt(0003e80): 0001->ffff pinged
> > > > > >            csd: cnt(0003eb2): 0005->0001 noipi
> > > > > >            csd: cnt(0003eb3): 0001->0006 queue
> > > > > >            csd: cnt(0003eb4): 0001->0006 noipi
> > > > > >            csd: cnt now: 0003f00
> > > > > > 
> > > > > > This example (being an artificial one, produced with a previous version
> > > > > > of this patch without the "hdlend" event), shows that cpu#6 started to
> > > > > > handle an IPI (cnt 3e72-3e74), but didn't start to handle another IPI
> > > > > > (sent by cpu#4, cnt 3e7f). The next request from cpu#1 for cpu#6 was
> > > > > > queued (3eb3), but no IPI was needed (cnt 3eb4, there was the event
> > > > > > from cpu#4 in the queue already).
> > > > > > 
> > > > > > The idea is to print only relevant entries. Those are all events which
> > > > > > are associated with the hang (so sender side events for the source cpu
> > > > > > of the hanging request, and receiver side events for the target cpu),
> > > > > > and the related events just before those (for adding data needed to
> > > > > > identify a possible race). Printing all available data would be
> > > > > > possible, but this would add large amounts of data printed on larger
> > > > > > configurations.
> > > > > > 
> > > > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > > > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > 
> > > > > Just an update regarding current status with debugging the underlying
> > > > > issue:
> > > > > 
> > > > > On a customer's machine with a backport of this patch applied we've
> > > > > seen another case of the hang. In the logs we've found:
> > > > > 
> > > > > smp: csd: Detected non-responsive CSD lock (#1) on CPU#18, waiting
> > > > > 5000000046 ns for CPU#06 do_flush_tlb_all+0x0/0x30(          (null)).
> > > > > smp: 	csd: CSD lock (#1) unresponsive.
> > > > > smp: 	csd: cnt(0000000): 0000->0000 queue
> > > > > smp: 	csd: cnt(0000001): ffff->0006 idle
> > > > > smp: 	csd: cnt(0025dba): 0012->0006 queue
> > > > > smp: 	csd: cnt(0025dbb): 0012->0006 noipi
> > > > > smp: 	csd: cnt(01d1333): 001a->0006 pinged
> > > > > smp: 	csd: cnt(01d1334): ffff->0006 gotipi
> > > > > smp: 	csd: cnt(01d1335): ffff->0006 handle
> > > > > smp: 	csd: cnt(01d1336): ffff->0006 dequeue (src cpu 0 == empty)
> > > > > smp: 	csd: cnt(01d1337): ffff->0006 hdlend (src cpu 0 == early)
> > > > > smp: 	csd: cnt(01d16cb): 0012->0005 ipi
> > > > > smp: 	csd: cnt(01d16cc): 0012->0006 queue
> > > > > smp: 	csd: cnt(01d16cd): 0012->0006 ipi
> > > > > smp: 	csd: cnt(01d16f3): 0012->001a ipi
> > > > > smp: 	csd: cnt(01d16f4): 0012->ffff ping
> > > > > smp: 	csd: cnt(01d1750): ffff->0018 hdlend (src cpu 0 == early)
> > > > > smp: 	csd: cnt(01d1751): 0012->ffff pinged
> > > > > smp: 	csd: cnt now: 01d1769
> > > > > 
> > > > > So we see that cpu#18 (0012 hex) is waiting for cpu#06 (first line of the
> > > > > data).
> > > > > 
> > > > > The next 4 lines of the csd actions are not really interesting, as they are
> > > > > rather old.
> > > > > 
> > > > > Then we see that cpu 0006 did handle a request rather recently (cnt 01d1333
> > > > > - 01d1337): cpu 001a pinged it via an IPI and it got the IPI, entered the
> > > > > handler, dequeued a request, and handled it.
> > > > > 
> > > > > Nearly all of the rest shows the critical request: cpu 0012 did a loop over
> > > > > probably all other cpus and queued the requests and marked them to be IPI-ed
> > > > > (including cpu 0006, cnt 01d16cd). Then the cpus marked to receive an IPI
> > > > > were pinged (cnt 01d16f4 and cnt 01d1751).
> > > > > 
> > > > > The entry cnt 01d1750 is not of interest here.
> > > > > 
> > > > > This data confirms that on the sending side everything seems to be okay at
> > > > > the level above the actual IPI sending. On the receiver side no IPI appears
> > > > > to have been received, but there is no visible reason for a race either.
> > > > > 
> > > > > It seems as if we need more debugging in the deeper layers: is the IPI
> > > > > really sent out, and is something being received on the destination cpu?
> > > > > I'll have another try with even more debugging code, probably in private
> > > > > on the customer machine first.
> > > > 
> > > > Apologies for the late reply, was out last week.
> > > > 
> > > > Excellent news, and thank you!
> > > > 
> > > > For my part, I have put together a rough prototype script that allows
> > > > me to run scftorture on larger groups of systems and started running it,
> > > > though I am hoping that 1,000 is far more than will be required.
> > > > 
> > > > Your diagnosis of a lost IPI matches what we have been able to glean
> > > > from the occasional occurrences in the wild on our systems, for whatever
> > > > that might be worth.  The hope is to get something that reproduces more
> > > > quickly, which would allow deeper debugging at this end as well.
> > > 
> > > Sometimes one is lucky.
> > > 
> > > I've found a reproducer while hunting another bug. The test on that
> > > machine will trigger the csd_lock timeout about once a day.
> > 
> > Nice!!!  You are way ahead of me!
> > 
> > > I've used my new debug kernel and found that the IPI is really sent
> > > out (more precisely: the hypervisor has been requested to do so, and
> > > it didn't report an error). On the target cpu there was no interrupt
> > > received after that, so the IPI has not been swallowed on the target
> > > cpu by the Linux kernel.
> > > 
> > > Will now try to instrument the hypervisor to get more data.
> > 
> > I am increasing the number and types of systems and the test duration.
> > I just started running three different systems with IPI workloads in both
> > guests and on the host over the weekend.
> 
> Maybe you can try my kind of workload:
> 
> I have a guest with 16 vcpus and 8 GB of memory running 8 instances of
> 
> sysbench --test=fileio --file-test-mode=rndrw --rand-seed=0 --max-time=300
> --max-requests=0 run
> 
> on disjoint NFS mounts. Those have been created with:
> 
> mount -t nfs -o
> rw,proto=tcp,nolock,nfsvers=3,rsize=65536,wsize=65536,nosharetransport
> server:/share[1-8] /mount[1-8]
> 
> with the server running on the host system of the guest and the shares
> located in a ramdisk.
> 
> The host has 72 cpus and 48 GB of RAM.
> 
> A csd lock timeout happens about once per day on the host.

Thank you!  I will give it a try.  It has been quite some time since I
have done anything with NFS, so it should be entertaining.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
  2022-08-16 10:53 [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging Chang-Ho Cho
@ 2022-08-16 15:58 ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-16 15:58 UTC (permalink / raw)
  To: Chang-Ho Cho; +Cc: jgross, linux-kernel, mhocko, peterz

On Tue, Aug 16, 2022 at 07:53:49PM +0900, Chang-Ho Cho wrote:
> Hello, 
> 
> A Google search brought me to this thread.  May I have an update on the progress of this issue? Has anyone found the root cause, a solution, or a workaround?
> How can I find out more about the problem?
> A Windows 2008 R2 VM is experiencing hangs on QEMU-KVM due to an IPI handling issue.

Here is a blog entry describing the bug that was causing me trouble:

https://paulmck.livejournal.com/62071.html

But although this appeared to be an IPI-loss problem, it was in fact
an interrupt storm.

Juergen, who you CCed, located a race-condition issue with a non-KVM
hypervisor, if I recall correctly.

Of course, much depends on the exact versions of your various OSes
and firmware.  One useful trick is to look for fixes since whichever
version of Linux you are using, and to backport those.  Also, for issues
with Windows, you are likely better served asking elsewhere.  Since you
are asking here, I am assuming that you have somehow proven that the
interrupt is being lost in KVM or QEMU rather than in the Windows guest.
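
For example, something along the following lines lists changes to the
CSD-lock code since a given release (the v5.10 tag is only a placeholder;
substitute whichever version you are actually running):

    git log --oneline v5.10.. -- kernel/smp.c
    git log --oneline -i --grep="csd" v5.10..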

Please note that observing the hang on QEMU/KVM but not on bare metal
is insufficient, given that the difference in timing can dramatically
change the probability of bugs occurring.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging
@ 2022-08-16 10:53 Chang-Ho Cho
  2022-08-16 15:58 ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Chang-Ho Cho @ 2022-08-16 10:53 UTC (permalink / raw)
  To: paulmck; +Cc: jgross, linux-kernel, mhocko, peterz

Hello, 

A Google search brought me to this thread.  May I have an update on the progress of this issue? Has anyone found the root cause, a solution, or a workaround?
How can I find out more about the problem?
A Windows 2008 R2 VM is experiencing hangs on QEMU-KVM due to an IPI handling issue.

Regards,
Changho

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-08-16 16:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02  6:28 [PATCH v3 0/4] kernel/smp.c: add more CSD lock debugging Juergen Gross
2021-03-02  6:28 ` [PATCH v3 1/4] kernel/smp: add boot parameter for controlling " Juergen Gross
2021-03-02  6:28 ` [PATCH v3 2/4] kernel/smp: prepare more " Juergen Gross
2021-03-02  6:28 ` [PATCH v3 3/4] kernel/smp: add more data to " Juergen Gross
2021-03-24 10:18   ` Jürgen Groß
2021-03-30 17:33     ` Paul E. McKenney
2021-04-02 15:46       ` Juergen Gross
2021-04-02 16:11         ` Paul E. McKenney
2021-04-05  7:37           ` Juergen Gross
2021-04-05 15:29             ` Paul E. McKenney
2021-03-02  6:28 ` [PATCH v3 4/4] kernel/smp: fix flush_smp_call_function_queue() cpu offline detection Juergen Gross
2022-08-16 10:53 [PATCH v3 3/4] kernel/smp: add more data to CSD lock debugging Chang-Ho Cho
2022-08-16 15:58 ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.