From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Frederic Weisbecker <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>, <rcu@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Robert Elliott <elliott@hpe.com>
Subject: Re: [PATCH v6 0/2] rcu: Add RCU stall diagnosis information
Date: Thu, 10 Nov 2022 15:29:04 +0800
Message-ID: <c8b7deaa-1d4f-0e73-269a-32d6105b89a7@huawei.com>
In-Reply-To: <20221109170317.GA300561@lothringen>



On 2022/11/10 1:03, Frederic Weisbecker wrote:
> On Wed, Nov 09, 2022 at 07:59:01AM -0800, Paul E. McKenney wrote:
>> On Wed, Nov 09, 2022 at 04:26:21PM +0100, Frederic Weisbecker wrote:
>>> Hi Zhen Lei,
>>>
>>> On Wed, Nov 09, 2022 at 05:37:36PM +0800, Zhen Lei wrote:
>>>> v5 --> v6:
>>>> 1. When there are more than two continuous RCU stallings, correctly handle the
>>>>    value of the second and subsequent sampling periods. Update comments and
>>>>    document.
>>>>    Thanks to Elliott, Robert for the test.
>>>> 2. Change "rcu stall" to "RCU stall".
>>>>
>>>> v4 --> v5:
>>>> 1. Resolve a git am conflict. No code change.
>>>>
>>>> v3 --> v4:
>>>> 1. Rename rcu_cpu_stall_deep_debug to rcu_cpu_stall_cputime.
>>>>
>>>> v2 --> v3:
>>>> 1. Fix the return type of kstat_cpu_irqs_sum()
>>>> 2. Add Kconfig option CONFIG_RCU_CPU_STALL_DEEP_DEBUG and boot parameter
>>>>    rcupdate.rcu_cpu_stall_deep_debug.
>>>> 3. Add comments and normalize local variable name
>>>>
>>>>
>>>> v1 --> v2:
>>>> 1. Fixed a bug in the code. If the rcu stall is detected by another CPU,
>>>>    kcpustat_this_cpu cannot be used.
>>>> @@ -451,7 +451,7 @@ static void print_cpu_stat_info(int cpu)
>>>>         if (r->gp_seq != rdp->gp_seq)
>>>>                 return;
>>>>
>>>> -       cpustat = kcpustat_this_cpu->cpustat;
>>>> +       cpustat = kcpustat_cpu(cpu).cpustat;
>>>> 2. Move the start point of statistics from rcu_stall_kick_kthreads() to
>>>>    rcu_implicit_dynticks_qs(), removing the dependency on irq_work.
>>>>
>>>> v1:
>>>> In some extreme cases, such as the I/O pressure test, the CPU usage may
>>>> be 100%, causing RCU stall. In this case, the printed information about
>>>> current is not useful. Displays the number and usage of hard interrupts,
>>>> soft interrupts, and context switches that are generated within half of
>>>> the CPU stall timeout, can help us make a general judgment. In other
>>>> cases, we can preliminarily determine whether an infinite loop occurs
>>>> when local_irq, local_bh or preempt is disabled.
>>>
>>> That looks useful but I have to ask: what does it bring that the softlockup
>>> and hardlockup watchdog can not already solve?
>>
>> This is a good point.  One possible benefit is putting the needed information
>> in one spot, for example, in cases where the soft/hard lockup timeouts are
>> significantly different than the RCU CPU stall warning timeout.
> 
> Arguably, the hardlockup/softlockup detectors usually trigger after RCU stall,
> unless all CPUs are caught into a hardlockup, in which case only the hardlockup
> detector has a chance.

But not all architectures support the hardlockup detector. s390 does not,
and arm64 may not either.

config HARDLOCKUP_DETECTOR
        bool "Detect Hard Lockups"
        depends on DEBUG_KERNEL && !S390
        depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH

> 
> Anyway I would say that in this case just lower the delay for the lockup
> detectors to consider the situation is a lockup?

On most architectures, CONFIG_SOFTLOCKUP_DETECTOR is not set by default.
Where it is set, its 20-second threshold is less than the default
RCU_CPU_STALL_TIMEOUT of 21 seconds.

Softlockups are bugs that cause the kernel to loop in kernel
mode for more than 20 seconds, without giving other tasks a
chance to run.

config RCU_CPU_STALL_TIMEOUT
	default 21


In short, the hardlockup and softlockup detectors are completely outside the
control of RCU stall detection.

> 
> Thanks.
> 
> 
>>
>> Thoughts?
>>
>> 							Thanx, Paul
>>
>>> Thanks.
>>>
>>>>
>>>> Zhen Lei (2):
>>>>   rcu: Add RCU stall diagnosis information
>>>>   doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
>>>>
>>>>  Documentation/RCU/stallwarn.rst               | 88 +++++++++++++++++++
>>>>  .../admin-guide/kernel-parameters.txt         |  6 ++
>>>>  kernel/rcu/Kconfig.debug                      | 11 +++
>>>>  kernel/rcu/rcu.h                              |  1 +
>>>>  kernel/rcu/tree.c                             | 17 ++++
>>>>  kernel/rcu/tree.h                             | 19 ++++
>>>>  kernel/rcu/tree_stall.h                       | 29 ++++++
>>>>  kernel/rcu/update.c                           |  2 +
>>>>  8 files changed, 173 insertions(+)
>>>>
>>>> -- 
>>>> 2.25.1
>>>>
> .
> 

-- 
Regards,
  Zhen Lei
