On Mon, Mar 18, 2024 at 05:05:31PM -0400, Joel Fernandes wrote:
>
>
> > On Mar 18, 2024, at 2:58 PM, Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > Hello, Joel!
> >
> > Sorry for late checking, see below few comments:
> >
> >> In the synchronize_rcu() common case, we will have less than
> >> SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker
> >> is pointless just to free the last injected wait head since at that point,
> >> all the users have already been awakened.
> >>
> >> Introduce a new counter to track this and prevent the wakeup in the
> >> common case.
> >>
> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >> ---
> >> Rebased on paul/dev of today.
> >>
> >> kernel/rcu/tree.c | 36 +++++++++++++++++++++++++++++++-----
> >> kernel/rcu/tree.h | 1 +
> >> 2 files changed, 32 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index 9fbb5ab57c84..bd29fe3c76bf 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -96,6 +96,7 @@ static struct rcu_state rcu_state = {
> >> .ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
> >> .srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
> >> rcu_sr_normal_gp_cleanup_work),
> >> + .srs_cleanups_pending = ATOMIC_INIT(0),
> >> };
> >>
> >> /* Dump rcu_node combining tree at boot to verify correct setup. */
> >> @@ -1642,8 +1643,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
> >> * the done tail list manipulations are protected here.
> >> */
> >> done = smp_load_acquire(&rcu_state.srs_done_tail);
> >> - if (!done)
> >> + if (!done) {
> >> + /* See comments below. */
> >> + atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
> >> return;
> >> + }
> >>
> >> WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
> >> head = done->next;
> >> @@ -1666,6 +1670,9 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
> >>
> >> rcu_sr_put_wait_head(rcu);
> >> }
> >> +
> >> + /* Order list manipulations with atomic access. */
> >> + atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
> >> }
> >>
> >> /*
> >> @@ -1673,7 +1680,7 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
> >> */
> >> static void rcu_sr_normal_gp_cleanup(void)
> >> {
> >> - struct llist_node *wait_tail, *next, *rcu;
> >> + struct llist_node *wait_tail, *next = NULL, *rcu = NULL;
> >> int done = 0;
> >>
> >> wait_tail = rcu_state.srs_wait_tail;
> >> @@ -1699,16 +1706,35 @@ static void rcu_sr_normal_gp_cleanup(void)
> >> break;
> >> }
> >>
> >> - // concurrent sr_normal_gp_cleanup work might observe this update.
> >> - smp_store_release(&rcu_state.srs_done_tail, wait_tail);
> >> + /*
> >> + * Fast path, no more users to process. Remove the last wait head
> >> + * if no inflight-workers. If there are in-flight workers, let them
> >> + * remove the last wait head.
> >> + */
> >> + WARN_ON_ONCE(!rcu);
> >>
> > This assumption is not correct. An "rcu" can be NULL in fact.
>
> Hmm I could never trigger that. Are you saying that is true after Neeraj recent patch or something else?
> Note, after Neeraj patch to handle the lack of heads availability, it could be true so I requested
> him to rebase his patch on top of this one.
>
> However I will revisit my patch and look for if it could occur but please let me know if you knew of a sequence of events to make it NULL.
> >
I think we should agree on your patch first, otherwise it becomes a bit
messy; or go with Neeraj's patch as a first step and then work on yours.
So, I reviewed this patch based on Paul's latest dev branch. I see that
Neeraj's patch needs further work.
So this is true without Neeraj's patch. Consider the following case:

        3     2     1     0
  wh -> cb -> cb -> cb -> NULL

We start to process from 2 and handle all clients; in the end, "rcu"
points to NULL and triggers the WARN_ON_ONCE. I see the splat during
boot:
<snip>
[ 0.927699][ T16] ------------[ cut here ]------------
[ 0.930867][ T16] WARNING: CPU: 0 PID: 16 at kernel/rcu/tree.c:1721 rcu_gp_cleanup+0x37b/0x4a0
[ 0.930490][ T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.931401][ T16] Modules linked in:
[ 0.932400][ T1] PCI: Using configuration type 1 for base access
[ 0.932771][ T16]
[ 0.932773][ T16] CPU: 0 PID: 16 Comm: rcu_sched Not tainted 6.8.0-rc2-00089-g65ae0a6b86f0-dirty #1156
[ 0.937780][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.939402][ T16] RIP: 0010:rcu_gp_cleanup+0x37b/0x4a0
[ 0.940636][ T16] Code: b0 4b bd 72 09 48 81 ff e8 b0 4b bd 76 1e 4c 8b 27 48 83 c7 10 e8 a5 8e fb ff 4c 89 23 83 ed 01 74 0a 4c 89 e7 48 85 ff 75 d2 <0f> 0b 48 8b 35 14 d0 fd 02 48 89 1d 8d 64 d0 01 48 83 c4 08 48 c7
[ 0.942402][ T16] RSP: 0018:ffff9b4a8008fe88 EFLAGS: 00010246
[ 0.943648][ T16] RAX: 0000000000000000 RBX: ffffffffbd4bb0a8 RCX: 6c9b26c9b26c9b27
[ 0.944751][ T16] RDX: 0000000000000000 RSI: 00000000374b92b6 RDI: 0000000000000000
[ 0.945757][ T16] RBP: 0000000000000004 R08: fffffffffff54ea1 R09: 0000000000000000
[ 0.946753][ T16] R10: ffff89070098c278 R11: 0000000000000001 R12: 0000000000000000
[ 0.947752][ T16] R13: fffffffffffffcbc R14: 0000000000000000 R15: ffffffffbd3f1300
[ 0.948764][ T16] FS: 0000000000000000(0000) GS:ffff8915efe00000(0000) knlGS:0000000000000000
[ 0.950403][ T16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.951656][ T16] CR2: ffff89163ffff000 CR3: 00000002eae26000 CR4: 00000000000006f0
[ 0.952755][ T16] Call Trace:
[ 0.953597][ T16] <TASK>
[ 0.955404][ T16] ? __warn+0x80/0x140
[ 0.956608][ T16] ? rcu_gp_cleanup+0x37b/0x4a0
[ 0.957621][ T16] ? report_bug+0x15d/0x180
[ 0.959403][ T16] ? handle_bug+0x3c/0x70
[ 0.960616][ T16] ? exc_invalid_op+0x17/0x70
[ 0.961620][ T16] ? asm_exc_invalid_op+0x1a/0x20
[ 0.962627][ T16] ? rcu_gp_cleanup+0x37b/0x4a0
[ 0.963622][ T16] ? rcu_gp_cleanup+0x36b/0x4a0
[ 0.965403][ T16] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 0.967402][ T16] rcu_gp_kthread+0xf7/0x180
[ 0.968619][ T16] kthread+0xd3/0x100
[ 0.969602][ T16] ? __pfx_kthread+0x10/0x10
[ 0.971402][ T16] ret_from_fork+0x34/0x50
[ 0.972613][ T16] ? __pfx_kthread+0x10/0x10
[ 0.973615][ T16] ret_from_fork_asm+0x1b/0x30
[ 0.974624][ T16] </TASK>
[ 0.975587][ T16] ---[ end trace 0000000000000000 ]---
<snip>
--
Uladzislau Rezki
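To see why the walk can leave "rcu" equal to NULL, the loop can be modeled in isolation. Below is a plain-C sketch (a standalone model with made-up names — `walk_users()` stands in for the user-processing loop in rcu_sr_normal_gp_cleanup(); it is not kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of the srs list: one wait head (wh) followed by
 * ordinary user nodes (cb), terminated by NULL. */
struct node {
	struct node *next;
	int is_wait_head;
};

/* Model of the cleanup loop: walk the users after the wait head.
 * "rcu" stops at the next wait head, or reaches NULL when the list
 * holds only ordinary users -- the case that fires WARN_ON_ONCE(!rcu). */
static struct node *walk_users(struct node *wh)
{
	struct node *rcu, *next;

	for (rcu = wh->next; rcu; rcu = next) {
		next = rcu->next;
		if (rcu->is_wait_head)
			break;	/* a second wait head ends the batch */
		/* ... wake this user (omitted in the model) ... */
	}
	return rcu;	/* NULL in the wh -> cb -> cb -> cb -> NULL case */
}
```

With the list from the splat scenario (wait head plus three callbacks), the loop runs off the end of the list, so any post-loop check must tolerate a NULL "rcu".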
On Mon, Mar 18, 2024 at 05:34:10PM +0800, Zqiang wrote:
> This commit is mainly to enable rcutorture testing to support print
> rcu-tasks related gp state, and remove redundant function pointer
> initialization in rcu_torture_ops structure's objects.
>
> Zqiang (2):
>   rcutorture: Make rcutorture support print rcu-tasks gp state
>   rcutorture: Removing redundant function pointer initialization

Queued for v6.10, thank you!

							Thanx, Paul

>  kernel/rcu/rcu.h        | 20 ++++++++++----------
>  kernel/rcu/rcutorture.c | 38 ++++++++++++++++++--------------------
>  kernel/rcu/srcutree.c   |  5 +----
>  kernel/rcu/tasks.h      | 21 +++++++++++++++++++++
>  kernel/rcu/tree.c       | 13 +++----------
>  5 files changed, 53 insertions(+), 44 deletions(-)
>
> --
> 2.17.1
>
On Mon, Mar 18, 2024 at 9:32 PM Yan Zhai <yan@cloudflare.com> wrote:
>
> On Mon, Mar 18, 2024 at 5:59 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote:
> > > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > > > There are several scenario in network processing that can run
> > > > extensively under heavy traffic. In such situation, RCU synchronization
> > > > might not observe desired quiescent states for indefinitely long period.
> > > > Create a helper to safely raise the desired RCU quiescent states for
> > > > such scenario.
> > > >
> > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > > > sufficient to address existing problems around RCU tasks. It's unclear
> > > > yet if there is any future scenario for it to be further tuned down.
> > >
> > > I suggest something like the following for the commit log:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > When under heavy load, network processing can run CPU-bound for many tens
> > > of seconds. Even in preemptible kernels, this can block RCU Tasks grace
> > > periods, which can cause trace-event removal to take more than a minute,
> > > which is unacceptably long.
> > >
> > > This commit therefore creates a new helper function that passes
> > > through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> > > This hard-coded value suffices for current workloads.
> >
> > FWIW, this sounds good to me.
> >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > > > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > > > ---
> > > > v3->v4: comment fixup
> > > >
> > > > ---
> > > >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index 0746b1b0b663..da224706323e 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -247,6 +247,30 @@ do { \
> > > >  	cond_resched(); \
> > > >  } while (0)
> > > >
> > > > +/**
> > > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > > > + *
> > > > + * This helper is for network processing in non-RT kernels, where there could
> > > > + * be busy polling threads that block RCU synchronization indefinitely. In
> > > > + * such context, simply calling cond_resched is insufficient, so give it a
> > > > + * stronger push to eliminate all potential blockage of all RCU types.
> > > > + *
> > > > + * NOTE: unless absolutely sure, this helper should in general be called
> > > > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > > > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > > > + */
> > >
> > > How about something like this for the kernel-doc comment?
> > >
> > > /**
> > >  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
> > >  * @old_ts: jiffies at start of processing.
> > >  *
> > >  * This helper is for long-running softirq handlers, such as those
> > >  * in networking.  The caller should initialize the variable passed in
> > >  * as @old_ts at the beginning of the softirq handler.  When invoked
> > >  * frequently, this macro will invoke rcu_softirq_qs() every 100
> > >  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
> > >  * quiescent states.  Note that this macro modifies its old_ts argument.
> > >  *
> > >  * Note that although cond_resched() provides RCU quiescent states,
> > >  * it does not provide RCU-Tasks quiescent states.
> > >  *
> > >  * Because regions of code that have disabled softirq act as RCU
> > >  * read-side critical sections, this macro should be invoked with softirq
> > >  * (and preemption) enabled.
> > >  *
> > >  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
> > >  */
> >
> > Considering the note about cond_resched(), does cond_resched() actually
> > provide an RCU quiescent state for fully-preemptible kernels? IIUC for those
> > cond_resched() expands to:
> >
> > 	__might_resched();
> > 	klp_sched_try_switch()
> >
> > ... and AFAICT neither reports an RCU quiescent state.
> >
> > So maybe it's worth dropping the note?
> >
> > Separately, what's the rationale for not doing this on PREEMPT_RT? Does that
> > avoid the problem through other means, or are people just not running affected
> > workloads on that?
> >
> It's a bit anti-intuition but yes the RT kernel avoids the problem.
> This is because "schedule()" reports task RCU QS actually, and on RT
> kernel cond_resched() call won't call "__cond_resched()" or
> "__schedule(PREEMPT)" as you already pointed out, which would clear
> need-resched flag. This then allows "schedule()" to be called on hard
> IRQ exit time by time.
>

And these are excellent questions that I should have originally included
in the comment. Thanks for bringing it up. Let me send another version
tomorrow, allowing more thoughts on this if any.

thanks
Yan

> Yan
>
> > Mark.
>
> > > >
> > > >                                                 Thanx, Paul
> > > >
> > > > > +#define rcu_softirq_qs_periodic(old_ts) \
> > > > > +do { \
> > > > > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > > > > +	    time_after(jiffies, (old_ts) + HZ / 10)) { \
> > > > > +		preempt_disable(); \
> > > > > +		rcu_softirq_qs(); \
> > > > > +		preempt_enable(); \
> > > > > +		(old_ts) = jiffies; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > >  /*
> > > > >   * Infrastructure to implement the synchronize_() primitives in
> > > > >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > > > > --
> > > > > 2.30.2
> > > > >
> > > > >
On Mon, Mar 18, 2024 at 04:43:38PM +0100, Marco Elver wrote:
> On Mon, 18 Mar 2024 at 11:01, Marco Elver <elver@google.com> wrote:
> >
> > On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote:
> > > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists.
> > > > >
> > > > > Because not all architectures support 1-byte cmpxchg?
> > > > > What prevents us from implementing it?
> > > >
> > > > Nothing that I know of, but I didn't want to put up with the KCSAN report
> > > > in the interim.
> > >
> > > And here is a lightly tested patch to emulate one-byte and two-byte
> > > cmpxchg() for architectures that do not support it.  This is just the
> > > emulation, and would be followed up with patches to make the relevant
> > > architectures make use of it.
> > >
> > > The one-byte emulation has been lightly tested on x86.
> > >
> > > Thoughts?
> > >
> > > 							Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit d72e54166b56d8b373676e1e92a426a07d53899a
> > > Author: Paul E. McKenney <paulmck@kernel.org>
> > > Date:   Sun Mar 17 14:44:38 2024 -0700
> > >
> > >     lib: Add one-byte and two-byte cmpxchg() emulation functions
> > >
> > >     Architectures are required to provide four-byte cmpxchg() and 64-bit
> > >     architectures are additionally required to provide eight-byte cmpxchg().
> > >     However, there are cases where one-byte and two-byte cmpxchg()
> > >     would be extremely useful.  Therefore, provide cmpxchg_emu_u8() and
> > >     cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms
> > >     of four-byte cmpxchg().
> > >
> > >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > >     Cc: Marco Elver <elver@google.com>
> > >     Cc: Andrew Morton <akpm@linux-foundation.org>
> > >     Cc: Thomas Gleixner <tglx@linutronix.de>
> > >     Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > >     Cc: Douglas Anderson <dianders@chromium.org>
> > >     Cc: Petr Mladek <pmladek@suse.com>
> > >     Cc: <linux-arch@vger.kernel.org>
> > >
> > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > index 154f994547632..eef11e9918ec7 100644
> > > --- a/arch/Kconfig
> > > +++ b/arch/Kconfig
> > > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT
> > >  	default 4 if FUNCTION_ALIGNMENT_4B
> > >  	default 0
> > >
> > > +config ARCH_NEED_CMPXCHG_1_2_EMU
> > > +	bool
> > > +
> > >  endmenu
> > > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > > new file mode 100644
> > > index 0000000000000..fee8171fa05eb
> > > --- /dev/null
> > > +++ b/include/linux/cmpxchg-emu.h
> > > @@ -0,0 +1,16 @@
> > > +/* SPDX-License-Identifier: GPL-2.0+ */
> > > +/*
> > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > > + * lacking direct support for these sizes.  These are implemented in terms
> > > + * of 4-byte cmpxchg operations.
> > > + *
> > > + * Copyright (C) 2024 Paul E. McKenney.
> > > + */
> > > +
> > > +#ifndef __LINUX_CMPXCHG_EMU_H
> > > +#define __LINUX_CMPXCHG_EMU_H
> > > +
> > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
> > > +
> > > +#endif /* __LINUX_CMPXCHG_EMU_H */
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index 6b09731d8e619..fecd7b8c09cbd 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > >  lib-$(CONFIG_GENERIC_BUG) += bug.o
> > >
> > >  obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> > > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
> >
> > Since you add instrumentation explicitly, we need to suppress
> > instrumentation somehow. For the whole file this can be done with:
> >
> > KCSAN_SANITIZE_cmpxchg-emu.o := n
>
> Hrm, I recall this doesn't actually work as-is because it also
> disables instrument_read_write() instrumentation.
>
> So I think the most reliable would be to use data_race() after all.
> It'll be a bit slower because of double-instrumenting, but I think
> that's not a major concern with an instrumented build anyway.

And I have added data_race(), thank you!

> > Note, since you use cmpxchg, which pulls in its own
> > instrument_read_write(), we can't use a function attribute (like
> > __no_kcsan) if the whole-file no-instrumentation seems like overkill.
> > Alternatively the cmpxchg could be wrapped into a data_race() (like
> > your original RCU use case was doing).
> >
> > But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way.
> >
> > With the explicit "instrument_read_write()" also note that this would
> > do double-instrumentation with other sanitizers (KASAN, KMSAN). But I
> > think we actually want to instrument the whole real access with those
> > tools - would it be bad if we accessed some memory out-of-bounds, but
> > that memory isn't actually used? I don't have a clear answer to that.
> >
> > Also, it might be useful to have an alignment check somewhere, because
> > otherwise we end up with split atomic accesses (or whatever other bad
> > thing the given arch does if that happens).

Excellent point, added.  I also fixed an embarrassing pointer-arithmetic
bug which the act of coding the alignment check uncovered, so two for one!  ;-)

Please see below for a patch to the patched code.

							Thanx, Paul

> > Thanks,
> > -- Marco
> >
> > >  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
> > >  #ensure exported functions have prototypes
> > > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > > new file mode 100644
> > > index 0000000000000..508b55484c2b6
> > > --- /dev/null
> > > +++ b/lib/cmpxchg-emu.c
> > > @@ -0,0 +1,68 @@
> > > +/* SPDX-License-Identifier: GPL-2.0+ */
> > > +/*
> > > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > > + * lacking direct support for these sizes.  These are implemented in terms
> > > + * of 4-byte cmpxchg operations.
> > > + *
> > > + * Copyright (C) 2024 Paul E. McKenney.
> > > + */
> > > +
> > > +#include <linux/types.h>
> > > +#include <linux/export.h>
> > > +#include <linux/instrumented.h>
> > > +#include <linux/atomic.h>
> > > +#include <asm-generic/rwonce.h>
> > > +
> > > +union u8_32 {
> > > +	u8 b[4];
> > > +	u32 w;
> > > +};
> > > +
> > > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > > +{
> > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > > +	int i = ((uintptr_t)p) & 0x3;
> > > +	union u8_32 old32;
> > > +	union u8_32 new32;
> > > +	u32 ret;
> > > +
> > > +	old32.w = READ_ONCE(*p32);
> > > +	do {
> > > +		if (old32.b[i] != old)
> > > +			return old32.b[i];
> > > +		new32.w = old32.w;
> > > +		new32.b[i] = new;
> > > +		instrument_atomic_read_write(p, 1);
> > > +		ret = cmpxchg(p32, old32.w, new32.w);
> > > +	} while (ret != old32.w);
> > > +	return old;
> > > +}
> > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> > > +
> > > +union u16_32 {
> > > +	u16 h[2];
> > > +	u32 w;
> > > +};
> > > +
> > > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
> > > +{
> > > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1);
> > > +	int i = ((uintptr_t)p) & 0x1;
> > > +	union u16_32 old32;
> > > +	union u16_32 new32;
> > > +	u32 ret;
> > > +
> > > +	old32.w = READ_ONCE(*p32);
> > > +	do {
> > > +		if (old32.h[i] != old)
> > > +			return old32.h[i];
> > > +		new32.w = old32.w;
> > > +		new32.h[i] = new;
> > > +		instrument_atomic_read_write(p, 2);
> > > +		ret = cmpxchg(p32, old32.w, new32.w);
> > > +	} while (ret != old32.w);
> > > +	return old;
> > > +}
> > > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);

diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
index 508b55484c2b6..b904f954dd4fc 100644
--- a/lib/cmpxchg-emu.c
+++ b/lib/cmpxchg-emu.c
@@ -11,6 +11,8 @@
 #include <linux/export.h>
 #include <linux/instrumented.h>
 #include <linux/atomic.h>
+#include <linux/panic.h>
+#include <linux/bug.h>
 #include <asm-generic/rwonce.h>
 
 union u8_32 {
@@ -34,7 +36,7 @@ uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
 		new32.w = old32.w;
 		new32.b[i] = new;
 		instrument_atomic_read_write(p, 1);
-		ret = cmpxchg(p32, old32.w, new32.w);
+		ret = data_race(cmpxchg(p32, old32.w, new32.w));
 	} while (ret != old32.w);
 	return old;
 }
@@ -48,12 +50,13 @@ union u16_32 {
 
 /* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
 uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
 {
-	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1);
-	int i = ((uintptr_t)p) & 0x1;
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+	int i = (((uintptr_t)p) & 0x2) / 2;
 	union u16_32 old32;
 	union u16_32 new32;
 	u32 ret;
 
+	WARN_ON_ONCE(((uintptr_t)p) & 0x1);
 	old32.w = READ_ONCE(*p32);
 	do {
 		if (old32.h[i] != old)
@@ -61,7 +64,7 @@ uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
 		new32.w = old32.w;
 		new32.h[i] = new;
 		instrument_atomic_read_write(p, 2);
-		ret = cmpxchg(p32, old32.w, new32.w);
+		ret = data_race(cmpxchg(p32, old32.w, new32.w));
 	} while (ret != old32.w);
 	return old;
 }
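For readers without a kernel tree at hand, the byte-in-word CAS loop used by cmpxchg_emu_u8() can be sketched in user space with C11 atomics. This is a hedged illustration: `cas_u8()` is a made-up name, and it omits the kernel's instrumentation, the masking of the incoming byte pointer, and the data_race() annotation discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Emulate a one-byte compare-and-swap with a four-byte CAS on the
 * aligned word containing the byte.  Byte i of the word is compared
 * and replaced; the other three bytes are preserved by reusing the
 * freshly loaded word value.  Endianness does not matter because the
 * same byte index is used for both the compare and the store. */
static uint8_t cas_u8(_Atomic uint32_t *w, unsigned int i,
		      uint8_t old, uint8_t new_val)
{
	union { uint8_t b[4]; uint32_t v; } cur, next;

	cur.v = atomic_load(w);
	for (;;) {
		if (cur.b[i] != old)
			return cur.b[i];	/* mismatch: report current byte */
		next.v = cur.v;
		next.b[i] = new_val;
		/* On failure the CAS refreshes cur.v, so simply retry. */
		if (atomic_compare_exchange_weak(w, &cur.v, next.v))
			return old;		/* success: report expected byte */
	}
}
```

The retry loop mirrors the `while (ret != old32.w)` condition in the patch: it spins only when another thread (or a spurious weak-CAS failure) changed the containing word between the load and the CAS.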
On Sat, Mar 16, 2024 at 12:41 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > There are several scenario in network processing that can run
> > extensively under heavy traffic. In such situation, RCU synchronization
> > might not observe desired quiescent states for indefinitely long period.
> > Create a helper to safely raise the desired RCU quiescent states for
> > such scenario.
> >
> > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > sufficient to address existing problems around RCU tasks. It's unclear
> > yet if there is any future scenario for it to be further tuned down.
>
> I suggest something like the following for the commit log:
>
> ------------------------------------------------------------------------
>
> When under heavy load, network processing can run CPU-bound for many tens
> of seconds. Even in preemptible kernels, this can block RCU Tasks grace
> periods, which can cause trace-event removal to take more than a minute,
> which is unacceptably long.
>
> This commit therefore creates a new helper function that passes
> through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> This hard-coded value suffices for current workloads.
>
> ------------------------------------------------------------------------
>
> > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > ---
> > v3->v4: comment fixup
> >
> > ---
> >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 0746b1b0b663..da224706323e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -247,6 +247,30 @@ do { \
> >  	cond_resched(); \
> >  } while (0)
> >
> > +/**
> > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > + *
> > + * This helper is for network processing in non-RT kernels, where there could
> > + * be busy polling threads that block RCU synchronization indefinitely. In
> > + * such context, simply calling cond_resched is insufficient, so give it a
> > + * stronger push to eliminate all potential blockage of all RCU types.
> > + *
> > + * NOTE: unless absolutely sure, this helper should in general be called
> > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > + */
>
> How about something like this for the kernel-doc comment?
>
> /**
>  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
>  * @old_ts: jiffies at start of processing.
>  *
>  * This helper is for long-running softirq handlers, such as those
>  * in networking.  The caller should initialize the variable passed in
>  * as @old_ts at the beginning of the softirq handler.  When invoked
>  * frequently, this macro will invoke rcu_softirq_qs() every 100
>  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
>  * quiescent states.  Note that this macro modifies its old_ts argument.
>  *
>  * Note that although cond_resched() provides RCU quiescent states,
>  * it does not provide RCU-Tasks quiescent states.
>  *
>  * Because regions of code that have disabled softirq act as RCU
>  * read-side critical sections, this macro should be invoked with softirq
>  * (and preemption) enabled.
>  *
>  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
>  */
>

It would be more accurate this way, I like it. Thanks!

Yan

> 							Thanx, Paul
>
> > +#define rcu_softirq_qs_periodic(old_ts) \
> > +do { \
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > +	    time_after(jiffies, (old_ts) + HZ / 10)) { \
> > +		preempt_disable(); \
> > +		rcu_softirq_qs(); \
> > +		preempt_enable(); \
> > +		(old_ts) = jiffies; \
> > +	} \
> > +} while (0)
> > +
> >  /*
> >   * Infrastructure to implement the synchronize_() primitives in
> >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > --
> > 2.30.2
> >
> >
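The macro's rate limiting is just a timestamp comparison, so the idea can be shown with a small user-space model (hypothetical names: a plain tick counter stands in for jiffies, and `report_qs()` stands in for rcu_softirq_qs()):

```c
#include <assert.h>

#define PERIOD 100	/* stands in for HZ / 10, i.e. 100 ms at HZ=1000 */

static int reports;	/* how many quiescent states were reported */

static void report_qs(void)
{
	reports++;	/* models rcu_softirq_qs() */
}

/* Model of rcu_softirq_qs_periodic(): report a QS at most once per
 * PERIOD ticks, updating the caller's timestamp on each report. */
static void periodic_qs(unsigned long now, unsigned long *old_ts)
{
	if (now - *old_ts > PERIOD) {	/* models time_after(jiffies, old_ts + HZ/10) */
		report_qs();
		*old_ts = now;
	}
}
```

Driving the model with 1000 consecutive ticks yields a report roughly every 101 ticks, so a busy loop calling it every iteration still reports QSes only at the configured rate.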
> On Mar 18, 2024, at 2:58 PM, Uladzislau Rezki <urezki@gmail.com> wrote:
>
> Hello, Joel!
>
> Sorry for late checking, see below few comments:
>
>> In the synchronize_rcu() common case, we will have less than
>> SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker
>> is pointless just to free the last injected wait head since at that point,
>> all the users have already been awakened.
>>
>> Introduce a new counter to track this and prevent the wakeup in the
>> common case.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>> Rebased on paul/dev of today.
>>
>> kernel/rcu/tree.c | 36 +++++++++++++++++++++++++++++++-----
>> kernel/rcu/tree.h |  1 +
>> 2 files changed, 32 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index 9fbb5ab57c84..bd29fe3c76bf 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -96,6 +96,7 @@ static struct rcu_state rcu_state = {
>> 	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
>> 	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
>> 		rcu_sr_normal_gp_cleanup_work),
>> +	.srs_cleanups_pending = ATOMIC_INIT(0),
>> };
>>
>> /* Dump rcu_node combining tree at boot to verify correct setup. */
>> @@ -1642,8 +1643,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>> 	 * the done tail list manipulations are protected here.
>> 	 */
>> 	done = smp_load_acquire(&rcu_state.srs_done_tail);
>> -	if (!done)
>> +	if (!done) {
>> +		/* See comments below. */
>> +		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>> 		return;
>> +	}
>>
>> 	WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
>> 	head = done->next;
>> @@ -1666,6 +1670,9 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>
>> 		rcu_sr_put_wait_head(rcu);
>> 	}
>> +
>> +	/* Order list manipulations with atomic access. */
>> +	atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>> }
>>
>> /*
>> @@ -1673,7 +1680,7 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>  */
>> static void rcu_sr_normal_gp_cleanup(void)
>> {
>> -	struct llist_node *wait_tail, *next, *rcu;
>> +	struct llist_node *wait_tail, *next = NULL, *rcu = NULL;
>> 	int done = 0;
>>
>> 	wait_tail = rcu_state.srs_wait_tail;
>> @@ -1699,16 +1706,35 @@ static void rcu_sr_normal_gp_cleanup(void)
>> 		break;
>> 	}
>>
>> -	// concurrent sr_normal_gp_cleanup work might observe this update.
>> -	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>> +	/*
>> +	 * Fast path, no more users to process. Remove the last wait head
>> +	 * if no inflight-workers. If there are in-flight workers, let them
>> +	 * remove the last wait head.
>> +	 */
>> +	WARN_ON_ONCE(!rcu);
>>
> This assumption is not correct. An "rcu" can be NULL in fact.

Hmm I could never trigger that. Are you saying that is true after Neeraj's
recent patch or something else?

Note, after Neeraj's patch to handle the lack of heads availability, it
could be true, so I requested him to rebase his patch on top of this one.
However, I will revisit my patch and look for whether it could occur, but
please let me know if you know of a sequence of events that makes it NULL.

>
>> 	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
>>
>> +	if (rcu && rcu_sr_is_wait_head(rcu) && rcu->next == NULL &&
>> +		/* Order atomic access with list manipulation. */
>> +		!atomic_read_acquire(&rcu_state.srs_cleanups_pending)) {
>> +		wait_tail->next = NULL;
>> +		rcu_sr_put_wait_head(rcu);
>> +		smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>> +		return;
>> +	}
>> +
>> +	/* Concurrent sr_normal_gp_cleanup work might observe this update. */
>> +	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>> +
>> 	/*
>> 	 * We schedule a work in order to perform a final processing
>> 	 * of outstanding users(if still left) and releasing wait-heads
>> 	 * added by rcu_sr_normal_gp_init() call.
>> 	 */
>> -	queue_work(sync_wq, &rcu_state.srs_cleanup_work);
>> +	atomic_inc(&rcu_state.srs_cleanups_pending);
>> +	if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work)) {
>> +		atomic_dec(&rcu_state.srs_cleanups_pending);
>> +	}
>> }
> No need an extra "{}" pair.

I do prefer it for readability but I am ok with dropping it. Thanks!

 - Joel

>
>>
>> /*
>> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
>> index bae7925c497f..affcb92a358c 100644
>> --- a/kernel/rcu/tree.h
>> +++ b/kernel/rcu/tree.h
>> @@ -420,6 +420,7 @@ struct rcu_state {
>> 	struct llist_node *srs_done_tail; /* ready for GP users. */
>> 	struct sr_wait_node srs_wait_nodes[SR_NORMAL_GP_WAIT_HEAD_MAX];
>> 	struct work_struct srs_cleanup_work;
>> +	atomic_t srs_cleanups_pending; /* srs inflight worker cleanups. */
>> };
>>
>> /* Values for rcu_state structure's gp_flags field. */
>> --
>> 2.34.1
>>
On Mon, 18 Mar 2024 at 11:01, Marco Elver <elver@google.com> wrote:
>
> On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote:
> > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote:
> > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists.
> > > >
> > > > Because not all architectures support 1-byte cmpxchg?
> > > > What prevents us from implementing it?
> > >
> > > Nothing that I know of, but I didn't want to put up with the KCSAN report
> > > in the interim.
> >
> > And here is a lightly tested patch to emulate one-byte and two-byte
> > cmpxchg() for architectures that do not support it. This is just the
> > emulation, and would be followed up with patches to make the relevant
> > architectures make use of it.
> >
> > The one-byte emulation has been lightly tested on x86.
> >
> > Thoughts?
> >
> >							Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit d72e54166b56d8b373676e1e92a426a07d53899a
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Sun Mar 17 14:44:38 2024 -0700
> >
> >     lib: Add one-byte and two-byte cmpxchg() emulation functions
> >
> >     Architectures are required to provide four-byte cmpxchg() and 64-bit
> >     architectures are additionally required to provide eight-byte cmpxchg().
> >     However, there are cases where one-byte and two-byte cmpxchg()
> >     would be extremely useful. Therefore, provide cmpxchg_emu_u8() and
> >     cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms
> >     of four-byte cmpxchg().
> >
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >     Cc: Marco Elver <elver@google.com>
> >     Cc: Andrew Morton <akpm@linux-foundation.org>
> >     Cc: Thomas Gleixner <tglx@linutronix.de>
> >     Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> >     Cc: Douglas Anderson <dianders@chromium.org>
> >     Cc: Petr Mladek <pmladek@suse.com>
> >     Cc: <linux-arch@vger.kernel.org>
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 154f994547632..eef11e9918ec7 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT
> >  	default 4 if FUNCTION_ALIGNMENT_4B
> >  	default 0
> >
> > +config ARCH_NEED_CMPXCHG_1_2_EMU
> > +	bool
> > +
> >  endmenu
> > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > new file mode 100644
> > index 0000000000000..fee8171fa05eb
> > --- /dev/null
> > +++ b/include/linux/cmpxchg-emu.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes. These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#ifndef __LINUX_CMPXCHG_EMU_H
> > +#define __LINUX_CMPXCHG_EMU_H
> > +
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
> > +
> > +#endif /* __LINUX_CMPXCHG_EMU_H */
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 6b09731d8e619..fecd7b8c09cbd 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> >  lib-$(CONFIG_GENERIC_BUG) += bug.o
> >
> >  obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
>
> Since you add instrumentation explicitly, we need to suppress
> instrumentation somehow. For the whole file this can be done with:
>
> KCSAN_SANITIZE_cmpxchg-emu.o := n

Hrm, I recall this doesn't actually work as-is because it also
disables instrument_read_write() instrumentation. So I think the most
reliable would be to use data_race() after all. It'll be a bit slower
because of double-instrumenting, but I think that's not a major
concern with an instrumented build anyway.

> Note, since you use cmpxchg, which pulls in its own
> instrument_read_write(), we can't use a function attribute (like
> __no_kcsan) if the whole-file no-instrumentation seems like overkill.
> Alternatively the cmpxchg could be wrapped into a data_race() (like
> your original RCU use case was doing).
>
> But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way.
>
> With the explicit "instrument_read_write()" also note that this would
> do double-instrumentation with other sanitizers (KASAN, KMSAN). But I
> think we actually want to instrument the whole real access with those
> tools - would it be bad if we accessed some memory out-of-bounds, but
> that memory isn't actually used? I don't have a clear answer to that.
>
> Also, it might be useful to have an alignment check somewhere, because
> otherwise we end up with split atomic accesses (or whatever other bad
> thing the given arch does if that happens).
>
> Thanks,
> -- Marco
>
> >  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
> >  #ensure exported functions have prototypes
> > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > new file mode 100644
> > index 0000000000000..508b55484c2b6
> > --- /dev/null
> > +++ b/lib/cmpxchg-emu.c
> > @@ -0,0 +1,68 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes. These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/export.h>
> > +#include <linux/instrumented.h>
> > +#include <linux/atomic.h>
> > +#include <asm-generic/rwonce.h>
> > +
> > +union u8_32 {
> > +	u8 b[4];
> > +	u32 w;
> > +};
> > +
> > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > +{
> > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > +	int i = ((uintptr_t)p) & 0x3;
> > +	union u8_32 old32;
> > +	union u8_32 new32;
> > +	u32 ret;
> > +
> > +	old32.w = READ_ONCE(*p32);
> > +	do {
> > +		if (old32.b[i] != old)
> > +			return old32.b[i];
> > +		new32.w = old32.w;
> > +		new32.b[i] = new;
> > +		instrument_atomic_read_write(p, 1);
> > +		ret = cmpxchg(p32, old32.w, new32.w);
> > +	} while (ret != old32.w);
> > +	return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> > +
> > +union u16_32 {
> > +	u16 h[2];
> > +	u32 w;
> > +};
> > +
> > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
> > +{
> > +	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1);
> > +	int i = ((uintptr_t)p) & 0x1;
> > +	union u16_32 old32;
> > +	union u16_32 new32;
> > +	u32 ret;
> > +
> > +	old32.w = READ_ONCE(*p32);
> > +	do {
> > +		if (old32.h[i] != old)
> > +			return old32.h[i];
> > +		new32.w = old32.w;
> > +		new32.h[i] = new;
> > +		instrument_atomic_read_write(p, 2);
> > +		ret = cmpxchg(p32, old32.w, new32.w);
> > +	} while (ret != old32.w);
> > +	return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
> On Mar 18, 2024, at 07:02, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Mar 18, 2024 at 01:47:43AM +0800, Alan Huang wrote:
>> Hi,
>>
>> I'm playing with the LKMM, then I saw the ISA2+pooncelock+pooncelock+pombonce.
>>
>> The original litmus is as follows:
>> ------------------------------------------------------
>> P0(int *x, int *y, spinlock_t *mylock)
>> {
>> 	spin_lock(mylock);
>> 	WRITE_ONCE(*x, 1);
>> 	WRITE_ONCE(*y, 1);
>> 	spin_unlock(mylock);
>> }
>>
>> P1(int *y, int *z, spinlock_t *mylock)
>> {
>> 	int r0;
>>
>> 	spin_lock(mylock);
>> 	r0 = READ_ONCE(*y);
>> 	WRITE_ONCE(*z, 1);
>> 	spin_unlock(mylock);
>> }
>>
>> P2(int *x, int *z)
>> {
>> 	int r1;
>> 	int r2;
>>
>> 	r2 = READ_ONCE(*z);
>> 	smp_mb();
>> 	r1 = READ_ONCE(*x);
>> }
>>
>> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
>> ------------------------------------------------------
>> Of course, the result is Never.
>>
>> But when I delete P0's spin_lock and P1's spin_unlock:
>> -------------------------------------------------------
>> P0(int *x, int *y, spinlock_t *mylock)
>> {
>> 	WRITE_ONCE(*x, 1);
>> 	WRITE_ONCE(*y, 1);
>> 	spin_unlock(mylock);
>> }
>>
>> P1(int *y, int *z, spinlock_t *mylock)
>> {
>> 	int r0;
>>
>> 	spin_lock(mylock);
>> 	r0 = READ_ONCE(*y);
>> 	WRITE_ONCE(*z, 1);
>> }
>>
>> P2(int *x, int *z)
>> {
>> 	int r1;
>> 	int r2;
>>
>> 	r2 = READ_ONCE(*z);
>> 	smp_mb();
>> 	r1 = READ_ONCE(*x);
>> }
>>
>> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
>> ------------------------------------------------------
>> Then herd told me the result is Sometimes.
>
> You mean like this?
>
> Test ISA2+pooncelock+pooncelock+pombonce Allowed
> States 8
> 1:r0=0; 2:r1=0; 2:r2=0;
> 1:r0=0; 2:r1=0; 2:r2=1;
> 1:r0=0; 2:r1=1; 2:r2=0;
> 1:r0=0; 2:r1=1; 2:r2=1;
> 1:r0=1; 2:r1=0; 2:r2=0;
> 1:r0=1; 2:r1=0; 2:r2=1;
> 1:r0=1; 2:r1=1; 2:r2=0;
> 1:r0=1; 2:r1=1; 2:r2=1;

Ok

> Witnesses
> Positive: 1 Negative: 7
> Flag unmatched-unlock
> Condition exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
> Observation ISA2+pooncelock+pooncelock+pombonce Sometimes 1 7
> Time ISA2+pooncelock+pooncelock+pombonce 0.01
> Hash=f55b8515e48310f812aa676084f2cc88
>
>> Is this expected?
>
> There are no locks held initially, so why can't the following
> sequence of events unfold:
>
> o	P1() acquires the lock.
>
> o	P0() does WRITE_ONCE(*y, 1). (Yes, out of order.)
>
> o	P1() does READ_ONCE(*y), and gets 1.
>
> o	P1() does WRITE_ONCE(*z, 1).
>
> o	P2() does READ_ONCE(*z) and gets 1.
>
> o	P2() does smp_mb(), but there is nothing to order with.
>
> o	P2() does READ_ONCE(*x) and gets 0.
>
> o	P0() does WRITE_ONCE(*x, 1), but too late to affect P2().
>
> o	P0() releases the lock that it does not hold, which is why you see
> 	the "Flag unmatched-unlock" in the output. LKMM is complaining
> 	that the litmus test is not legitimate, and rightly so!

Oh! I missed that line, thank you for pointing this out! :)

> Or am I missing your point?
>
>							Thanx, Paul
On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > There are several scenario in network processing that can run
> > extensively under heavy traffic. In such situation, RCU synchronization
> > might not observe desired quiescent states for indefinitely long period.
> > Create a helper to safely raise the desired RCU quiescent states for
> > such scenario.
> >
> > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > sufficient to address existing problems around RCU tasks. It's unclear
> > yet if there is any future scenario for it to be further tuned down.
>
> I suggest something like the following for the commit log:
>
> ------------------------------------------------------------------------
>
> When under heavy load, network processing can run CPU-bound for many tens
> of seconds. Even in preemptible kernels, this can block RCU Tasks grace
> periods, which can cause trace-event removal to take more than a minute,
> which is unacceptably long.
>
> This commit therefore creates a new helper function that passes
> through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> This hard-coded value suffices for current workloads.

FWIW, this sounds good to me.

> ------------------------------------------------------------------------
>
> > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > ---
> > v3->v4: comment fixup
> >
> > ---
> >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 0746b1b0b663..da224706323e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -247,6 +247,30 @@ do { \
> >  	cond_resched(); \
> >  } while (0)
> >
> > +/**
> > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > + *
> > + * This helper is for network processing in non-RT kernels, where there could
> > + * be busy polling threads that block RCU synchronization indefinitely. In
> > + * such context, simply calling cond_resched is insufficient, so give it a
> > + * stronger push to eliminate all potential blockage of all RCU types.
> > + *
> > + * NOTE: unless absolutely sure, this helper should in general be called
> > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > + */
>
> How about something like this for the kernel-doc comment?
>
> /**
>  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
>  * @old_ts: jiffies at start of processing.
>  *
>  * This helper is for long-running softirq handlers, such as those
>  * in networking. The caller should initialize the variable passed in
>  * as @old_ts at the beginning of the softirq handler. When invoked
>  * frequently, this macro will invoke rcu_softirq_qs() every 100
>  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
>  * quiescent states. Note that this macro modifies its old_ts argument.
>  *
>  * Note that although cond_resched() provides RCU quiescent states,
>  * it does not provide RCU-Tasks quiescent states.
>  *
>  * Because regions of code that have disabled softirq act as RCU
>  * read-side critical sections, this macro should be invoked with softirq
>  * (and preemption) enabled.
>  *
>  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
>  */

Considering the note about cond_resched(), does cond_resched() actually
provide an RCU quiescent state for fully-preemptible kernels? IIUC for
those, cond_resched() expands to:

	__might_resched();
	klp_sched_try_switch();

... and AFAICT neither reports an RCU quiescent state. So maybe it's
worth dropping the note?

Separately, what's the rationale for not doing this on PREEMPT_RT? Does
that avoid the problem through other means, or are people just not
running affected workloads on that?

Mark.

>
>							Thanx, Paul
>
> > +#define rcu_softirq_qs_periodic(old_ts) \
> > +do { \
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > +	    time_after(jiffies, (old_ts) + HZ / 10)) { \
> > +		preempt_disable(); \
> > +		rcu_softirq_qs(); \
> > +		preempt_enable(); \
> > +		(old_ts) = jiffies; \
> > +	} \
> > +} while (0)
> > +
> >  /*
> >   * Infrastructure to implement the synchronize_() primitives in
> >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > --
> > 2.30.2
> >
> >
The rcu_torture_ops structures are defined with static storage duration,
so any function-pointer member that is not explicitly initialized defaults
to NULL. This commit therefore removes the pre-existing explicit
initializations of such function pointers to NULL.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 kernel/rcu/rcutorture.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 85ff8a32f75a..3f9c3766f52b 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -566,7 +566,6 @@ static struct rcu_torture_ops rcu_ops = {
 	.call = call_rcu_hurry,
 	.cb_barrier = rcu_barrier,
 	.fqs = rcu_force_quiescent_state,
-	.stats = NULL,
 	.gp_kthread_dbg = show_rcu_gp_kthreads,
 	.check_boost_failed = rcu_check_boost_fail,
 	.stall_dur = rcu_jiffies_till_stall_check,
@@ -614,9 +613,6 @@ static struct rcu_torture_ops rcu_busted_ops = {
 	.sync = synchronize_rcu_busted,
 	.exp_sync = synchronize_rcu_busted,
 	.call = call_rcu_busted,
-	.cb_barrier = NULL,
-	.fqs = NULL,
-	.stats = NULL,
 	.irq_capable = 1,
 	.name = "busted"
 };
@@ -847,8 +843,6 @@ static struct rcu_torture_ops trivial_ops = {
 	.get_gp_seq = rcu_no_completed,
 	.sync = synchronize_rcu_trivial,
 	.exp_sync = synchronize_rcu_trivial,
-	.fqs = NULL,
-	.stats = NULL,
 	.irq_capable = 1,
 	.name = "trivial"
 };
@@ -892,8 +886,6 @@ static struct rcu_torture_ops tasks_ops = {
 	.cb_barrier = rcu_barrier_tasks,
 	.gp_kthread_dbg = show_rcu_tasks_classic_gp_kthread,
 	.get_gp_data = rcu_tasks_get_gp_data,
-	.fqs = NULL,
-	.stats = NULL,
 	.irq_capable = 1,
 	.slow_gps = 1,
 	.name = "tasks"
@@ -934,8 +926,6 @@ static struct rcu_torture_ops tasks_rude_ops = {
 	.gp_kthread_dbg = show_rcu_tasks_rude_gp_kthread,
 	.get_gp_data = rcu_tasks_rude_get_gp_data,
 	.cbflood_max = 50000,
-	.fqs = NULL,
-	.stats = NULL,
 	.irq_capable = 1,
 	.name = "tasks-rude"
 };
@@ -987,8 +977,6 @@ static struct rcu_torture_ops tasks_tracing_ops = {
 	.gp_kthread_dbg = show_rcu_tasks_trace_gp_kthread,
 	.get_gp_data = rcu_tasks_trace_get_gp_data,
 	.cbflood_max = 50000,
-	.fqs = NULL,
-	.stats = NULL,
 	.irq_capable = 1,
 	.slow_gps = 1,
 	.name = "tasks-tracing"
--
2.17.1
This commit makes the RCU Tasks flavors of the rcutorture tests support
printing of RCU Tasks grace-period state when a writer stall occurs or at
the end of an rcutorture run, and adds an rcu_ops->get_gp_data() operation
to simplify acquiring grace-period state for the different types of
rcutorture tests.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 kernel/rcu/rcu.h        | 20 ++++++++++----------
 kernel/rcu/rcutorture.c | 26 ++++++++++++++++++--------
 kernel/rcu/srcutree.c   |  5 +----
 kernel/rcu/tasks.h      | 21 +++++++++++++++++++++
 kernel/rcu/tree.c       | 13 +++-----------
 5 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 86fce206560e..38238e595a61 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -522,12 +522,18 @@ static inline void show_rcu_tasks_gp_kthreads(void) {}

 #ifdef CONFIG_TASKS_RCU
 struct task_struct *get_rcu_tasks_gp_kthread(void);
+void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq);
 #endif // # ifdef CONFIG_TASKS_RCU

 #ifdef CONFIG_TASKS_RUDE_RCU
 struct task_struct *get_rcu_tasks_rude_gp_kthread(void);
+void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq);
 #endif // # ifdef CONFIG_TASKS_RUDE_RCU

+#ifdef CONFIG_TASKS_TRACE_RCU
+void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq);
+#endif
+
 #ifdef CONFIG_TASKS_RCU_GENERIC
 void tasks_cblist_init_generic(void);
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
@@ -557,8 +563,7 @@ static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { }
 #endif

 #if defined(CONFIG_TREE_RCU)
-void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
-			    unsigned long *gp_seq);
+void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq);
 void do_trace_rcu_torture_read(const char *rcutorturename,
 			       struct rcu_head *rhp,
 			       unsigned long secs,
@@ -566,8 +571,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
 			       unsigned long c);
 void rcu_gp_set_torture_wait(int duration);
 #else
-static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
-					  int *flags, unsigned long *gp_seq)
+static inline void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq)
 {
 	*flags = 0;
 	*gp_seq = 0;
@@ -587,20 +591,16 @@ static inline void rcu_gp_set_torture_wait(int duration) { }
 #endif

 #ifdef CONFIG_TINY_SRCU
-static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
-					   struct srcu_struct *sp, int *flags,
+static inline void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags,
 					   unsigned long *gp_seq)
 {
-	if (test_type != SRCU_FLAVOR)
-		return;
 	*flags = 0;
 	*gp_seq = sp->srcu_idx;
 }
 #elif defined(CONFIG_TREE_SRCU)
-void srcutorture_get_gp_data(enum rcutorture_type test_type,
-			     struct srcu_struct *sp, int *flags,
+void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags,
 			     unsigned long *gp_seq);
 #endif

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 2f43d31fb7a5..85ff8a32f75a 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -381,6 +381,7 @@ struct rcu_torture_ops {
 	void (*gp_kthread_dbg)(void);
 	bool (*check_boost_failed)(unsigned long gp_state, int *cpup);
 	int (*stall_dur)(void);
+	void (*get_gp_data)(int *flags, unsigned long *gp_seq);
 	long cbflood_max;
 	int irq_capable;
 	int can_boost;
@@ -569,6 +570,7 @@ static struct rcu_torture_ops rcu_ops = {
 	.gp_kthread_dbg = show_rcu_gp_kthreads,
 	.check_boost_failed = rcu_check_boost_fail,
 	.stall_dur = rcu_jiffies_till_stall_check,
+	.get_gp_data = rcutorture_get_gp_data,
 	.irq_capable = 1,
 	.can_boost = IS_ENABLED(CONFIG_RCU_BOOST),
 	.extendables = RCUTORTURE_MAX_EXTEND,
@@ -628,6 +630,11 @@ static struct srcu_struct srcu_ctld;
 static struct srcu_struct *srcu_ctlp = &srcu_ctl;
 static struct rcu_torture_ops srcud_ops;

+static void srcu_get_gp_data(int *flags, unsigned long *gp_seq)
+{
+	srcutorture_get_gp_data(srcu_ctlp, flags, gp_seq);
+}
+
 static int srcu_torture_read_lock(void)
 {
 	if (cur_ops == &srcud_ops)
@@ -736,6 +743,7 @@ static struct rcu_torture_ops srcu_ops = {
 	.call = srcu_torture_call,
.cb_barrier = srcu_torture_barrier, .stats = srcu_torture_stats, + .get_gp_data = srcu_get_gp_data, .cbflood_max = 50000, .irq_capable = 1, .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), @@ -774,6 +782,7 @@ static struct rcu_torture_ops srcud_ops = { .call = srcu_torture_call, .cb_barrier = srcu_torture_barrier, .stats = srcu_torture_stats, + .get_gp_data = srcu_get_gp_data, .cbflood_max = 50000, .irq_capable = 1, .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), @@ -882,6 +891,7 @@ static struct rcu_torture_ops tasks_ops = { .call = call_rcu_tasks, .cb_barrier = rcu_barrier_tasks, .gp_kthread_dbg = show_rcu_tasks_classic_gp_kthread, + .get_gp_data = rcu_tasks_get_gp_data, .fqs = NULL, .stats = NULL, .irq_capable = 1, @@ -922,6 +932,7 @@ static struct rcu_torture_ops tasks_rude_ops = { .call = call_rcu_tasks_rude, .cb_barrier = rcu_barrier_tasks_rude, .gp_kthread_dbg = show_rcu_tasks_rude_gp_kthread, + .get_gp_data = rcu_tasks_rude_get_gp_data, .cbflood_max = 50000, .fqs = NULL, .stats = NULL, @@ -974,6 +985,7 @@ static struct rcu_torture_ops tasks_tracing_ops = { .call = call_rcu_tasks_trace, .cb_barrier = rcu_barrier_tasks_trace, .gp_kthread_dbg = show_rcu_tasks_trace_gp_kthread, + .get_gp_data = rcu_tasks_trace_get_gp_data, .cbflood_max = 50000, .fqs = NULL, .stats = NULL, @@ -2264,10 +2276,8 @@ rcu_torture_stats_print(void) int __maybe_unused flags = 0; unsigned long __maybe_unused gp_seq = 0; - rcutorture_get_gp_data(cur_ops->ttype, - &flags, &gp_seq); - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, - &flags, &gp_seq); + if (cur_ops->get_gp_data) + cur_ops->get_gp_data(&flags, &gp_seq); wtp = READ_ONCE(writer_task); pr_alert("??? 
Writer stall state %s(%d) g%lu f%#x ->state %#x cpu %d\n", rcu_torture_writer_state_getname(), @@ -3390,8 +3400,8 @@ rcu_torture_cleanup(void) fakewriter_tasks = NULL; } - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); + if (cur_ops->get_gp_data) + cur_ops->get_gp_data(&flags, &gp_seq); pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n", cur_ops->name, (long)gp_seq, flags, rcutorture_seq_diff(gp_seq, start_gp_seq)); @@ -3762,8 +3772,8 @@ rcu_torture_init(void) nrealreaders = 1; } rcu_torture_print_module_parms(cur_ops, "Start of test"); - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); + if (cur_ops->get_gp_data) + cur_ops->get_gp_data(&flags, &gp_seq); start_gp_seq = gp_seq; pr_alert("%s: Start-test grace-period state: g%ld f%#x\n", cur_ops->name, (long)gp_seq, flags); diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c index d19326486edd..98f79ba01b0a 100644 --- a/kernel/rcu/srcutree.c +++ b/kernel/rcu/srcutree.c @@ -1828,12 +1828,9 @@ static void process_srcu(struct work_struct *work) srcu_reschedule(ssp, curdelay); } -void srcutorture_get_gp_data(enum rcutorture_type test_type, - struct srcu_struct *ssp, int *flags, +void srcutorture_get_gp_data(struct srcu_struct *ssp, int *flags, unsigned long *gp_seq) { - if (test_type != SRCU_FLAVOR) - return; *flags = 0; *gp_seq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq); } diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index e83adcdb49b5..e7ac9138a4fd 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -1182,6 +1182,13 @@ struct task_struct *get_rcu_tasks_gp_kthread(void) } EXPORT_SYMBOL_GPL(get_rcu_tasks_gp_kthread); +void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq) +{ + *flags = 0; + *gp_seq = rcu_seq_current(&rcu_tasks.tasks_gp_seq); +} +EXPORT_SYMBOL_GPL(rcu_tasks_get_gp_data); + /* * Protect against 
tasklist scan blind spot while the task is exiting and * may be removed from the tasklist. Do this by adding the task to yet @@ -1361,6 +1368,13 @@ struct task_struct *get_rcu_tasks_rude_gp_kthread(void) } EXPORT_SYMBOL_GPL(get_rcu_tasks_rude_gp_kthread); +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq) +{ + *flags = 0; + *gp_seq = rcu_seq_current(&rcu_tasks_rude.tasks_gp_seq); +} +EXPORT_SYMBOL_GPL(rcu_tasks_rude_get_gp_data); + #endif /* #ifdef CONFIG_TASKS_RUDE_RCU */ //////////////////////////////////////////////////////////////////////// @@ -2020,6 +2034,13 @@ struct task_struct *get_rcu_tasks_trace_gp_kthread(void) } EXPORT_SYMBOL_GPL(get_rcu_tasks_trace_gp_kthread); +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq) +{ + *flags = 0; + *gp_seq = rcu_seq_current(&rcu_tasks_trace.tasks_gp_seq); +} +EXPORT_SYMBOL_GPL(rcu_tasks_trace_get_gp_data); + #else /* #ifdef CONFIG_TASKS_TRACE_RCU */ static void exit_tasks_rcu_finish_trace(struct task_struct *t) { } #endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 9fbb5ab57c84..e229a12afe31 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -539,17 +539,10 @@ static struct rcu_node *rcu_get_root(void) /* * Send along grace-period-related data for rcutorture diagnostics. */ -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, - unsigned long *gp_seq) +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) { - switch (test_type) { - case RCU_FLAVOR: - *flags = READ_ONCE(rcu_state.gp_flags); - *gp_seq = rcu_seq_current(&rcu_state.gp_seq); - break; - default: - break; - } + *flags = READ_ONCE(rcu_state.gp_flags); + *gp_seq = rcu_seq_current(&rcu_state.gp_seq); } EXPORT_SYMBOL_GPL(rcutorture_get_gp_data); -- 2.17.1
This series mainly enables rcutorture testing to support printing rcu-tasks-related gp state, and removes redundant function-pointer initializations in the rcu_torture_ops structure's objects. Zqiang (2): rcutorture: Make rcutorture support print rcu-tasks gp state rcutorture: Removing redundant function pointer initialization kernel/rcu/rcu.h | 20 ++++++++++---------- kernel/rcu/rcutorture.c | 38 ++++++++++++++++++-------------------- kernel/rcu/srcutree.c | 5 +---- kernel/rcu/tasks.h | 21 +++++++++++++++++++++ kernel/rcu/tree.c | 13 +++---------- 5 files changed, 53 insertions(+), 44 deletions(-) -- 2.17.1
On Mon, Mar 18, 2024 at 10:19:58AM +0800, Z qiang wrote: > > > > On Sun, Mar 17, 2024 at 07:38:11PM +0800, Zqiang wrote: > > > This commit make rcu-tasks related rcutorture test support rcu-tasks > > > gp state printing when the writer stall occurs or the at the end of > > > rcutorture test, and generate rcu_ops->get_gp_data() operation to > > > simplify the acquisition of gp state for different types of rcutorture > > > tests. > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > Much better!!! > > > > A few questions below. One small change left, a question on testing, > > and the possibility of a few cleanup patches if you are interested. > > > > Thanx, Paul > > > > > --- > > > > > > [ 47.582683] rcu: Start-test grace-period state: g4313 f0x0 > > > [ 73.780444] rcu: End-test grace-period state: g15728 f0x0 total-gps=2854 > > > > > > [ 99.013921] srcu: Start-test grace-period state: g0 f0x0 > > > [ 126.596727] srcu: End-test grace-period state: g10292 f0x0 total-gps=10292 > > > > > > [ 175.664991] srcud: Start-test grace-period state: g0 f0x0 > > > [ 195.012774] srcud: End-test grace-period state: g7628 f0x0 total-gps=7628 > > > > > > [ 216.943521] tasks: Start-test grace-period state: g8 f0x0 > > > [ 234.245093] tasks: End-test grace-period state: g300 f0x0 total-gps=292 > > > > > > [ 267.139368] tasks-rude: Start-test grace-period state: g8 f0x0 > > > [ 296.132748] tasks-rude: End-test grace-period state: g684 f0x0 total-gps=676 > > > > > > [ 316.044241] tasks-tracing: Start-test grace-period state: g8 f0x0 > > > [ 342.020447] tasks-tracing: End-test grace-period state: g348 f0x0 total-gps=340 > > > > > > kernel/rcu/rcu.h | 20 ++++++++++---------- > > > kernel/rcu/rcutorture.c | 29 +++++++++++++++++++++-------- > > > kernel/rcu/srcutree.c | 5 +---- > > > kernel/rcu/tasks.h | 21 +++++++++++++++++++++ > > > kernel/rcu/tree.c | 13 +++---------- > > > 5 files changed, 56 insertions(+), 32 deletions(-) > > > > > > diff --git a/kernel/rcu/rcu.h 
b/kernel/rcu/rcu.h > > > index 86fce206560e..38238e595a61 100644 > > > --- a/kernel/rcu/rcu.h > > > +++ b/kernel/rcu/rcu.h > > > @@ -522,12 +522,18 @@ static inline void show_rcu_tasks_gp_kthreads(void) {} > > > > > > #ifdef CONFIG_TASKS_RCU > > > struct task_struct *get_rcu_tasks_gp_kthread(void); > > > +void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq); > > > #endif // # ifdef CONFIG_TASKS_RCU > > > > > > #ifdef CONFIG_TASKS_RUDE_RCU > > > struct task_struct *get_rcu_tasks_rude_gp_kthread(void); > > > +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq); > > > #endif // # ifdef CONFIG_TASKS_RUDE_RCU > > > > > > +#ifdef CONFIG_TASKS_TRACE_RCU > > > +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq); > > > +#endif > > > + > > > > If you have not already done so, could you please test with the rcutorture > > scenarios, while artificially forcing that warning to trigger? > > I used the following command to force writer stall: > > insmod rcutorture.ko torture_type=rcu* fwd_progress=0 stat_interval=4 > stall_cpu_block=1 stall_cpu=40 stall_cpu_holdoff=10 > > tasks_trace: > [52392.689229] ??? Writer stall state RTWS_SYNC(21) g447101 f0x0 > ->state 0x2 cpu 2 > [52392.689235] rcu_tasks_trace: RTGS_WAIT_SCAN_HOLDOUTS(6) since 8 > g:447101 i:0/26562 kCuU l:1 N1 h:44/225560/436368 > > tasks_rude: > [52614.705062] ??? Writer stall state RTWS_SYNC(21) g261 f0x0 ->state 0x2 cpu 2 > [52614.705067] rcu_tasks_rude: RTGS_WAIT_GP(2) since 13807 g:261 > i:0/261 kCuU l:250 > > tasks: > [52751.857305] ??? Writer stall state RTWS_EXP_SYNC(4) g149 f0x0 > ->state 0x2 cpu 2 > [52751.858163] rcu_tasks: RTGS_WAIT_SCAN_HOLDOUTS(6) since 57 g:149 > i:0/0 kCuU l:250 > > rcu: > ??? 
Writer stall state RTWS_POLL_WAIT_EXP_FULL(20) g952633 f0x0 > ->state 0x402 cpu 2 > [52826.481118] task:rcu_torture_wri state:I stack:0 pid:8655 tgid:8655 > ppid:2 flags:0x00004000 > rcu: rcu_preempt: wait state: RCU_GP_WAIT_FQS(5) ->state: 0x402 > ->rt_priority 1 delta ->gp_start 6093 ->gp_activity 0 > ->gp_req_activity 6093 ->gp_wake_time 6093 ->gp_wake_seq 952629 > ->gp_s0 > [52826.481462] rcu: rcu_node 0:3 ->gp_seq 952633 ->gp_seq_needed > 952640 ->qsmask 0x0 b.EG ->n_boosts 2328 > [52826.481466] rcu: cpu 0 ->gp_seq_needed 952640 > [52826.481469] rcu: cpu 1 ->gp_seq_needed 952640 > [52826.481472] rcu: cpu 2 ->gp_seq_needed 952636 > [52826.481474] rcu: cpu 3 ->gp_seq_needed 952640 > [52826.481477] rcu: RCU callbacks invoked since boot: 127957840 > [52826.481481] rcu_tasks: RTGS_WAIT_CBS(11) since 30792 g:268 i:0/0 k.u. l:250 > [52826.481486] rcu_tasks_rude: RTGS_WAIT_CBS(11) since 164976 g:680 > i:0/677 k.u. l:250 > [52826.481492] rcu_tasks_trace: RTGS_WAIT_CBS(11) since 395536 > g:447148 i:0/26567 k.u. l:1 N0 h:44/225572/436420 > > srcu: > rcu: srcu-torture: Tree SRCU g3586 state 0 (SRCU_SIZE_SMALL) > per-CPU(idx=1): 0(16,6 C) 1(2,-12 .) 2(-11,7 .) 3(-6,-1 .) T(1,0) > [52940.785131] ??? Writer stall state RTWS_EXP_SYNC(4) g3586 f0x0 > ->state 0x2 cpu 0 > [52940.785136] task:rcu_torture_wri state:D stack:0 pid:8748 tgid:8748 > ppid:2 flags:0x00004000 Very good, looking forward to the version that avoids creating the redundant NULL initializations. Thanx, Paul > > For example, CONFIG_TASKS_TRACE_RCU can be tested using this command > > line: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5m --configs "TRACE01 TRACE02" --trust-make > > > > You can add more scenarios to the --configs argument, and the rest of > > them can be found in tools/testing/selftests/rcutorture/configs/rcu. 
> > > > > #ifdef CONFIG_TASKS_RCU_GENERIC > > > void tasks_cblist_init_generic(void); > > > #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ > > > @@ -557,8 +563,7 @@ static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { } > > > #endif > > > > > > #if defined(CONFIG_TREE_RCU) > > > -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, > > > - unsigned long *gp_seq); > > > +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq); > > > void do_trace_rcu_torture_read(const char *rcutorturename, > > > struct rcu_head *rhp, > > > unsigned long secs, > > > @@ -566,8 +571,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename, > > > unsigned long c); > > > void rcu_gp_set_torture_wait(int duration); > > > #else > > > -static inline void rcutorture_get_gp_data(enum rcutorture_type test_type, > > > - int *flags, unsigned long *gp_seq) > > > +static inline void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) > > > { > > > *flags = 0; > > > *gp_seq = 0; > > > @@ -587,20 +591,16 @@ static inline void rcu_gp_set_torture_wait(int duration) { } > > > > > > #ifdef CONFIG_TINY_SRCU > > > > > > -static inline void srcutorture_get_gp_data(enum rcutorture_type test_type, > > > - struct srcu_struct *sp, int *flags, > > > +static inline void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags, > > > unsigned long *gp_seq) > > > { > > > - if (test_type != SRCU_FLAVOR) > > > - return; > > > *flags = 0; > > > *gp_seq = sp->srcu_idx; > > > } > > > > > > #elif defined(CONFIG_TREE_SRCU) > > > > > > -void srcutorture_get_gp_data(enum rcutorture_type test_type, > > > - struct srcu_struct *sp, int *flags, > > > +void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags, > > > unsigned long *gp_seq); > > > > > > #endif > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > > index 2f43d31fb7a5..2e8a037072eb 100644 > > > --- a/kernel/rcu/rcutorture.c > > > +++ b/kernel/rcu/rcutorture.c > > > @@ -381,6 +381,7 @@ 
struct rcu_torture_ops { > > > void (*gp_kthread_dbg)(void); > > > bool (*check_boost_failed)(unsigned long gp_state, int *cpup); > > > int (*stall_dur)(void); > > > + void (*get_gp_data)(int *flags, unsigned long *gp_seq); > > > long cbflood_max; > > > int irq_capable; > > > int can_boost; > > > @@ -569,6 +570,7 @@ static struct rcu_torture_ops rcu_ops = { > > > .gp_kthread_dbg = show_rcu_gp_kthreads, > > > .check_boost_failed = rcu_check_boost_fail, > > > .stall_dur = rcu_jiffies_till_stall_check, > > > + .get_gp_data = rcutorture_get_gp_data, > > > .irq_capable = 1, > > > .can_boost = IS_ENABLED(CONFIG_RCU_BOOST), > > > .extendables = RCUTORTURE_MAX_EXTEND, > > > @@ -612,6 +614,7 @@ static struct rcu_torture_ops rcu_busted_ops = { > > > .sync = synchronize_rcu_busted, > > > .exp_sync = synchronize_rcu_busted, > > > .call = call_rcu_busted, > > > + .get_gp_data = NULL, > > > > When this is NULL, please just leave it out. > > > > If you wish, you can also send separate patches removing the pre-existing > > initialization of function pointers to NULL, one per function pointer. > > I will record it in my todo document. 
> > Thanks > > > > > > > .cb_barrier = NULL, > > > .fqs = NULL, > > > .stats = NULL, > > > @@ -628,6 +631,11 @@ static struct srcu_struct srcu_ctld; > > > static struct srcu_struct *srcu_ctlp = &srcu_ctl; > > > static struct rcu_torture_ops srcud_ops; > > > > > > +static void srcu_get_gp_data(int *flags, unsigned long *gp_seq) > > > +{ > > > + srcutorture_get_gp_data(srcu_ctlp, flags, gp_seq); > > > +} > > > + > > > static int srcu_torture_read_lock(void) > > > { > > > if (cur_ops == &srcud_ops) > > > @@ -736,6 +744,7 @@ static struct rcu_torture_ops srcu_ops = { > > > .call = srcu_torture_call, > > > .cb_barrier = srcu_torture_barrier, > > > .stats = srcu_torture_stats, > > > + .get_gp_data = srcu_get_gp_data, > > > .cbflood_max = 50000, > > > .irq_capable = 1, > > > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > > > @@ -774,6 +783,7 @@ static struct rcu_torture_ops srcud_ops = { > > > .call = srcu_torture_call, > > > .cb_barrier = srcu_torture_barrier, > > > .stats = srcu_torture_stats, > > > + .get_gp_data = srcu_get_gp_data, > > > .cbflood_max = 50000, > > > .irq_capable = 1, > > > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > > > @@ -796,6 +806,7 @@ static struct rcu_torture_ops busted_srcud_ops = { > > > .call = srcu_torture_call, > > > .cb_barrier = srcu_torture_barrier, > > > .stats = srcu_torture_stats, > > > + .get_gp_data = NULL, > > > .irq_capable = 1, > > > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > > > .extendables = RCUTORTURE_MAX_EXTEND, > > > @@ -838,6 +849,7 @@ static struct rcu_torture_ops trivial_ops = { > > > .get_gp_seq = rcu_no_completed, > > > .sync = synchronize_rcu_trivial, > > > .exp_sync = synchronize_rcu_trivial, > > > + .get_gp_data = NULL, > > > .fqs = NULL, > > > .stats = NULL, > > > .irq_capable = 1, > > > @@ -882,6 +894,7 @@ static struct rcu_torture_ops tasks_ops = { > > > .call = call_rcu_tasks, > > > .cb_barrier = rcu_barrier_tasks, > > > .gp_kthread_dbg = show_rcu_tasks_classic_gp_kthread, > > > + .get_gp_data = 
rcu_tasks_get_gp_data, > > > .fqs = NULL, > > > .stats = NULL, > > > .irq_capable = 1, > > > @@ -922,6 +935,7 @@ static struct rcu_torture_ops tasks_rude_ops = { > > > .call = call_rcu_tasks_rude, > > > .cb_barrier = rcu_barrier_tasks_rude, > > > .gp_kthread_dbg = show_rcu_tasks_rude_gp_kthread, > > > + .get_gp_data = rcu_tasks_rude_get_gp_data, > > > .cbflood_max = 50000, > > > .fqs = NULL, > > > .stats = NULL, > > > @@ -974,6 +988,7 @@ static struct rcu_torture_ops tasks_tracing_ops = { > > > .call = call_rcu_tasks_trace, > > > .cb_barrier = rcu_barrier_tasks_trace, > > > .gp_kthread_dbg = show_rcu_tasks_trace_gp_kthread, > > > + .get_gp_data = rcu_tasks_trace_get_gp_data, > > > .cbflood_max = 50000, > > > .fqs = NULL, > > > .stats = NULL, > > > @@ -2264,10 +2279,8 @@ rcu_torture_stats_print(void) > > > int __maybe_unused flags = 0; > > > unsigned long __maybe_unused gp_seq = 0; > > > > > > - rcutorture_get_gp_data(cur_ops->ttype, > > > - &flags, &gp_seq); > > > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, > > > - &flags, &gp_seq); > > > + if (cur_ops->get_gp_data) > > > + cur_ops->get_gp_data(&flags, &gp_seq); > > > wtp = READ_ONCE(writer_task); > > > pr_alert("??? 
Writer stall state %s(%d) g%lu f%#x ->state %#x cpu %d\n", > > > rcu_torture_writer_state_getname(), > > > @@ -3390,8 +3403,8 @@ rcu_torture_cleanup(void) > > > fakewriter_tasks = NULL; > > > } > > > > > > - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); > > > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); > > > + if (cur_ops->get_gp_data) > > > + cur_ops->get_gp_data(&flags, &gp_seq); > > > pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n", > > > cur_ops->name, (long)gp_seq, flags, > > > rcutorture_seq_diff(gp_seq, start_gp_seq)); > > > @@ -3762,8 +3775,8 @@ rcu_torture_init(void) > > > nrealreaders = 1; > > > } > > > rcu_torture_print_module_parms(cur_ops, "Start of test"); > > > - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); > > > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); > > > + if (cur_ops->get_gp_data) > > > + cur_ops->get_gp_data(&flags, &gp_seq); > > > start_gp_seq = gp_seq; > > > pr_alert("%s: Start-test grace-period state: g%ld f%#x\n", > > > cur_ops->name, (long)gp_seq, flags); > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > > index d19326486edd..98f79ba01b0a 100644 > > > --- a/kernel/rcu/srcutree.c > > > +++ b/kernel/rcu/srcutree.c > > > @@ -1828,12 +1828,9 @@ static void process_srcu(struct work_struct *work) > > > srcu_reschedule(ssp, curdelay); > > > } > > > > > > -void srcutorture_get_gp_data(enum rcutorture_type test_type, > > > - struct srcu_struct *ssp, int *flags, > > > +void srcutorture_get_gp_data(struct srcu_struct *ssp, int *flags, > > > unsigned long *gp_seq) > > > { > > > - if (test_type != SRCU_FLAVOR) > > > - return; > > > *flags = 0; > > > *gp_seq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq); > > > } > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > > index e83adcdb49b5..e7ac9138a4fd 100644 > > > --- a/kernel/rcu/tasks.h > > > +++ b/kernel/rcu/tasks.h > > > @@ -1182,6 +1182,13 @@ struct task_struct 
*get_rcu_tasks_gp_kthread(void) > > > } > > > EXPORT_SYMBOL_GPL(get_rcu_tasks_gp_kthread); > > > > > > +void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq) > > > +{ > > > + *flags = 0; > > > + *gp_seq = rcu_seq_current(&rcu_tasks.tasks_gp_seq); > > > +} > > > +EXPORT_SYMBOL_GPL(rcu_tasks_get_gp_data); > > > + > > > /* > > > * Protect against tasklist scan blind spot while the task is exiting and > > > * may be removed from the tasklist. Do this by adding the task to yet > > > @@ -1361,6 +1368,13 @@ struct task_struct *get_rcu_tasks_rude_gp_kthread(void) > > > } > > > EXPORT_SYMBOL_GPL(get_rcu_tasks_rude_gp_kthread); > > > > > > +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq) > > > +{ > > > + *flags = 0; > > > + *gp_seq = rcu_seq_current(&rcu_tasks_rude.tasks_gp_seq); > > > +} > > > +EXPORT_SYMBOL_GPL(rcu_tasks_rude_get_gp_data); > > > + > > > #endif /* #ifdef CONFIG_TASKS_RUDE_RCU */ > > > > > > //////////////////////////////////////////////////////////////////////// > > > @@ -2020,6 +2034,13 @@ struct task_struct *get_rcu_tasks_trace_gp_kthread(void) > > > } > > > EXPORT_SYMBOL_GPL(get_rcu_tasks_trace_gp_kthread); > > > > > > +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq) > > > +{ > > > + *flags = 0; > > > + *gp_seq = rcu_seq_current(&rcu_tasks_trace.tasks_gp_seq); > > > +} > > > +EXPORT_SYMBOL_GPL(rcu_tasks_trace_get_gp_data); > > > + > > > #else /* #ifdef CONFIG_TASKS_TRACE_RCU */ > > > static void exit_tasks_rcu_finish_trace(struct task_struct *t) { } > > > #endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */ > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 9fbb5ab57c84..e229a12afe31 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -539,17 +539,10 @@ static struct rcu_node *rcu_get_root(void) > > > /* > > > * Send along grace-period-related data for rcutorture diagnostics. 
> > > */ > > > -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, > > > - unsigned long *gp_seq) > > > +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) > > > { > > > - switch (test_type) { > > > - case RCU_FLAVOR: > > > - *flags = READ_ONCE(rcu_state.gp_flags); > > > - *gp_seq = rcu_seq_current(&rcu_state.gp_seq); > > > - break; > > > - default: > > > - break; > > > - } > > > + *flags = READ_ONCE(rcu_state.gp_flags); > > > + *gp_seq = rcu_seq_current(&rcu_state.gp_seq); > > > } > > > EXPORT_SYMBOL_GPL(rcutorture_get_gp_data); > > > > > > -- > > > 2.17.1 > > >
On Mon, Mar 18, 2024 at 10:35:44AM +0800, Z qiang wrote: > > On Sun, Mar 17, 2024 at 10:31:22PM +0800, Z qiang wrote: > > > > On Wed, Mar 13, 2024 at 12:49:50PM +0800, Z qiang wrote: > > > > > > On Tue, Mar 12, 2024 at 07:35:24PM +0800, Zqiang wrote: > > > > > > > The rcu_tasks_percpu structure's->lazy_timer is queued only when > > > > > > > the rcu_tasks structure's->lazy_jiffies is not equal to zero in > > > > > > > call_rcu_tasks_generic(), if the lazy_timer callback is invoked, > > > > > > > that means the lazy_jiffes is not equal to zero, this commit > > > > > > > therefore remove lazy_jiffies check in call_rcu_tasks_generic_timer(). > > > > > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > > > --- > > > > > > > kernel/rcu/tasks.h | 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > > > > > > index b1254cf3c210..439e0b9a2656 100644 > > > > > > > --- a/kernel/rcu/tasks.h > > > > > > > +++ b/kernel/rcu/tasks.h > > > > > > > @@ -299,7 +299,7 @@ static void call_rcu_tasks_generic_timer(struct timer_list *tlp) > > > > > > > > > > > > > > rtp = rtpcp->rtpp; > > > > > > > raw_spin_lock_irqsave_rcu_node(rtpcp, flags); > > > > > > > - if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) { > > > > > > > + if (!rcu_segcblist_empty(&rtpcp->cblist)) { > > > > > > > > > > > > Good eyes! > > > > > > > > > > > > But did you test with something like a WARN_ON_ONCE(rtp->lazy_jiffies)? > > > > > > > > > > Hi, Paul > > > > > > > > > > + if (!rcu_segcblist_empty(&rtpcp->cblist) && > > > > > !WARN_ON_ONCE(!rtp->lazy_jiffies)) > > > > > > > > > > I've done tests like this: > > > > > > > > > > 1. runqemu nographic kvm slirp qemuparams="-smp 4 -m 2048M -drive > > > > > file=$PWD/share.img,if=virtio" > > > > > bootparams="rcupdate.rcu_tasks_trace_lazy_ms=0" -d > > > > > > > > > > 2. 
insmod torture.ko > > > > > insmod rcutorture.ko torture_type=tasks-tracing fwd_progress=4 > > > > > > > > > > 3. bpftrace -e 't:timer:timer_expire_entry /args->function == > > > > > kaddr("call_rcu_tasks_generic_timer")/ > > > > > { > > > > > printf("comm:%s,cpu:%d,stack:%s,func:%s\n", comm, cpu, kstack, > > > > > ksym(args->function)); }' > > > > > > > > > > The call_rcu_tasks_generic_timer() has never been executed. > > > > > > > > Very good! > > > > > > > > Then if we get a couple of acks or reviews from the others acknowledging > > > > that if they ever make ->lazy_jiffies be changeable at runtime, they > > > > will remember to do something to adjust this logic appropriately, I will > > > > take it. ;-) > > > > > > > > > > Hi, Paul > > > > > > Would it be better to modify it as follows? set needwake not > > > depend on lazy_jiffies, if ->lazy_jiffies be changed at runtime, > > > and set it to zero, will miss the chance to wake up gp kthreads. > > > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > > index e7ac9138a4fd..aea2b71af36c 100644 > > > --- a/kernel/rcu/tasks.h > > > +++ b/kernel/rcu/tasks.h > > > @@ -299,11 +299,12 @@ static void call_rcu_tasks_generic_timer(struct > > > timer_list *tlp) > > > > > > rtp = rtpcp->rtpp; > > > raw_spin_lock_irqsave_rcu_node(rtpcp, flags); > > > - if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) { > > > + if (!rcu_segcblist_empty(&rtpcp->cblist)) { > > > if (!rtpcp->urgent_gp) > > > rtpcp->urgent_gp = 1; > > > needwake = true; > > > - mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp)); > > > + if (rtp->lazy_jiffies) > > > + mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp)); > > > } > > > raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags); > > > if (needwake) > > > > Interesting approach! > > > > But how does that avoid defeating laziness by doing the wakeup early? > > Hello, Paul > > Does this mean that if cblist is empty, we will miss the opportunity to > enqueue the timer again? 
If so, we can move mod_timer() outside > the if condition. > or I didn't understand your question? Never mind! I was getting confused, and forgetting that this code has already seen a timeout. Let's see what others think. Thanx, Paul > Thanks > Zqiang > > > > > > Thanx, Paul > > > > > Thanks > > > Zqiang > > > > > > > > > > Thanx, Paul > > > > > > > > > Thanks > > > > > Zqiang > > > > > > > > > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > > if (!rtpcp->urgent_gp) > > > > > > > rtpcp->urgent_gp = 1; > > > > > > > needwake = true; > > > > > > > -- > > > > > > > 2.17.1 > > > > > > >
> > On Sun, Mar 17, 2024 at 10:31:22PM +0800, Z qiang wrote:
> > > On Wed, Mar 13, 2024 at 12:49:50PM +0800, Z qiang wrote:
> > > > > On Tue, Mar 12, 2024 at 07:35:24PM +0800, Zqiang wrote:
> > > > > > The rcu_tasks_percpu structure's->lazy_timer is queued only when
> > > > > > the rcu_tasks structure's->lazy_jiffies is not equal to zero in
> > > > > > call_rcu_tasks_generic(), if the lazy_timer callback is invoked,
> > > > > > that means the lazy_jiffes is not equal to zero, this commit
> > > > > > therefore remove lazy_jiffies check in call_rcu_tasks_generic_timer().
> > > > > >
> > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > > > > > ---
> > > > > >  kernel/rcu/tasks.h | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > > > index b1254cf3c210..439e0b9a2656 100644
> > > > > > --- a/kernel/rcu/tasks.h
> > > > > > +++ b/kernel/rcu/tasks.h
> > > > > > @@ -299,7 +299,7 @@ static void call_rcu_tasks_generic_timer(struct timer_list *tlp)
> > > > > >
> > > > > >  	rtp = rtpcp->rtpp;
> > > > > >  	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> > > > > > -	if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) {
> > > > > > +	if (!rcu_segcblist_empty(&rtpcp->cblist)) {
> > > > >
> > > > > Good eyes!
> > > > >
> > > > > But did you test with something like a WARN_ON_ONCE(rtp->lazy_jiffies)?
> > > >
> > > > Hi, Paul
> > > >
> > > > +	if (!rcu_segcblist_empty(&rtpcp->cblist) &&
> > > > !WARN_ON_ONCE(!rtp->lazy_jiffies))
> > > >
> > > > I've done tests like this:
> > > >
> > > > 1. runqemu nographic kvm slirp qemuparams="-smp 4 -m 2048M -drive
> > > > file=$PWD/share.img,if=virtio"
> > > > bootparams="rcupdate.rcu_tasks_trace_lazy_ms=0" -d
> > > >
> > > > 2. insmod torture.ko
> > > >    insmod rcutorture.ko torture_type=tasks-tracing fwd_progress=4
> > > >
> > > > 3. bpftrace -e 't:timer:timer_expire_entry /args->function ==
> > > > kaddr("call_rcu_tasks_generic_timer")/
> > > > {
> > > > printf("comm:%s,cpu:%d,stack:%s,func:%s\n", comm, cpu, kstack,
> > > > ksym(args->function)); }'
> > > >
> > > > The call_rcu_tasks_generic_timer() has never been executed.
> > >
> > > Very good!
> > >
> > > Then if we get a couple of acks or reviews from the others acknowledging
> > > that if they ever make ->lazy_jiffies be changeable at runtime, they
> > > will remember to do something to adjust this logic appropriately, I will
> > > take it.  ;-)
> >
> > Hi, Paul
> >
> > Would it be better to modify it as follows?  set needwake not
> > depend on lazy_jiffies, if ->lazy_jiffies be changed at runtime,
> > and set it to zero, will miss the chance to wake up gp kthreads.
> >
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index e7ac9138a4fd..aea2b71af36c 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -299,11 +299,12 @@ static void call_rcu_tasks_generic_timer(struct
> > timer_list *tlp)
> >
> >  	rtp = rtpcp->rtpp;
> >  	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> > -	if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) {
> > +	if (!rcu_segcblist_empty(&rtpcp->cblist)) {
> >  		if (!rtpcp->urgent_gp)
> >  			rtpcp->urgent_gp = 1;
> >  		needwake = true;
> > -		mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
> > +		if (rtp->lazy_jiffies)
> > +			mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
> >  	}
> >  	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> >  	if (needwake)
>
> Interesting approach!
>
> But how does that avoid defeating laziness by doing the wakeup early?

Hello, Paul

Does this mean that if the cblist is empty, we will miss the opportunity
to enqueue the timer again?  If so, we can move mod_timer() outside the
if condition.  Or did I misunderstand your question?

Thanks
Zqiang

> >
> > 			Thanx, Paul
> >
> > > Thanks
> > > Zqiang
> > > >
> > > > 			Thanx, Paul
> > > >
> > > > > Thanks
> > > > > Zqiang
> > > > > >
> > > > > > 			Thanx, Paul
> > > > > >
> > > > > >  		if (!rtpcp->urgent_gp)
> > > > > >  			rtpcp->urgent_gp = 1;
> > > > > >  		needwake = true;
> > > > > > --
> > > > > > 2.17.1
> > > > > >
> > On Sun, Mar 17, 2024 at 07:38:11PM +0800, Zqiang wrote: > > This commit make rcu-tasks related rcutorture test support rcu-tasks > > gp state printing when the writer stall occurs or the at the end of > > rcutorture test, and generate rcu_ops->get_gp_data() operation to > > simplify the acquisition of gp state for different types of rcutorture > > tests. > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > Much better!!! > > A few questions below. One small change left, a question on testing, > and the possibility of a few cleanup patches if you are interested. > > Thanx, Paul > > > --- > > > > [ 47.582683] rcu: Start-test grace-period state: g4313 f0x0 > > [ 73.780444] rcu: End-test grace-period state: g15728 f0x0 total-gps=2854 > > > > [ 99.013921] srcu: Start-test grace-period state: g0 f0x0 > > [ 126.596727] srcu: End-test grace-period state: g10292 f0x0 total-gps=10292 > > > > [ 175.664991] srcud: Start-test grace-period state: g0 f0x0 > > [ 195.012774] srcud: End-test grace-period state: g7628 f0x0 total-gps=7628 > > > > [ 216.943521] tasks: Start-test grace-period state: g8 f0x0 > > [ 234.245093] tasks: End-test grace-period state: g300 f0x0 total-gps=292 > > > > [ 267.139368] tasks-rude: Start-test grace-period state: g8 f0x0 > > [ 296.132748] tasks-rude: End-test grace-period state: g684 f0x0 total-gps=676 > > > > [ 316.044241] tasks-tracing: Start-test grace-period state: g8 f0x0 > > [ 342.020447] tasks-tracing: End-test grace-period state: g348 f0x0 total-gps=340 > > > > kernel/rcu/rcu.h | 20 ++++++++++---------- > > kernel/rcu/rcutorture.c | 29 +++++++++++++++++++++-------- > > kernel/rcu/srcutree.c | 5 +---- > > kernel/rcu/tasks.h | 21 +++++++++++++++++++++ > > kernel/rcu/tree.c | 13 +++---------- > > 5 files changed, 56 insertions(+), 32 deletions(-) > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > index 86fce206560e..38238e595a61 100644 > > --- a/kernel/rcu/rcu.h > > +++ b/kernel/rcu/rcu.h > > @@ -522,12 +522,18 @@ static 
inline void show_rcu_tasks_gp_kthreads(void) {} > > > > #ifdef CONFIG_TASKS_RCU > > struct task_struct *get_rcu_tasks_gp_kthread(void); > > +void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq); > > #endif // # ifdef CONFIG_TASKS_RCU > > > > #ifdef CONFIG_TASKS_RUDE_RCU > > struct task_struct *get_rcu_tasks_rude_gp_kthread(void); > > +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq); > > #endif // # ifdef CONFIG_TASKS_RUDE_RCU > > > > +#ifdef CONFIG_TASKS_TRACE_RCU > > +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq); > > +#endif > > + > > If you have not already done so, could you please test with the rcutorture > scenarios, while artificially forcing that warning to trigger? I used the following command to force writer stall: insmod rcutorture.ko torture_type=rcu* fwd_progress=0 stat_interval=4 stall_cpu_block=1 stall_cpu=40 stall_cpu_holdoff=10 tasks_trace: [52392.689229] ??? Writer stall state RTWS_SYNC(21) g447101 f0x0 ->state 0x2 cpu 2 [52392.689235] rcu_tasks_trace: RTGS_WAIT_SCAN_HOLDOUTS(6) since 8 g:447101 i:0/26562 kCuU l:1 N1 h:44/225560/436368 tasks_rude: [52614.705062] ??? Writer stall state RTWS_SYNC(21) g261 f0x0 ->state 0x2 cpu 2 [52614.705067] rcu_tasks_rude: RTGS_WAIT_GP(2) since 13807 g:261 i:0/261 kCuU l:250 tasks: [52751.857305] ??? Writer stall state RTWS_EXP_SYNC(4) g149 f0x0 ->state 0x2 cpu 2 [52751.858163] rcu_tasks: RTGS_WAIT_SCAN_HOLDOUTS(6) since 57 g:149 i:0/0 kCuU l:250 rcu: ??? 
Writer stall state RTWS_POLL_WAIT_EXP_FULL(20) g952633 f0x0 ->state 0x402 cpu 2 [52826.481118] task:rcu_torture_wri state:I stack:0 pid:8655 tgid:8655 ppid:2 flags:0x00004000 rcu: rcu_preempt: wait state: RCU_GP_WAIT_FQS(5) ->state: 0x402 ->rt_priority 1 delta ->gp_start 6093 ->gp_activity 0 ->gp_req_activity 6093 ->gp_wake_time 6093 ->gp_wake_seq 952629 ->gp_s0 [52826.481462] rcu: rcu_node 0:3 ->gp_seq 952633 ->gp_seq_needed 952640 ->qsmask 0x0 b.EG ->n_boosts 2328 [52826.481466] rcu: cpu 0 ->gp_seq_needed 952640 [52826.481469] rcu: cpu 1 ->gp_seq_needed 952640 [52826.481472] rcu: cpu 2 ->gp_seq_needed 952636 [52826.481474] rcu: cpu 3 ->gp_seq_needed 952640 [52826.481477] rcu: RCU callbacks invoked since boot: 127957840 [52826.481481] rcu_tasks: RTGS_WAIT_CBS(11) since 30792 g:268 i:0/0 k.u. l:250 [52826.481486] rcu_tasks_rude: RTGS_WAIT_CBS(11) since 164976 g:680 i:0/677 k.u. l:250 [52826.481492] rcu_tasks_trace: RTGS_WAIT_CBS(11) since 395536 g:447148 i:0/26567 k.u. l:1 N0 h:44/225572/436420 srcu: rcu: srcu-torture: Tree SRCU g3586 state 0 (SRCU_SIZE_SMALL) per-CPU(idx=1): 0(16,6 C) 1(2,-12 .) 2(-11,7 .) 3(-6,-1 .) T(1,0) [52940.785131] ??? Writer stall state RTWS_EXP_SYNC(4) g3586 f0x0 ->state 0x2 cpu 0 [52940.785136] task:rcu_torture_wri state:D stack:0 pid:8748 tgid:8748 ppid:2 flags:0x00004000 > > For example, CONFIG_TASKS_TRACE_RCU can be tested using this command > line: > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5m --configs "TRACE01 TRACE02" --trust-make > > You can add more scenarios to the --configs argument, and the rest of > them can be found in tools/testing/selftests/rcutorture/configs/rcu. 
> > > #ifdef CONFIG_TASKS_RCU_GENERIC > > void tasks_cblist_init_generic(void); > > #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ > > @@ -557,8 +563,7 @@ static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { } > > #endif > > > > #if defined(CONFIG_TREE_RCU) > > -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, > > - unsigned long *gp_seq); > > +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq); > > void do_trace_rcu_torture_read(const char *rcutorturename, > > struct rcu_head *rhp, > > unsigned long secs, > > @@ -566,8 +571,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename, > > unsigned long c); > > void rcu_gp_set_torture_wait(int duration); > > #else > > -static inline void rcutorture_get_gp_data(enum rcutorture_type test_type, > > - int *flags, unsigned long *gp_seq) > > +static inline void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) > > { > > *flags = 0; > > *gp_seq = 0; > > @@ -587,20 +591,16 @@ static inline void rcu_gp_set_torture_wait(int duration) { } > > > > #ifdef CONFIG_TINY_SRCU > > > > -static inline void srcutorture_get_gp_data(enum rcutorture_type test_type, > > - struct srcu_struct *sp, int *flags, > > +static inline void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags, > > unsigned long *gp_seq) > > { > > - if (test_type != SRCU_FLAVOR) > > - return; > > *flags = 0; > > *gp_seq = sp->srcu_idx; > > } > > > > #elif defined(CONFIG_TREE_SRCU) > > > > -void srcutorture_get_gp_data(enum rcutorture_type test_type, > > - struct srcu_struct *sp, int *flags, > > +void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags, > > unsigned long *gp_seq); > > > > #endif > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > index 2f43d31fb7a5..2e8a037072eb 100644 > > --- a/kernel/rcu/rcutorture.c > > +++ b/kernel/rcu/rcutorture.c > > @@ -381,6 +381,7 @@ struct rcu_torture_ops { > > void (*gp_kthread_dbg)(void); > > bool (*check_boost_failed)(unsigned long 
gp_state, int *cpup); > > int (*stall_dur)(void); > > + void (*get_gp_data)(int *flags, unsigned long *gp_seq); > > long cbflood_max; > > int irq_capable; > > int can_boost; > > @@ -569,6 +570,7 @@ static struct rcu_torture_ops rcu_ops = { > > .gp_kthread_dbg = show_rcu_gp_kthreads, > > .check_boost_failed = rcu_check_boost_fail, > > .stall_dur = rcu_jiffies_till_stall_check, > > + .get_gp_data = rcutorture_get_gp_data, > > .irq_capable = 1, > > .can_boost = IS_ENABLED(CONFIG_RCU_BOOST), > > .extendables = RCUTORTURE_MAX_EXTEND, > > @@ -612,6 +614,7 @@ static struct rcu_torture_ops rcu_busted_ops = { > > .sync = synchronize_rcu_busted, > > .exp_sync = synchronize_rcu_busted, > > .call = call_rcu_busted, > > + .get_gp_data = NULL, > > When this is NULL, please just leave it out. > > If you wish, you can also send separate patches removing the pre-existing > initialization of function pointers to NULL, one per function pointer. I will record it in my todo document. Thanks > > > .cb_barrier = NULL, > > .fqs = NULL, > > .stats = NULL, > > @@ -628,6 +631,11 @@ static struct srcu_struct srcu_ctld; > > static struct srcu_struct *srcu_ctlp = &srcu_ctl; > > static struct rcu_torture_ops srcud_ops; > > > > +static void srcu_get_gp_data(int *flags, unsigned long *gp_seq) > > +{ > > + srcutorture_get_gp_data(srcu_ctlp, flags, gp_seq); > > +} > > + > > static int srcu_torture_read_lock(void) > > { > > if (cur_ops == &srcud_ops) > > @@ -736,6 +744,7 @@ static struct rcu_torture_ops srcu_ops = { > > .call = srcu_torture_call, > > .cb_barrier = srcu_torture_barrier, > > .stats = srcu_torture_stats, > > + .get_gp_data = srcu_get_gp_data, > > .cbflood_max = 50000, > > .irq_capable = 1, > > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > > @@ -774,6 +783,7 @@ static struct rcu_torture_ops srcud_ops = { > > .call = srcu_torture_call, > > .cb_barrier = srcu_torture_barrier, > > .stats = srcu_torture_stats, > > + .get_gp_data = srcu_get_gp_data, > > .cbflood_max = 50000, > > .irq_capable 
= 1, > > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > > @@ -796,6 +806,7 @@ static struct rcu_torture_ops busted_srcud_ops = { > > .call = srcu_torture_call, > > .cb_barrier = srcu_torture_barrier, > > .stats = srcu_torture_stats, > > + .get_gp_data = NULL, > > .irq_capable = 1, > > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > > .extendables = RCUTORTURE_MAX_EXTEND, > > @@ -838,6 +849,7 @@ static struct rcu_torture_ops trivial_ops = { > > .get_gp_seq = rcu_no_completed, > > .sync = synchronize_rcu_trivial, > > .exp_sync = synchronize_rcu_trivial, > > + .get_gp_data = NULL, > > .fqs = NULL, > > .stats = NULL, > > .irq_capable = 1, > > @@ -882,6 +894,7 @@ static struct rcu_torture_ops tasks_ops = { > > .call = call_rcu_tasks, > > .cb_barrier = rcu_barrier_tasks, > > .gp_kthread_dbg = show_rcu_tasks_classic_gp_kthread, > > + .get_gp_data = rcu_tasks_get_gp_data, > > .fqs = NULL, > > .stats = NULL, > > .irq_capable = 1, > > @@ -922,6 +935,7 @@ static struct rcu_torture_ops tasks_rude_ops = { > > .call = call_rcu_tasks_rude, > > .cb_barrier = rcu_barrier_tasks_rude, > > .gp_kthread_dbg = show_rcu_tasks_rude_gp_kthread, > > + .get_gp_data = rcu_tasks_rude_get_gp_data, > > .cbflood_max = 50000, > > .fqs = NULL, > > .stats = NULL, > > @@ -974,6 +988,7 @@ static struct rcu_torture_ops tasks_tracing_ops = { > > .call = call_rcu_tasks_trace, > > .cb_barrier = rcu_barrier_tasks_trace, > > .gp_kthread_dbg = show_rcu_tasks_trace_gp_kthread, > > + .get_gp_data = rcu_tasks_trace_get_gp_data, > > .cbflood_max = 50000, > > .fqs = NULL, > > .stats = NULL, > > @@ -2264,10 +2279,8 @@ rcu_torture_stats_print(void) > > int __maybe_unused flags = 0; > > unsigned long __maybe_unused gp_seq = 0; > > > > - rcutorture_get_gp_data(cur_ops->ttype, > > - &flags, &gp_seq); > > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, > > - &flags, &gp_seq); > > + if (cur_ops->get_gp_data) > > + cur_ops->get_gp_data(&flags, &gp_seq); > > wtp = READ_ONCE(writer_task); > > pr_alert("??? 
Writer stall state %s(%d) g%lu f%#x ->state %#x cpu %d\n", > > rcu_torture_writer_state_getname(), > > @@ -3390,8 +3403,8 @@ rcu_torture_cleanup(void) > > fakewriter_tasks = NULL; > > } > > > > - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); > > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); > > + if (cur_ops->get_gp_data) > > + cur_ops->get_gp_data(&flags, &gp_seq); > > pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n", > > cur_ops->name, (long)gp_seq, flags, > > rcutorture_seq_diff(gp_seq, start_gp_seq)); > > @@ -3762,8 +3775,8 @@ rcu_torture_init(void) > > nrealreaders = 1; > > } > > rcu_torture_print_module_parms(cur_ops, "Start of test"); > > - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); > > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); > > + if (cur_ops->get_gp_data) > > + cur_ops->get_gp_data(&flags, &gp_seq); > > start_gp_seq = gp_seq; > > pr_alert("%s: Start-test grace-period state: g%ld f%#x\n", > > cur_ops->name, (long)gp_seq, flags); > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > index d19326486edd..98f79ba01b0a 100644 > > --- a/kernel/rcu/srcutree.c > > +++ b/kernel/rcu/srcutree.c > > @@ -1828,12 +1828,9 @@ static void process_srcu(struct work_struct *work) > > srcu_reschedule(ssp, curdelay); > > } > > > > -void srcutorture_get_gp_data(enum rcutorture_type test_type, > > - struct srcu_struct *ssp, int *flags, > > +void srcutorture_get_gp_data(struct srcu_struct *ssp, int *flags, > > unsigned long *gp_seq) > > { > > - if (test_type != SRCU_FLAVOR) > > - return; > > *flags = 0; > > *gp_seq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq); > > } > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > index e83adcdb49b5..e7ac9138a4fd 100644 > > --- a/kernel/rcu/tasks.h > > +++ b/kernel/rcu/tasks.h > > @@ -1182,6 +1182,13 @@ struct task_struct *get_rcu_tasks_gp_kthread(void) > > } > > EXPORT_SYMBOL_GPL(get_rcu_tasks_gp_kthread); > > > > +void 
rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq) > > +{ > > + *flags = 0; > > + *gp_seq = rcu_seq_current(&rcu_tasks.tasks_gp_seq); > > +} > > +EXPORT_SYMBOL_GPL(rcu_tasks_get_gp_data); > > + > > /* > > * Protect against tasklist scan blind spot while the task is exiting and > > * may be removed from the tasklist. Do this by adding the task to yet > > @@ -1361,6 +1368,13 @@ struct task_struct *get_rcu_tasks_rude_gp_kthread(void) > > } > > EXPORT_SYMBOL_GPL(get_rcu_tasks_rude_gp_kthread); > > > > +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq) > > +{ > > + *flags = 0; > > + *gp_seq = rcu_seq_current(&rcu_tasks_rude.tasks_gp_seq); > > +} > > +EXPORT_SYMBOL_GPL(rcu_tasks_rude_get_gp_data); > > + > > #endif /* #ifdef CONFIG_TASKS_RUDE_RCU */ > > > > //////////////////////////////////////////////////////////////////////// > > @@ -2020,6 +2034,13 @@ struct task_struct *get_rcu_tasks_trace_gp_kthread(void) > > } > > EXPORT_SYMBOL_GPL(get_rcu_tasks_trace_gp_kthread); > > > > +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq) > > +{ > > + *flags = 0; > > + *gp_seq = rcu_seq_current(&rcu_tasks_trace.tasks_gp_seq); > > +} > > +EXPORT_SYMBOL_GPL(rcu_tasks_trace_get_gp_data); > > + > > #else /* #ifdef CONFIG_TASKS_TRACE_RCU */ > > static void exit_tasks_rcu_finish_trace(struct task_struct *t) { } > > #endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */ > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 9fbb5ab57c84..e229a12afe31 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -539,17 +539,10 @@ static struct rcu_node *rcu_get_root(void) > > /* > > * Send along grace-period-related data for rcutorture diagnostics. 
> > */ > > -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, > > - unsigned long *gp_seq) > > +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) > > { > > - switch (test_type) { > > - case RCU_FLAVOR: > > - *flags = READ_ONCE(rcu_state.gp_flags); > > - *gp_seq = rcu_seq_current(&rcu_state.gp_seq); > > - break; > > - default: > > - break; > > - } > > + *flags = READ_ONCE(rcu_state.gp_flags); > > + *gp_seq = rcu_seq_current(&rcu_state.gp_seq); > > } > > EXPORT_SYMBOL_GPL(rcutorture_get_gp_data); > > > > -- > > 2.17.1 > >
On Mon, Mar 18, 2024 at 01:47:43AM +0800, Alan Huang wrote:
> Hi,
>
> I’m playing with the LKMM, then I saw the ISA2+pooncelock+pooncelock+pombonce.
>
> The original litmus is as follows:
> ------------------------------------------------------
> P0(int *x, int *y, spinlock_t *mylock)
> {
> 	spin_lock(mylock);
> 	WRITE_ONCE(*x, 1);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(mylock);
> }
>
> P1(int *y, int *z, spinlock_t *mylock)
> {
> 	int r0;
>
> 	spin_lock(mylock);
> 	r0 = READ_ONCE(*y);
> 	WRITE_ONCE(*z, 1);
> 	spin_unlock(mylock);
> }
>
> P2(int *x, int *z)
> {
> 	int r1;
> 	int r2;
>
> 	r2 = READ_ONCE(*z);
> 	smp_mb();
> 	r1 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
> ------------------------------------------------------
> Of course, the result is Never.
>
> But when I delete P0’s spin_lock and P1’s spin_unlock:
> -------------------------------------------------------
> P0(int *x, int *y, spinlock_t *mylock)
> {
> 	WRITE_ONCE(*x, 1);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(mylock);
> }
>
> P1(int *y, int *z, spinlock_t *mylock)
> {
> 	int r0;
>
> 	spin_lock(mylock);
> 	r0 = READ_ONCE(*y);
> 	WRITE_ONCE(*z, 1);
> }
>
> P2(int *x, int *z)
> {
> 	int r1;
> 	int r2;
>
> 	r2 = READ_ONCE(*z);
> 	smp_mb();
> 	r1 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
> ------------------------------------------------------
> Then herd told me the result is Sometimes.

You mean like this?

Test ISA2+pooncelock+pooncelock+pombonce Allowed
States 8
1:r0=0; 2:r1=0; 2:r2=0;
1:r0=0; 2:r1=0; 2:r2=1;
1:r0=0; 2:r1=1; 2:r2=0;
1:r0=0; 2:r1=1; 2:r2=1;
1:r0=1; 2:r1=0; 2:r2=0;
1:r0=1; 2:r1=0; 2:r2=1;
1:r0=1; 2:r1=1; 2:r2=0;
1:r0=1; 2:r1=1; 2:r2=1;
Ok
Witnesses
Positive: 1 Negative: 7
Flag unmatched-unlock
Condition exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
Observation ISA2+pooncelock+pooncelock+pombonce Sometimes 1 7
Time ISA2+pooncelock+pooncelock+pombonce 0.01
Hash=f55b8515e48310f812aa676084f2cc88

> Is this expected?

There are no locks held initially, so why can't the following sequence
of events unfold:

o	P1() acquires the lock.

o	P0() does WRITE_ONCE(*y, 1).  (Yes, out of order.)

o	P1() does READ_ONCE(*y), and gets 1.

o	P1() does WRITE_ONCE(*z, 1).

o	P2() does READ_ONCE(*z) and gets 1.

o	P2() does smp_mb(), but there is nothing to order with.

o	P2() does READ_ONCE(*x) and gets 0.

o	P0() does WRITE_ONCE(*x, 1), but too late to affect P2().

o	P0() releases the lock that it does not hold, which is why you
	see the "Flag unmatched-unlock" in the output.  LKMM is
	complaining that the litmus test is not legitimate, and
	rightly so!

Or am I missing your point?

							Thanx, Paul
On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote:
> On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote:
> > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists.
> >
> > Because not all architectures support 1-byte cmpxchg?
> > What prevents us from implementing it?
>
> Nothing that I know of, but I didn't want to put up with the KCSAN report
> in the interim.

And here is a lightly tested patch to emulate one-byte and two-byte
cmpxchg() for architectures that do not support it.  This is just the
emulation, and would be followed up with patches to make the relevant
architectures make use of it.

The one-byte emulation has been lightly tested on x86.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit d72e54166b56d8b373676e1e92a426a07d53899a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sun Mar 17 14:44:38 2024 -0700

    lib: Add one-byte and two-byte cmpxchg() emulation functions
    
    Architectures are required to provide four-byte cmpxchg() and 64-bit
    architectures are additionally required to provide eight-byte cmpxchg().
    However, there are cases where one-byte and two-byte cmpxchg()
    would be extremely useful.  Therefore, provide cmpxchg_emu_u8() and
    cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
    of four-byte cmpxchg().
    
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Marco Elver <elver@google.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Cc: Douglas Anderson <dianders@chromium.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: <linux-arch@vger.kernel.org>

diff --git a/arch/Kconfig b/arch/Kconfig
index 154f994547632..eef11e9918ec7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT
 	default 4 if FUNCTION_ALIGNMENT_4B
 	default 0
 
+config ARCH_NEED_CMPXCHG_1_2_EMU
+	bool
+
 endmenu
diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
new file mode 100644
index 0000000000000..fee8171fa05eb
--- /dev/null
+++ b/include/linux/cmpxchg-emu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes.  These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_CMPXCHG_EMU_H
+#define __LINUX_CMPXCHG_EMU_H
+
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
+
+#endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e619..fecd7b8c09cbd 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
 obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
 
 #ensure exported functions have prototypes
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
new file mode 100644
index 0000000000000..508b55484c2b6
--- /dev/null
+++ b/lib/cmpxchg-emu.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes.  These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/instrumented.h>
+#include <linux/atomic.h>
+#include <asm-generic/rwonce.h>
+
+union u8_32 {
+	u8 b[4];
+	u32 w;
+};
+
+/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+{
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+	int i = ((uintptr_t)p) & 0x3;
+	union u8_32 old32;
+	union u8_32 new32;
+	u32 ret;
+
+	old32.w = READ_ONCE(*p32);
+	do {
+		if (old32.b[i] != old)
+			return old32.b[i];
+		new32.w = old32.w;
+		new32.b[i] = new;
+		instrument_atomic_read_write(p, 1);
+		ret = cmpxchg(p32, old32.w, new32.w);
+	} while (ret != old32.w);
+	return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
+
+union u16_32 {
+	u16 h[2];
+	u32 w;
+};
+
+/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
+{
+	u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1);
+	int i = ((uintptr_t)p) & 0x1;
+	union u16_32 old32;
+	union u16_32 new32;
+	u32 ret;
+
+	old32.w = READ_ONCE(*p32);
+	do {
+		if (old32.h[i] != old)
+			return old32.h[i];
+		new32.w = old32.w;
+		new32.h[i] = new;
+		instrument_atomic_read_write(p, 2);
+		ret = cmpxchg(p32, old32.w, new32.w);
+	} while (ret != old32.w);
+	return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
On Sun, Mar 17, 2024 at 10:31:22PM +0800, Z qiang wrote:
> > On Wed, Mar 13, 2024 at 12:49:50PM +0800, Z qiang wrote:
> > > > On Tue, Mar 12, 2024 at 07:35:24PM +0800, Zqiang wrote:
> > > > > The rcu_tasks_percpu structure's->lazy_timer is queued only when
> > > > > the rcu_tasks structure's->lazy_jiffies is not equal to zero in
> > > > > call_rcu_tasks_generic(), if the lazy_timer callback is invoked,
> > > > > that means the lazy_jiffes is not equal to zero, this commit
> > > > > therefore remove lazy_jiffies check in call_rcu_tasks_generic_timer().
> > > > >
> > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > > > > ---
> > > > >  kernel/rcu/tasks.h | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > > index b1254cf3c210..439e0b9a2656 100644
> > > > > --- a/kernel/rcu/tasks.h
> > > > > +++ b/kernel/rcu/tasks.h
> > > > > @@ -299,7 +299,7 @@ static void call_rcu_tasks_generic_timer(struct timer_list *tlp)
> > > > >
> > > > >  	rtp = rtpcp->rtpp;
> > > > >  	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> > > > > -	if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) {
> > > > > +	if (!rcu_segcblist_empty(&rtpcp->cblist)) {
> > > >
> > > > Good eyes!
> > > >
> > > > But did you test with something like a WARN_ON_ONCE(rtp->lazy_jiffies)?
> > >
> > > Hi, Paul
> > >
> > > +	if (!rcu_segcblist_empty(&rtpcp->cblist) &&
> > > !WARN_ON_ONCE(!rtp->lazy_jiffies))
> > >
> > > I've done tests like this:
> > >
> > > 1. runqemu nographic kvm slirp qemuparams="-smp 4 -m 2048M -drive
> > > file=$PWD/share.img,if=virtio"
> > > bootparams="rcupdate.rcu_tasks_trace_lazy_ms=0" -d
> > >
> > > 2. insmod torture.ko
> > >    insmod rcutorture.ko torture_type=tasks-tracing fwd_progress=4
> > >
> > > 3. bpftrace -e 't:timer:timer_expire_entry /args->function ==
> > > kaddr("call_rcu_tasks_generic_timer")/
> > > {
> > > printf("comm:%s,cpu:%d,stack:%s,func:%s\n", comm, cpu, kstack,
> > > ksym(args->function)); }'
> > >
> > > The call_rcu_tasks_generic_timer() has never been executed.
> >
> > Very good!
> >
> > Then if we get a couple of acks or reviews from the others acknowledging
> > that if they ever make ->lazy_jiffies be changeable at runtime, they
> > will remember to do something to adjust this logic appropriately, I will
> > take it.  ;-)
>
> Hi, Paul
>
> Would it be better to modify it as follows?  set needwake not
> depend on lazy_jiffies, if ->lazy_jiffies be changed at runtime,
> and set it to zero, will miss the chance to wake up gp kthreads.
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index e7ac9138a4fd..aea2b71af36c 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -299,11 +299,12 @@ static void call_rcu_tasks_generic_timer(struct
> timer_list *tlp)
>
>  	rtp = rtpcp->rtpp;
>  	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> -	if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) {
> +	if (!rcu_segcblist_empty(&rtpcp->cblist)) {
>  		if (!rtpcp->urgent_gp)
>  			rtpcp->urgent_gp = 1;
>  		needwake = true;
> -		mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
> +		if (rtp->lazy_jiffies)
> +			mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
>  	}
>  	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
>  	if (needwake)

Interesting approach!

But how does that avoid defeating laziness by doing the wakeup early?

							Thanx, Paul

> Thanks
> Zqiang
>
> > 			Thanx, Paul
> >
> > > Thanks
> > > Zqiang
> > > >
> > > > 			Thanx, Paul
> > > >
> > > > >  		if (!rtpcp->urgent_gp)
> > > > >  			rtpcp->urgent_gp = 1;
> > > > >  		needwake = true;
> > > > > --
> > > > > 2.17.1
> > > > >
On Sun, Mar 17, 2024 at 07:38:11PM +0800, Zqiang wrote: > This commit make rcu-tasks related rcutorture test support rcu-tasks > gp state printing when the writer stall occurs or the at the end of > rcutorture test, and generate rcu_ops->get_gp_data() operation to > simplify the acquisition of gp state for different types of rcutorture > tests. > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Much better!!! A few questions below. One small change left, a question on testing, and the possibility of a few cleanup patches if you are interested. Thanx, Paul > --- > > [ 47.582683] rcu: Start-test grace-period state: g4313 f0x0 > [ 73.780444] rcu: End-test grace-period state: g15728 f0x0 total-gps=2854 > > [ 99.013921] srcu: Start-test grace-period state: g0 f0x0 > [ 126.596727] srcu: End-test grace-period state: g10292 f0x0 total-gps=10292 > > [ 175.664991] srcud: Start-test grace-period state: g0 f0x0 > [ 195.012774] srcud: End-test grace-period state: g7628 f0x0 total-gps=7628 > > [ 216.943521] tasks: Start-test grace-period state: g8 f0x0 > [ 234.245093] tasks: End-test grace-period state: g300 f0x0 total-gps=292 > > [ 267.139368] tasks-rude: Start-test grace-period state: g8 f0x0 > [ 296.132748] tasks-rude: End-test grace-period state: g684 f0x0 total-gps=676 > > [ 316.044241] tasks-tracing: Start-test grace-period state: g8 f0x0 > [ 342.020447] tasks-tracing: End-test grace-period state: g348 f0x0 total-gps=340 > > kernel/rcu/rcu.h | 20 ++++++++++---------- > kernel/rcu/rcutorture.c | 29 +++++++++++++++++++++-------- > kernel/rcu/srcutree.c | 5 +---- > kernel/rcu/tasks.h | 21 +++++++++++++++++++++ > kernel/rcu/tree.c | 13 +++---------- > 5 files changed, 56 insertions(+), 32 deletions(-) > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index 86fce206560e..38238e595a61 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -522,12 +522,18 @@ static inline void show_rcu_tasks_gp_kthreads(void) {} > > #ifdef CONFIG_TASKS_RCU > struct task_struct 
*get_rcu_tasks_gp_kthread(void); > +void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq); > #endif // # ifdef CONFIG_TASKS_RCU > > #ifdef CONFIG_TASKS_RUDE_RCU > struct task_struct *get_rcu_tasks_rude_gp_kthread(void); > +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq); > #endif // # ifdef CONFIG_TASKS_RUDE_RCU > > +#ifdef CONFIG_TASKS_TRACE_RCU > +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq); > +#endif > + If you have not already done so, could you please test with the rcutorture scenarios, while artificially forcing that warning to trigger? For example, CONFIG_TASKS_TRACE_RCU can be tested using this command line: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5m --configs "TRACE01 TRACE02" --trust-make You can add more scenarios to the --configs argument, and the rest of them can be found in tools/testing/selftests/rcutorture/configs/rcu. > #ifdef CONFIG_TASKS_RCU_GENERIC > void tasks_cblist_init_generic(void); > #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ > @@ -557,8 +563,7 @@ static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { } > #endif > > #if defined(CONFIG_TREE_RCU) > -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, > - unsigned long *gp_seq); > +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq); > void do_trace_rcu_torture_read(const char *rcutorturename, > struct rcu_head *rhp, > unsigned long secs, > @@ -566,8 +571,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename, > unsigned long c); > void rcu_gp_set_torture_wait(int duration); > #else > -static inline void rcutorture_get_gp_data(enum rcutorture_type test_type, > - int *flags, unsigned long *gp_seq) > +static inline void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) > { > *flags = 0; > *gp_seq = 0; > @@ -587,20 +591,16 @@ static inline void rcu_gp_set_torture_wait(int duration) { } > > #ifdef CONFIG_TINY_SRCU > > -static inline void 
srcutorture_get_gp_data(enum rcutorture_type test_type, > - struct srcu_struct *sp, int *flags, > +static inline void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags, > unsigned long *gp_seq) > { > - if (test_type != SRCU_FLAVOR) > - return; > *flags = 0; > *gp_seq = sp->srcu_idx; > } > > #elif defined(CONFIG_TREE_SRCU) > > -void srcutorture_get_gp_data(enum rcutorture_type test_type, > - struct srcu_struct *sp, int *flags, > +void srcutorture_get_gp_data(struct srcu_struct *sp, int *flags, > unsigned long *gp_seq); > > #endif > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > index 2f43d31fb7a5..2e8a037072eb 100644 > --- a/kernel/rcu/rcutorture.c > +++ b/kernel/rcu/rcutorture.c > @@ -381,6 +381,7 @@ struct rcu_torture_ops { > void (*gp_kthread_dbg)(void); > bool (*check_boost_failed)(unsigned long gp_state, int *cpup); > int (*stall_dur)(void); > + void (*get_gp_data)(int *flags, unsigned long *gp_seq); > long cbflood_max; > int irq_capable; > int can_boost; > @@ -569,6 +570,7 @@ static struct rcu_torture_ops rcu_ops = { > .gp_kthread_dbg = show_rcu_gp_kthreads, > .check_boost_failed = rcu_check_boost_fail, > .stall_dur = rcu_jiffies_till_stall_check, > + .get_gp_data = rcutorture_get_gp_data, > .irq_capable = 1, > .can_boost = IS_ENABLED(CONFIG_RCU_BOOST), > .extendables = RCUTORTURE_MAX_EXTEND, > @@ -612,6 +614,7 @@ static struct rcu_torture_ops rcu_busted_ops = { > .sync = synchronize_rcu_busted, > .exp_sync = synchronize_rcu_busted, > .call = call_rcu_busted, > + .get_gp_data = NULL, When this is NULL, please just leave it out. If you wish, you can also send separate patches removing the pre-existing initialization of function pointers to NULL, one per function pointer. 
> .cb_barrier = NULL, > .fqs = NULL, > .stats = NULL, > @@ -628,6 +631,11 @@ static struct srcu_struct srcu_ctld; > static struct srcu_struct *srcu_ctlp = &srcu_ctl; > static struct rcu_torture_ops srcud_ops; > > +static void srcu_get_gp_data(int *flags, unsigned long *gp_seq) > +{ > + srcutorture_get_gp_data(srcu_ctlp, flags, gp_seq); > +} > + > static int srcu_torture_read_lock(void) > { > if (cur_ops == &srcud_ops) > @@ -736,6 +744,7 @@ static struct rcu_torture_ops srcu_ops = { > .call = srcu_torture_call, > .cb_barrier = srcu_torture_barrier, > .stats = srcu_torture_stats, > + .get_gp_data = srcu_get_gp_data, > .cbflood_max = 50000, > .irq_capable = 1, > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > @@ -774,6 +783,7 @@ static struct rcu_torture_ops srcud_ops = { > .call = srcu_torture_call, > .cb_barrier = srcu_torture_barrier, > .stats = srcu_torture_stats, > + .get_gp_data = srcu_get_gp_data, > .cbflood_max = 50000, > .irq_capable = 1, > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > @@ -796,6 +806,7 @@ static struct rcu_torture_ops busted_srcud_ops = { > .call = srcu_torture_call, > .cb_barrier = srcu_torture_barrier, > .stats = srcu_torture_stats, > + .get_gp_data = NULL, > .irq_capable = 1, > .no_pi_lock = IS_ENABLED(CONFIG_TINY_SRCU), > .extendables = RCUTORTURE_MAX_EXTEND, > @@ -838,6 +849,7 @@ static struct rcu_torture_ops trivial_ops = { > .get_gp_seq = rcu_no_completed, > .sync = synchronize_rcu_trivial, > .exp_sync = synchronize_rcu_trivial, > + .get_gp_data = NULL, > .fqs = NULL, > .stats = NULL, > .irq_capable = 1, > @@ -882,6 +894,7 @@ static struct rcu_torture_ops tasks_ops = { > .call = call_rcu_tasks, > .cb_barrier = rcu_barrier_tasks, > .gp_kthread_dbg = show_rcu_tasks_classic_gp_kthread, > + .get_gp_data = rcu_tasks_get_gp_data, > .fqs = NULL, > .stats = NULL, > .irq_capable = 1, > @@ -922,6 +935,7 @@ static struct rcu_torture_ops tasks_rude_ops = { > .call = call_rcu_tasks_rude, > .cb_barrier = rcu_barrier_tasks_rude, > .gp_kthread_dbg = 
show_rcu_tasks_rude_gp_kthread, > + .get_gp_data = rcu_tasks_rude_get_gp_data, > .cbflood_max = 50000, > .fqs = NULL, > .stats = NULL, > @@ -974,6 +988,7 @@ static struct rcu_torture_ops tasks_tracing_ops = { > .call = call_rcu_tasks_trace, > .cb_barrier = rcu_barrier_tasks_trace, > .gp_kthread_dbg = show_rcu_tasks_trace_gp_kthread, > + .get_gp_data = rcu_tasks_trace_get_gp_data, > .cbflood_max = 50000, > .fqs = NULL, > .stats = NULL, > @@ -2264,10 +2279,8 @@ rcu_torture_stats_print(void) > int __maybe_unused flags = 0; > unsigned long __maybe_unused gp_seq = 0; > > - rcutorture_get_gp_data(cur_ops->ttype, > - &flags, &gp_seq); > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, > - &flags, &gp_seq); > + if (cur_ops->get_gp_data) > + cur_ops->get_gp_data(&flags, &gp_seq); > wtp = READ_ONCE(writer_task); > pr_alert("??? Writer stall state %s(%d) g%lu f%#x ->state %#x cpu %d\n", > rcu_torture_writer_state_getname(), > @@ -3390,8 +3403,8 @@ rcu_torture_cleanup(void) > fakewriter_tasks = NULL; > } > > - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); > + if (cur_ops->get_gp_data) > + cur_ops->get_gp_data(&flags, &gp_seq); > pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n", > cur_ops->name, (long)gp_seq, flags, > rcutorture_seq_diff(gp_seq, start_gp_seq)); > @@ -3762,8 +3775,8 @@ rcu_torture_init(void) > nrealreaders = 1; > } > rcu_torture_print_module_parms(cur_ops, "Start of test"); > - rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); > - srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); > + if (cur_ops->get_gp_data) > + cur_ops->get_gp_data(&flags, &gp_seq); > start_gp_seq = gp_seq; > pr_alert("%s: Start-test grace-period state: g%ld f%#x\n", > cur_ops->name, (long)gp_seq, flags); > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > index d19326486edd..98f79ba01b0a 100644 > --- a/kernel/rcu/srcutree.c > +++ 
b/kernel/rcu/srcutree.c > @@ -1828,12 +1828,9 @@ static void process_srcu(struct work_struct *work) > srcu_reschedule(ssp, curdelay); > } > > -void srcutorture_get_gp_data(enum rcutorture_type test_type, > - struct srcu_struct *ssp, int *flags, > +void srcutorture_get_gp_data(struct srcu_struct *ssp, int *flags, > unsigned long *gp_seq) > { > - if (test_type != SRCU_FLAVOR) > - return; > *flags = 0; > *gp_seq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq); > } > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > index e83adcdb49b5..e7ac9138a4fd 100644 > --- a/kernel/rcu/tasks.h > +++ b/kernel/rcu/tasks.h > @@ -1182,6 +1182,13 @@ struct task_struct *get_rcu_tasks_gp_kthread(void) > } > EXPORT_SYMBOL_GPL(get_rcu_tasks_gp_kthread); > > +void rcu_tasks_get_gp_data(int *flags, unsigned long *gp_seq) > +{ > + *flags = 0; > + *gp_seq = rcu_seq_current(&rcu_tasks.tasks_gp_seq); > +} > +EXPORT_SYMBOL_GPL(rcu_tasks_get_gp_data); > + > /* > * Protect against tasklist scan blind spot while the task is exiting and > * may be removed from the tasklist. 
Do this by adding the task to yet > @@ -1361,6 +1368,13 @@ struct task_struct *get_rcu_tasks_rude_gp_kthread(void) > } > EXPORT_SYMBOL_GPL(get_rcu_tasks_rude_gp_kthread); > > +void rcu_tasks_rude_get_gp_data(int *flags, unsigned long *gp_seq) > +{ > + *flags = 0; > + *gp_seq = rcu_seq_current(&rcu_tasks_rude.tasks_gp_seq); > +} > +EXPORT_SYMBOL_GPL(rcu_tasks_rude_get_gp_data); > + > #endif /* #ifdef CONFIG_TASKS_RUDE_RCU */ > > //////////////////////////////////////////////////////////////////////// > @@ -2020,6 +2034,13 @@ struct task_struct *get_rcu_tasks_trace_gp_kthread(void) > } > EXPORT_SYMBOL_GPL(get_rcu_tasks_trace_gp_kthread); > > +void rcu_tasks_trace_get_gp_data(int *flags, unsigned long *gp_seq) > +{ > + *flags = 0; > + *gp_seq = rcu_seq_current(&rcu_tasks_trace.tasks_gp_seq); > +} > +EXPORT_SYMBOL_GPL(rcu_tasks_trace_get_gp_data); > + > #else /* #ifdef CONFIG_TASKS_TRACE_RCU */ > static void exit_tasks_rcu_finish_trace(struct task_struct *t) { } > #endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */ > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 9fbb5ab57c84..e229a12afe31 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -539,17 +539,10 @@ static struct rcu_node *rcu_get_root(void) > /* > * Send along grace-period-related data for rcutorture diagnostics. > */ > -void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, > - unsigned long *gp_seq) > +void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq) > { > - switch (test_type) { > - case RCU_FLAVOR: > - *flags = READ_ONCE(rcu_state.gp_flags); > - *gp_seq = rcu_seq_current(&rcu_state.gp_seq); > - break; > - default: > - break; > - } > + *flags = READ_ONCE(rcu_state.gp_flags); > + *gp_seq = rcu_seq_current(&rcu_state.gp_seq); > } > EXPORT_SYMBOL_GPL(rcutorture_get_gp_data); > > -- > 2.17.1 >
Hi,

While experimenting with the LKMM, I came across ISA2+pooncelock+pooncelock+pombonce. The original litmus test is as follows:

------------------------------------------------------
P0(int *x, int *y, spinlock_t *mylock)
{
	spin_lock(mylock);
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P1(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
	spin_unlock(mylock);
}

P2(int *x, int *z)
{
	int r1;
	int r2;

	r2 = READ_ONCE(*z);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
------------------------------------------------------

Of course, the result is Never. But when I delete P0's spin_lock() and P1's spin_unlock():

-------------------------------------------------------
P0(int *x, int *y, spinlock_t *mylock)
{
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P1(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
}

P2(int *x, int *z)
{
	int r1;
	int r2;

	r2 = READ_ONCE(*z);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
------------------------------------------------------

herd then tells me the result is Sometimes. Is this expected?
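For anyone wanting to reproduce this, the modified test can be checked against the LKMM with herd7 from a kernel source tree. This is a sketch of the invocation: the file name below is made up, and it assumes herd7 is installed and the standard tools/memory-model layout (see tools/memory-model/README):

```shell
# Write the modified litmus test to a file (the name is arbitrary).
cat > /tmp/ISA2-unbalanced.litmus <<'EOF'
C ISA2-unbalanced

{}

P0(int *x, int *y, spinlock_t *mylock)
{
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P1(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
}

P2(int *x, int *z)
{
	int r1;
	int r2;

	r2 = READ_ONCE(*z);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r2=1 /\ 2:r1=0)
EOF

# Then, from tools/memory-model in the kernel tree:
#   herd7 -conf linux-kernel.cfg /tmp/ISA2-unbalanced.litmus
# and check whether the "exists" clause is reported as Never or Sometimes.
```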
> > On Wed, Mar 13, 2024 at 12:49:50PM +0800, Z qiang wrote: > > > > > > On Tue, Mar 12, 2024 at 07:35:24PM +0800, Zqiang wrote: > > > > The rcu_tasks_percpu structure's->lazy_timer is queued only when > > > > the rcu_tasks structure's->lazy_jiffies is not equal to zero in > > > > call_rcu_tasks_generic(), if the lazy_timer callback is invoked, > > > > that means the lazy_jiffes is not equal to zero, this commit > > > > therefore remove lazy_jiffies check in call_rcu_tasks_generic_timer(). > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > --- > > > > kernel/rcu/tasks.h | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > > > index b1254cf3c210..439e0b9a2656 100644 > > > > --- a/kernel/rcu/tasks.h > > > > +++ b/kernel/rcu/tasks.h > > > > @@ -299,7 +299,7 @@ static void call_rcu_tasks_generic_timer(struct timer_list *tlp) > > > > > > > > rtp = rtpcp->rtpp; > > > > raw_spin_lock_irqsave_rcu_node(rtpcp, flags); > > > > - if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) { > > > > + if (!rcu_segcblist_empty(&rtpcp->cblist)) { > > > > > > Good eyes! > > > > > > But did you test with something like a WARN_ON_ONCE(rtp->lazy_jiffies)? > > > > Hi, Paul > > > > + if (!rcu_segcblist_empty(&rtpcp->cblist) && > > !WARN_ON_ONCE(!rtp->lazy_jiffies)) > > > > I've done tests like this: > > > > 1. runqemu nographic kvm slirp qemuparams="-smp 4 -m 2048M -drive > > file=$PWD/share.img,if=virtio" > > bootparams="rcupdate.rcu_tasks_trace_lazy_ms=0" -d > > > > 2. insmod torture.ko > > insmod rcutorture.ko torture_type=tasks-tracing fwd_progress=4 > > > > 3. bpftrace -e 't:timer:timer_expire_entry /args->function == > > kaddr("call_rcu_tasks_generic_timer")/ > > { > > printf("comm:%s,cpu:%d,stack:%s,func:%s\n", comm, cpu, kstack, > > ksym(args->function)); }' > > > > The call_rcu_tasks_generic_timer() has never been executed. > > Very good! 
>
> Then if we get a couple of acks or reviews from the others acknowledging
> that if they ever make ->lazy_jiffies be changeable at runtime, they
> will remember to do something to adjust this logic appropriately, I will
> take it. ;-)
>

Hi, Paul

Would it be better to modify it as follows? Setting needwake would no longer depend on lazy_jiffies; otherwise, if ->lazy_jiffies is changed to zero at runtime, we would miss the chance to wake up the GP kthread.

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index e7ac9138a4fd..aea2b71af36c 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -299,11 +299,12 @@ static void call_rcu_tasks_generic_timer(struct timer_list *tlp)

 	rtp = rtpcp->rtpp;
 	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
-	if (!rcu_segcblist_empty(&rtpcp->cblist) && rtp->lazy_jiffies) {
+	if (!rcu_segcblist_empty(&rtpcp->cblist)) {
 		if (!rtpcp->urgent_gp)
 			rtpcp->urgent_gp = 1;
 		needwake = true;
-		mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
+		if (rtp->lazy_jiffies)
+			mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
 	}
 	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 	if (needwake)

Thanks
Zqiang

> 			Thanx, Paul
>
> > Thanks
> > Zqiang
> >
> > >
> > > 			Thanx, Paul
> > >
> > > > 		if (!rtpcp->urgent_gp)
> > > > 			rtpcp->urgent_gp = 1;
> > > > 		needwake = true;
> > > > --
> > > > 2.17.1
> > > >