linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Z qiang <qiang.zhang1211@gmail.com>
Cc: "Jon Hunter" <jonathanh@nvidia.com>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, rostedt@goodmis.org, hch@lst.de,
	"Sachin Sant" <sachinp@linux.ibm.com>,
	"Zhang, Qiang1" <qiang1.zhang@intel.com>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	"Matthias Brugger" <matthias.bgg@gmail.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Michał Mirosław" <mirq-linux@rere.qmqm.pl>,
	"Dmitry Osipenko" <dmitry.osipenko@collabora.com>
Subject: Re: [PATCH rcu 11/20] srcu: Move grace-period fields from srcu_struct to srcu_usage
Date: Thu, 1 Jun 2023 20:09:31 -0700
Message-ID: <f0ce2b72-21d2-4a74-ad67-5bf0e63abd98@paulmck-laptop>
In-Reply-To: <CALm+0cWkVsMBQo+rHHWFSwsAkRTOmHxXJDSuVE9Dhh=1fHTmEg@mail.gmail.com>

On Fri, Jun 02, 2023 at 10:52:44AM +0800, Z qiang wrote:
> >
> > On Thu, Jun 01, 2023 at 08:33:10PM +0800, Z qiang wrote:
> > > >
> > > > Hi Paul,
> > > >
> > > > On 30/03/2023 23:47, Paul E. McKenney wrote:
> > > > > This commit moves the ->srcu_gp_seq, ->srcu_gp_seq_needed,
> > > > > ->srcu_gp_seq_needed_exp, ->srcu_gp_start, and ->srcu_last_gp_end fields
> > > > > from the srcu_struct structure to the srcu_usage structure to reduce
> > > > > the size of the former in order to improve cache locality.
> > > > >
> > > > > Suggested-by: Christoph Hellwig <hch@lst.de>
> > > > > Tested-by: Sachin Sant <sachinp@linux.ibm.com>
> > > > > Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > >
> > > >
> > > > I have noticed a suspend regression on some of our Tegra boards recently
> > > > with v6.4-rc and interestingly bisect is pointing to this commit. I was
> > > > unable to revert this on top of the latest mainline, but if I check out
> > > > this commit suspend fails and if I check out the previous commit it passes.
> > > >
> > > > Enabling more debug, I was able to capture the following crash log on
> > > > one of the boards ...
> > > >
> > > > [   57.327645] PM: suspend entry (deep)
> > > > [   57.331660] Filesystems sync: 0.000 seconds
> > > > [   57.340147] Freezing user space processes
> > > > [   57.347470] Freezing user space processes completed (elapsed 0.007 seconds)
> > > > [   57.347501] OOM killer disabled.
> > > > [   57.347508] Freezing remaining freezable tasks
> > > > [   57.348834] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> > > > [   57.349932] 8<--- cut here ---
> > > > [   57.349943] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
> > > > [   57.349960] [00000000] *pgd=00000000
> > > > [   57.349986] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
> > > > [   57.350007] Modules linked in: tegra30_tsensor
> > > > [   57.350033] CPU: 0 PID: 589 Comm: rtcwake Not tainted 6.3.0-rc1-00011-g03200b5ca3b4-dirty #3
> > > > [   57.350057] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> > > > [   57.350067] PC is at rcu_segcblist_enqueue+0x2c/0x38
> > > > [   57.350120] LR is at srcu_gp_start_if_needed+0xe4/0x544
> > > > [   57.350169] pc : [<c01a5120>]    lr : [<c0198b5c>]    psr: a0070093
> > > > [   57.350183] sp : f0b2dd20  ip : 3b5870ef  fp : 00000000
> > > > [   57.350194] r10: ef787d84  r9 : 00000000  r8 : ef787d80
> > > > [   57.350205] r7 : 80070013  r6 : c131ec30  r5 : ef787d40  r4 : f0b2dd64
> > > > [   57.350217] r3 : 00000000  r2 : 00000000  r1 : f0b2dd64  r0 : ef787d84
> > > > [   57.350230] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > > > [   57.350251] Control: 10c5387d  Table: 81d8004a  DAC: 00000051
> > > > [   57.350261] Register r0 information: non-slab/vmalloc memory
> > > > [   57.350283] Register r1 information: 2-page vmalloc region starting at 0xf0b2c000 allocated at kernel_clone+0xb4/0x3e4
> > > > [   57.350322] Register r2 information: NULL pointer
> > > > [   57.350337] Register r3 information: NULL pointer
> > > > [   57.350350] Register r4 information: 2-page vmalloc region starting at 0xf0b2c000 allocated at kernel_clone+0xb4/0x3e4
> > > > [   57.350379] Register r5 information: non-slab/vmalloc memory
> > > > [   57.350394] Register r6 information: non-slab/vmalloc memory
> > > > [   57.350408] Register r7 information: non-paged memory
> > > > [   57.350422] Register r8 information: non-slab/vmalloc memory
> > > > [   57.350436] Register r9 information: NULL pointer
> > > > [   57.350449] Register r10 information: non-slab/vmalloc memory
> > > > [   57.350463] Register r11 information: NULL pointer
> > > > [   57.350477] Register r12 information: non-paged memory
> > > > [   57.350491] Process rtcwake (pid: 589, stack limit = 0x410bb531)
> > > > [   57.350510] Stack: (0xf0b2dd20 to 0xf0b2e000)
> > > > [   57.350534] dd20: 00000000 c1ee4a40 f0b2dd7c c0184f24 ef781495 3b5870ef c1ee4a40 c1ee4a40
> > > > [   57.350555] dd40: c131ec30 00000000 00000002 c0f3d1fc c3542ac0 c2abbb10 c1ee4a40 c0199044
> > > > [   57.350574] dd60: 60070013 00000000 c0195924 00000000 00000000 f0b2dd74 f0b2dd74 3b5870ef
> > > > [   57.350592] dd80: 00000000 c131ebc0 c120ab28 c0146d9c c2785b94 c2785b40 c0fee9f4 c0872590
> > > > [   57.350611] dda0: c2785b40 c08c39cc c2785b40 c08c3a3c c2788c00 c08c40b0 c0f3d1fc c066f028
> > > > [   57.350630] ddc0: f0b2de14 c2788c00 c1325ef4 c08c1d40 c2788c00 c1325ef4 c0fee9f4 c08c31cc
> > > > [   57.350648] dde0: c13708a0 0000000d 00000000 c0681c10 c16afe84 00000002 56508788 0000000d
> > > > [   57.350665] de00: 00000002 c13708a0 10624dd3 56409580 0000000d 00000000 00000002 c0f3d1fc
> > > > [   57.350685] de20: c3542ac0 c2abbb10 c1ee4a40 c06824e4 00000000 ffffa900 00000000 c1386510
> > > > [   57.350703] de40: 00000003 00000003 c1204f75 c017e8a8 c3542ac0 c2abbb10 00428228 c0171574
> > > > [   57.350721] de60: 00000000 00000000 00000003 3b5870ef c1204f75 00000000 00000003 c137aeb4
> > > > [   57.350739] de80: c1204f75 c0f3d1fc c3542ac0 c2abbb10 00428228 c017f380 00000003 c0f38a54
> > > > [   57.350757] dea0: 00000003 c1386524 00000004 c017d708 00000004 c2abbb00 00000000 00000000
> > > > [   57.350775] dec0: c3542ac0 f0b2df28 c2abbb10 c03305b4 00000000 00000000 c2953c00 c1ee4a40
> > > > [   57.350794] dee0: 00429438 00000004 c0d18488 00004004 00000000 c02b1094 00000a55 c1d80010
> > > > [   57.350812] df00: c1d80010 00000000 00000000 f0b2df78 01010006 00000004 00000000 00429438
> > > > [   57.350830] df20: 00000000 00000000 c2953c00 00000000 00000000 00000000 00000000 00000000
> > > > [   57.350848] df40: 00000000 00004004 00000000 00000000 0000006c 3b5870ef c2953c00 c2953c00
> > > > [   57.350866] df60: 00000000 00000000 c1ee4a40 00429438 00000004 c02b12c8 00000000 00000000
> > > > [   57.350885] df80: 00001008 3b5870ef 0000006c 00429438 00428228 00000004 c0100324 c1ee4a40
> > > > [   57.350902] dfa0: 00000004 c01000c0 0000006c 00429438 00000004 00429438 00000004 00000000
> > > > [   57.350920] dfc0: 0000006c 00429438 00428228 00000004 00000004 00000004 0041578c 00428228
> > > > [   57.350938] dfe0: 00000004 becda9a8 b6e9bc0b b6e26206 600f0030 00000004 00000000 00000000
> > > > [   57.350960]  rcu_segcblist_enqueue from srcu_gp_start_if_needed+0xe4/0x544
> > > > [   57.351023]  srcu_gp_start_if_needed from __synchronize_srcu.part.6+0x70/0x98
> > > > [   57.351084]  __synchronize_srcu.part.6 from srcu_notifier_chain_unregister+0x6c/0xdc
> > > > [   57.351155]  srcu_notifier_chain_unregister from cpufreq_unregister_notifier+0x60/0xbc
> > > > [   57.351215]  cpufreq_unregister_notifier from tegra_actmon_pause.part.0+0x1c/0x54
> > > > [   57.351277]  tegra_actmon_pause.part.0 from tegra_actmon_stop+0x38/0x3c
> > > > [   57.351324]  tegra_actmon_stop from tegra_governor_event_handler+0x100/0x11c
> > > > [   57.351373]  tegra_governor_event_handler from devfreq_suspend_device+0x64/0xac
> > > > [   57.351423]  devfreq_suspend_device from devfreq_suspend+0x30/0x64
> > > > [   57.351467]  devfreq_suspend from dpm_suspend+0x34/0x33c
> > > > [   57.351506]  dpm_suspend from dpm_suspend_start+0x90/0x98
> > > > [   57.351528]  dpm_suspend_start from suspend_devices_and_enter+0xe4/0x93c
> > > > [   57.351573]  suspend_devices_and_enter from pm_suspend+0x280/0x3ac
> > > > [   57.351614]  pm_suspend from state_store+0x6c/0xc8
> > > > [   57.351654]  state_store from kernfs_fop_write_iter+0x118/0x1b4
> > > > [   57.351696]  kernfs_fop_write_iter from vfs_write+0x314/0x3d4
> > > > [   57.351733]  vfs_write from ksys_write+0xa0/0xd0
> > > > [   57.351760]  ksys_write from ret_fast_syscall+0x0/0x54
> > > > [   57.351788] Exception stack(0xf0b2dfa8 to 0xf0b2dff0)
> > > > [   57.351809] dfa0:                   0000006c 00429438 00000004 00429438 00000004 00000000
> > > > [   57.351828] dfc0: 0000006c 00429438 00428228 00000004 00000004 00000004 0041578c 00428228
> > > > [   57.351843] dfe0: 00000004 becda9a8 b6e9bc0b b6e26206
> > > > [   57.351863] Code: e2833001 e5803034 e5812000 e5903010 (e5831000)
> > > > [   57.351875] ---[ end trace 0000000000000000 ]---
> > > >
> > > >
> > > > I have not dug into this yet and so wanted to see if you have any
> > > > thoughts on this?
> > > >
> > >
> > > Hi, Jon
> > >
> > > Please try it:
> > >
> > > diff --git a/include/linux/notifier.h b/include/linux/notifier.h
> > > index 2aba75145144..3ce6b59e02e5 100644
> > > --- a/include/linux/notifier.h
> > > +++ b/include/linux/notifier.h
> > > @@ -110,6 +110,7 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
> > >         {                                                       \
> > >                 .mutex = __MUTEX_INITIALIZER(name.mutex),       \
> > >                 .head = NULL,                                   \
> > > +               .srcuu = __SRCU_USAGE_INIT(name.srcuu),         \
> > >                 .srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
> > >         }
> >
> > Thank you both!
> >
> > Huh.  It looks like Chen-Yu Tsai sent a patch to this effect and
> > AngeloGioacchino Del Regno tested it.  No one has picked it up yet.
> >
> > https://lore.kernel.org/all/20230526073539.339203-1-wenst@chromium.org/
> >
> > This is clearly a regression, and I don't see it in -next.  I will pick
> > it up and send it along in a few days if Matthias or Rafael don't beat
> > me to it.
> >
> > In the meantime, I would be happy to add Jon's Reported-by and Tested-by,
> > along with Qiang's Acked-by or Reviewed-by.
> 
> Acked-by: Zqiang <qiang.zhang1211@gmail.com>

Thank you!  I will apply this on my next rebase.

							Thanx, Paul
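
For readers following the oops quoted above, here is a minimal userspace
sketch of how a missing static initializer can end in a NULL store on
enqueue. It is a simplified model under stated assumptions, not the kernel
code: it assumes a "needs first-use init" marker lives in the usage
structure, and that a zero-filled usage structure therefore looks as if it
were already initialized. All names (struct usage, USAGE_INIT, check_init,
enqueue, and so on) are hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "usage" half of a split structure. */
struct usage {
	/* Assumption: nonzero low bits mean "first-use init still pending". */
	unsigned long gp_seq_needed;
};

struct cb {
	struct cb *next;
};

/* Hypothetical callback list: tail stays NULL until initialized. */
struct cblist {
	struct cb *head;
	struct cb **tail;
};

struct domain {
	struct usage *sup;
	struct cblist cbs;
};

/* The static initializer that plants the "init pending" marker. */
#define USAGE_INIT { .gp_seq_needed = -1UL }

/* Lazy first-use initialization, skipped when the marker is absent. */
static void check_init(struct domain *d)
{
	if (!(d->sup->gp_seq_needed & 0x3))
		return;			/* looks already initialized */
	d->cbs.head = NULL;
	d->cbs.tail = &d->cbs.head;
	d->sup->gp_seq_needed = 0;	/* clear the marker */
}

/* Returns 0 on success, -1 where real code would have stored through NULL. */
static int enqueue(struct domain *d, struct cb *c)
{
	check_init(d);
	if (!d->cbs.tail)
		return -1;
	c->next = NULL;
	*d->cbs.tail = c;
	d->cbs.tail = &c->next;
	return 0;
}

static struct usage zeroed_usage;		/* initializer omitted */
static struct usage good_usage = USAGE_INIT;	/* initializer present */
static struct domain zeroed_domain = { .sup = &zeroed_usage };
static struct domain good_domain = { .sup = &good_usage };

int enqueue_with_zeroed_usage(void)
{
	static struct cb c;

	return enqueue(&zeroed_domain, &c);
}

int enqueue_with_initialized_usage(void)
{
	static struct cb c;

	return enqueue(&good_domain, &c);
}
```

In this model, enqueue_with_zeroed_usage() reports the would-be NULL store
because the lazy setup never runs, while enqueue_with_initialized_usage()
succeeds. That is the shape of the problem the one-line .srcuu initializer
in the quoted diff addresses, to the extent the model's assumptions hold.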
