From: Joel Fernandes <joel@joelfernandes.org>
To: paulmck@kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>,
	Rushikesh S Kadam <rushikesh.s.kadam@intel.com>,
	"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	Neeraj upadhyay <neeraj.iitr10@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>, rcu <rcu@vger.kernel.org>
Subject: Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
Date: Thu, 18 Aug 2022 22:45:19 -0400
Message-ID: <4deb7354-bac7-b530-47ba-54cf50cfce58@joelfernandes.org>
In-Reply-To: <20220819023550.GN2125313@paulmck-ThinkPad-P17-Gen-1>



On 8/18/2022 10:35 PM, Paul E. McKenney wrote:
> On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
>> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>
>>> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>
>>>> [Sorry, adding back the CC list]
>>>>
>>>> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
>>>> <joel@joelfernandes.org> wrote:
>>>>>
>>>>> This is required to prevent callbacks from triggering the RCU machinery
>>>>> too quickly and too often, which increases the system's power consumption.
>>>>>
>>>>> When testing, we found that these paths were invoked often when the
>>>>> system is not doing anything (screen is ON but otherwise idle).
>>>>
>>>> Unfortunately, I am seeing a slow down in ChromeOS boot performance
>>>> after applying this particular patch. It is the first time I could
>>>> test ChromeOS boot times with the series since it was hard to find a
>>>> ChromeOS device that runs the upstream kernel.
>>>>
>>>> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
>>>> patch? I wonder if the issue is with wake up interaction with the nocb
>>>> GP threads.
>>>>
>>>> We ought to disable lazy RCU during boot since it would have little
>>>> benefit anyway. But I am also concerned about some deeper problem I
>>>> did not catch before.
>>>>
>>>> I'll look into tracing the fs paths to see if I can narrow down what's
>>>> causing it. Will also try a newer kernel, I am currently testing on
>>>> 5.19-rc4.
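A minimal sketch of the "disable lazy RCU during boot" idea, written as a userspace model. All names here (`struct cb_queue`, `boot_done`, `enqueue_cb()`) are hypothetical and are not how the series implements laziness; the model only shows the policy of demoting lazy requests to non-lazy until boot has finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU callback counters (not the kernel's struct rcu_data). */
struct cb_queue {
	int n_lazy;    /* callbacks allowed to wait for the lazy flush timeout */
	int n_nonlazy; /* callbacks that should get a prompt grace period */
};

/* Hypothetical flag: false until boot has finished. */
static bool boot_done;

/*
 * Queue one callback. A lazy request is honored only after boot; during
 * boot it is demoted to non-lazy, since batching saves little power there
 * and can delay boot-time waiters.
 */
static void enqueue_cb(struct cb_queue *q, bool lazy)
{
	if (lazy && boot_done)
		q->n_lazy++;
	else
		q->n_nonlazy++;
}

/* Self-check of the boot-time gating policy. */
static void check_boot_gating(void)
{
	struct cb_queue q = { 0, 0 };

	boot_done = false;
	enqueue_cb(&q, true);	/* during boot: demoted to non-lazy */
	assert(q.n_lazy == 0 && q.n_nonlazy == 1);

	boot_done = true;
	enqueue_cb(&q, true);	/* after boot: stays lazy */
	assert(q.n_lazy == 1 && q.n_nonlazy == 1);
}
```

The gate sits at enqueue time so that no lazy callback can exist during boot at all, which sidesteps any interaction between early lazy callbacks and later synchronous waits.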
>>>
>>> I got somewhere with this. It looks like queuing CBs as lazy CBs
>>> instead of normal CBs is triggering expedited stalls during the boot
>>> process:
>>>
>>> [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>>>
>>> No idea how/why lazy RCU CBs would be related to expedited GP issues,
>>> but maybe something hangs and causes that side-effect.
>>>
>>> initcall_debug did not help: the initcalls all seem to work fine, and
>>> then, 8 seconds after boot, things start slowing down a lot, followed
>>> by the RCU stall messages. As a next step I'll enable ftrace during
>>> boot to see if I can get more insight. But I believe it's not the
>>> FS layer; the FS layer just triggers lazy CBs, and something is
>>> wrong with the core lazy-RCU work itself.
>>>
>>> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
>>> recent kernels and debug.
>>
>> More digging, thanks to the trace_event= boot option, shows that the
>> boot process does have some synchronous waits. Although these are
>> "non-lazy", for some reason the previously queued lazy CBs are making
>> them wait for the *full* lazy duration, which points to a likely bug
>> in the lazy RCU logic. These synchronous CBs should never be waiting
>> like the lazy ones:
>>
>> [   17.715904]  => trace_dump_stack
>> [   17.715904]  => __wait_rcu_gp
>> [   17.715904]  => synchronize_rcu
>> [   17.715904]  => selinux_netcache_avc_callback
>> [   17.715904]  => avc_ss_reset
>> [   17.715904]  => sel_write_enforce
>> [   17.715904]  => vfs_write
>> [   17.715904]  => ksys_write
>> [   17.715904]  => do_syscall_64
>> [   17.715904]  => entry_SYSCALL_64_after_hwframe
>>
>> I'm tired, so I'll resume debugging later.
> 
> At times like this, I often pull the suspect code into userspace and
> run it through its paces.  In this case, that would mean a bunch of
> call_rcu_lazy() invocations into an empty bypass list, followed by a
> call_rcu() invocation, then a check to make sure that the bypass list
> is no longer lazy.

Thanks a lot for this great debug idea, I will look into it.
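A rough sketch of that userspace harness, under stated assumptions: `struct bypass_list`, `bypass_enqueue()`, `run_bypass_test()` and `LAZY_FLUSH_JIFFIES` are hypothetical stand-ins, not the kernel's nocb bypass code. The model encodes only the expected invariant: once a non-lazy callback lands on a lazy bypass list, the list stops being lazy and its flush deadline moves to "now", so a synchronous waiter never sits out the full lazy timeout.

```c
#include <assert.h>
#include <stdbool.h>

#define LAZY_FLUSH_JIFFIES 10000L /* hypothetical lazy flush timeout */

/* Hypothetical stand-in for one CPU's nocb bypass list. */
struct bypass_list {
	int len;             /* callbacks queued on the bypass list */
	bool lazy;           /* true only while every queued callback is lazy */
	long flush_deadline; /* jiffy by which the list must be flushed */
};

static void bypass_enqueue(struct bypass_list *b, bool lazy, long now)
{
	if (b->len == 0) {
		/* Empty list: laziness follows the first callback. */
		b->lazy = lazy;
		b->flush_deadline = lazy ? now + LAZY_FLUSH_JIFFIES : now;
	} else if (!lazy && b->lazy) {
		/*
		 * A non-lazy callback on a lazy list must make the whole
		 * list non-lazy and pull the deadline in to "now";
		 * otherwise a synchronous waiter waits out the full lazy
		 * timeout, matching the symptom seen during boot.
		 */
		b->lazy = false;
		b->flush_deadline = now;
	}
	b->len++;
}

/* Paul's scenario: n_lazy call_rcu_lazy()s into an empty list, then call_rcu(). */
static bool run_bypass_test(int n_lazy)
{
	struct bypass_list b = { 0, false, 0 };
	long now = 0;
	int i;

	assert(n_lazy > 0);
	for (i = 0; i < n_lazy; i++)
		bypass_enqueue(&b, true, now++);	/* call_rcu_lazy() model */
	assert(b.lazy && b.flush_deadline == LAZY_FLUSH_JIFFIES);

	bypass_enqueue(&b, false, now);		/* call_rcu() model */
	return !b.lazy && b.flush_deadline == now && b.len == n_lazy + 1;
}
```

If the real bypass logic were ported into `bypass_enqueue()` and `run_bypass_test()` returned false, that would reproduce the "non-lazy waits for the full lazy duration" behavior outside the kernel.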

Thanks,

 - Joel
