From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	rcu@vger.kernel.org, Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [BUG] Re: Linux 6.4.4
Date: Mon, 24 Jul 2023 00:32:57 +0000
Message-ID: <20230724003257.GA60074@google.com>
In-Reply-To: <ebde8612-95de-4eaf-aa56-34e9b3a3fa86@paulmck-laptop>

On Sun, Jul 23, 2023 at 10:19:27AM -0700, Paul E. McKenney wrote:
> On Sun, Jul 23, 2023 at 10:50:26AM -0400, Joel Fernandes wrote:
> > 
> > 
> > On 7/22/23 13:27, Paul E. McKenney wrote:
> > [..]
> > > 
> > > OK, if this kernel is non-preemptible, you are not running TREE03,
> > > correct?
> > > 
> > >> Next plan of action is to get sched_waking stack traces since I have a
> > >> very reliable repro of this now.
> > > 
> > > Too much fun!  ;-)
> > 
> > For TREE07 issue, it is actually the schedule_timeout_interruptible(1)
> > in stutter_wait() that is beating up the CPU0 for 4 seconds.
> > 
> > This is very similar to the issue I fixed in New year in d52d3a2bf408
> > ("torture: Fix hang during kthread shutdown phase")
> 
> Agreed, if there are enough kthreads, and all the kthreads are on a
> single CPU, this could consume that CPU.
> 
> > Adding a cond_resched() there also did not help.
> > 
> > I think the issue is the stutter thread fails to move spt forward
> > because it does not get CPU time. But spt == 1 should be very brief
> > AFAIU. I was wondering if we could set that to RT.
> 
> Or just use a single hrtimer-based wait for each kthread?

[Joel]
Yes, this might be better, but there is still the issue that spt may not be
set back to 0 in some future release where the thread gets starved.

> > But also maybe the following will cure it like it did for the shutdown
> > issue, giving the stutter thread just enough CPU time to move spt forward.
> > 
> > Now I am trying the following and will let it run while I go do other
> > family related things. ;)
> 
> Good point, if this avoids the problem, that gives a strong indication
> that your hypothesis on the root cause is correct.

[Joel]
And the TREE07 issue is gone with that change! So I think I'll roll it into a
patch and send it to you. But I am also hoping that you are OK with me
setting the stutter thread to RT in addition to the longer schedule_timeout.
That is just to make it more robust, since I think it is crucial that it never
leaves threads stuttering indefinitely because the scheduler starved it (for
whatever unforeseen reason that might happen in the future). And maybe, as
part of that, I could also tackle that other TODO item about cleaning up
torture_create_kthread(), adding support to it for setting things to RT and
then using it for that.

Let me know what you think, thanks!

 - Joel
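
A rough back-of-the-envelope sketch of why the one-jiffy
schedule_timeout_interruptible(1) in stutter_wait() can swamp a single CPU for
the 4 seconds mentioned above, and why a longer timeout relieves it. This is
not code from the thread or from the kernel: the helper names are
hypothetical, and the tick rate (HZ=1000) is an assumption, as are any waiter
counts or longer sleep lengths plugged into it.

```c
#include <assert.h>

/* Assumed tick rate; the thread does not state HZ for this config. */
#define ASSUMED_HZ 1000UL

/*
 * Number of times one waiting kthread wakes up during a stutter pause
 * if it re-sleeps for sleep_jiffies each time it finds the pause still
 * in effect (rounded up, since the last sleep may overshoot).
 */
static unsigned long wakeups_per_waiter(unsigned long pause_jiffies,
                                        unsigned long sleep_jiffies)
{
    assert(sleep_jiffies > 0);
    return (pause_jiffies + sleep_jiffies - 1) / sleep_jiffies;
}

/*
 * Total wakeups landing on one CPU when nr_waiters kthreads all share
 * it (hypothetical worst case where every waiter runs on the same CPU).
 */
static unsigned long total_wakeups(unsigned long nr_waiters,
                                   unsigned long pause_seconds,
                                   unsigned long sleep_jiffies)
{
    return nr_waiters *
           wakeups_per_waiter(pause_seconds * ASSUMED_HZ, sleep_jiffies);
}
```

Under these assumptions, 16 waiters sleeping one jiffy at a time generate
64000 wakeups on that CPU over a 4-second pause, while sleeping 10 jiffies at
a time cuts that to 6400, which leaves more room for the stutter thread to
run and move spt forward.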