From: Ankur Arora <ankur.a.arora@oracle.com>
To: paulmck@kernel.org
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	peterz@infradead.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, jpoimboe@kernel.org,
	mark.rutland@arm.com, jgross@suse.com, andrew.cooper3@citrix.com,
	bristot@kernel.org, mathieu.desnoyers@efficios.com,
	glaubitz@physik.fu-berlin.de, anton.ivanov@cambridgegreys.com,
	mattst88@gmail.com, krypton@ulrich-teichert.org,
	rostedt@goodmis.org, David.Laight@aculab.com, richard@nod.at,
	jon.grimm@amd.com, bharata@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
Date: Fri, 16 Feb 2024 19:59:45 -0800
Message-ID: <87r0hbkafi.fsf@oracle.com>
In-Reply-To: <0be4df28-99be-41a3-9e24-2b7cfc740b4a@paulmck-laptop>


Paul E. McKenney <paulmck@kernel.org> writes:

> On Thu, Feb 15, 2024 at 06:59:25PM -0800, Paul E. McKenney wrote:
>> On Thu, Feb 15, 2024 at 04:45:17PM -0800, Ankur Arora wrote:
>> >
>> > Paul E. McKenney <paulmck@kernel.org> writes:
>> >
>> > > On Thu, Feb 15, 2024 at 01:24:59PM -0800, Ankur Arora wrote:
>> > >>
>> > >> Paul E. McKenney <paulmck@kernel.org> writes:
>> > >>
>> > >> > On Wed, Feb 14, 2024 at 07:45:18PM -0800, Paul E. McKenney wrote:
>> > >> >> On Wed, Feb 14, 2024 at 06:03:28PM -0800, Ankur Arora wrote:
>> > >> >> >
>> > >> >> > Paul E. McKenney <paulmck@kernel.org> writes:
>> > >> >> >
>> > >> >> > > On Mon, Feb 12, 2024 at 09:55:24PM -0800, Ankur Arora wrote:
>> > >> >> > >> Hi,
>> > >> >> > >>
>> > >> >> > >> This series adds a new scheduling model PREEMPT_AUTO, which like
>> > >> >> > >> PREEMPT_DYNAMIC allows dynamic switching between a none/voluntary/full
>> > >> >> > >> preemption model. However, unlike PREEMPT_DYNAMIC, it doesn't depend
>> > >> >> > >> on explicit preemption points for the voluntary models.
>> > >> >> > >>
>> > >> >> > >> The series is based on Thomas' original proposal which he outlined
>> > >> >> > >> in [1], [2] and in his PoC [3].
>> > >> >> > >>
>> > >> >> > >> An earlier RFC version is at [4].
>> > >> >> > >
>> > >> >> > > This uncovered a couple of latent bugs in RCU due to its having been
>> > >> >> > > a good long time since anyone built a !SMP preemptible kernel with
>> > >> >> > > non-preemptible RCU.  I have a couple of fixes queued on -rcu [1], most
>> > >> >> > > likely for the merge window after next, but let me know if you need
>> > >> >> > > them sooner.
>> > >> >> >
>> > >> >> > Thanks. As you can probably tell, I skipped out on !SMP in my testing.
>> > >> >> > But, the attached diff should tide me over until the fixes are in.
>> > >> >>
>> > >> >> That was indeed my guess.  ;-)
>> > >> >>
>> > >> >> > > I am also seeing OOM conditions during rcutorture testing of callback
>> > >> >> > > flooding, but I am still looking into this.
>> > >> >> >
>> > >> >> > That's on the PREEMPT_AUTO && PREEMPT_VOLUNTARY configuration?
>> > >> >>
>> > >> >> On two of the PREEMPT_AUTO && PREEMPT_NONE configurations, but only on
>> > >> >> two of them thus far.  I am running a longer test to see if this might
>> > >> >> be just luck.  If not, I will look to see what rcutorture scenarios TREE10
>> > >> >> and TRACE01 have in common.
>> > >> >
>> > >> > And still TRACE01 and TREE10 are hitting OOMs, still not seeing what
>> > >> > sets them apart.  I also hit a grace-period hang in TREE04, which does
>> > >> > CONFIG_PREEMPT_VOLUNTARY=y along with CONFIG_PREEMPT_AUTO=y.  Something
>> > >> > to dig into more.
>> > >>
>> > >> So, the only PREEMPT_VOLUNTARY=y configuration is TREE04. I wonder
>> > >> if you would continue to hit the TREE04 hang with CONFIG_PREEMPT_NONE=y
>> > >> as well?
>> > >> (Just in the interest of minimizing configurations.)
>> > >
>> > > I would be happy to, but in the spirit of full disclosure...
>> > >
>> > > First, I have seen that failure only once, which is not enough to
>> > > conclude that it has much to do with TREE04.  It might simply be low
>> > > probability, so that TREE04 simply was unlucky enough to hit it first.
>> > > In contrast, I have sufficient data to be reasonably confident that the
>> > > callback-flooding OOMs really do have something to do with the TRACE01 and
>> > > TREE10 scenarios, even though I am not yet seeing what these two scenarios
>> > > have in common that they don't also have in common with other scenarios.
>> > > But what is life without a bit of mystery?  ;-)
>> >
>> > :).
>> >
>> > > Second, please see the attached tarball, which contains .csv files showing
>> > > Kconfig options and kernel boot parameters for the various torture tests.
>> > > The portions of the filenames preceding the "config.csv" correspond to
>> > > the directories in tools/testing/selftests/rcutorture/configs.
>> >
>> > So, at least some of the HZ_FULL=y tests don't run into problems.
>> >
>> > > Third, there are additional scenarios hand-crafted by the script at
>> > > tools/testing/selftests/rcutorture/bin/torture.sh.  Thus far, none of
>> > > them have triggered, other than via the newly increased difficulty
>> > > of configuring a tracing-free kernel with which to test, but they
>> > > can still be useful in ruling out particular Kconfig options or kernel
>> > > boot parameters being related to a given issue.
>> > >
>> > > But please do take a look at the .csv files and let me know what
>> > > adjustments would be appropriate given the failure information.
>> >
>> > Nothing stands out just yet. Let me start a run here and see if
>> > that gives me some ideas.
>>
>> Sounds good, thank you!
>>
>> > I'm guessing the splats don't give any useful information or
>> > you would have attached them ;).
>>
>> My plan is to extract what can be extracted from the overnight run
>> that I just started.  Just in case the fixes have any effect on things,
>> unlikely though that might be given those fixes and the runs that failed.
>
> And I got no failures from either TREE10 or TRACE01 on last night's
> run.

Oh that's great news. Same for my overnight runs for TREE04 and TRACE01.

Ongoing: a 24 hour run for those. Let's see how that goes.

> I merged your series on top of v6.8-rc4 with the -rcu tree's
> dev branch, the latter to get the RCU fixes.  But this means that last
> night's results are not really comparable to earlier results.
>
> I did get a few TREE09 failures, but I get those anyway.  I took it
> apart below for you because I got confused and thought that it was a
> TREE10 failure.  So just in case you were curious what one of these
> looks like and because I am too lazy to delete it.  ;-)

Heh. Well, thanks for being lazy /after/ dissecting it nicely.

> So from the viewpoint of moderate rcutorture testing, this series
> looks good.  Woo hoo!!!

Awesome!

> We did uncover a separate issue with Tasks RCU, which I will report on
> in more detail separately.  However, this issue does not (repeat, *not*)
> affect lazy preemption as such, but instead any attempt to remove all
> of the cond_resched() invocations.

So, that sounds like it happens even with (CONFIG_PREEMPT_AUTO=n,
CONFIG_PREEMPT=y)?
Anyway, I will look out for it when you go into the details.

> My next step is to try this on bare metal on a system configured as
> is the fleet.  But good progress for a week!!!

Yeah this is great. Fingers crossed for the wider set of tests.

Thanks

--
ankur

