From: Don Hiatt <dhiatt@digitalocean.com>
To: Josh Don <joshdon@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	"Hyser,Chris" <chris.hyser@oracle.com>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
Date: Wed, 28 Apr 2021 09:04:38 -0700
Message-ID: <CAOY2WowNjhMx1M+mOTyhNuMQQijZ-Mu6J4nxE3zO+_E-oFMZdA@mail.gmail.com>
In-Reply-To: <CABk29Nuz-FDCk23ajcr9gS4KD-wMpwyn=ASu+yuTTT445rwTvw@mail.gmail.com>

On Tue, Apr 27, 2021 at 4:35 PM Josh Don <joshdon@google.com> wrote:
>
> On Tue, Apr 27, 2021 at 10:10 AM Don Hiatt <dhiatt@digitalocean.com> wrote:
> > Hi Josh and Peter,
> >
> > I've been running into soft lockups and hard lockups when running a script
> > that just cycles setting the cookie of a group of processes over and over again.
> >
> > Unfortunately the only way I can reproduce this is by setting the cookies
> > on qemu. I've tried sysbench, stress-ng but those seem to work just fine.
> >
> > I'm running Peter's branch and even tried the suggested changes here but
> > still see the same behavior. I enabled panic on hard lockup and here below
> > is a snippet of the log.
> >
> > Is there anything you'd like me to try or have any debugging you'd like me to
> > do? I'd certainly like to get to the bottom of this.
>
> Hi Don,
>
> I tried to repro using qemu, but did not generate a lockup. Could you
> provide more details on what your script is doing (or better yet,
> share the script directly)? I would have expected you to potentially
> hit a lockup if you were cycling sched_core being enabled and
> disabled, but it sounds like you are just recreating the cookie for a
> process group over and over?
>
> Best,
> Josh

Hi Josh,

Sorry if I wasn't clear. I'm running on bare metal (Peter's
5.12-rc8 repo) with two qemu-system-x86_64 VMs (one with 8 vCPUs, the
other with 24, though the sizes don't really matter). I then run a
script [1] that cycles setting the cookie for every task belonging to
the given qemu-system-x86_64 pid.

I also just hit a lockup after setting the cookies for those two VMs,
pinning them both to the same processor pair, and running sysbench
within each VM to generate some load [2]. This was without cycling the
cookies, just setting them once.
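
For anyone trying to reproduce the pinning, here is a rough sketch of
how a sibling pair can be looked up. It assumes the usual sysfs
topology layout. The helper name siblings_of and its optional
sysfs-root argument are made up for illustration, and the taskset line
in the comment is only one possible way to pin, not necessarily what
was used for the run above.

```shell
#!/bin/bash
# Sketch: print the SMT siblings of a CPU as listed in sysfs.
# siblings_of is a hypothetical helper; the second argument (sysfs root)
# exists only so the function can be pointed at a test directory.
siblings_of() {
    local cpu=$1 root=${2:-/sys/devices/system/cpu}
    local f="$root/cpu$cpu/topology/thread_siblings_list"
    # Fail if the CPU does not exist or sysfs is not available.
    [ -r "$f" ] || return 1
    cat "$f"
}

# Possible usage (assumption, run by hand):
#   pair=$(siblings_of 6)
#   taskset -a -c -p "$pair" <qemu pid>
```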

Thanks!

Don

----[1]---
This is a little test harness (I can provide the source but it is
based on your kselftests)
dhiatt@s2r5node34:~/do_coresched$ ./do_coresched -h
Usage for ./do_coresched
   -c Create sched cookie for <pid>
   -f Share sched cookie from <pid>
   -g Get sched cookie from <pid>
   -t Share sched cookie to <pid>
   -z Clear sched cookie from <pid>
   -p PID

Create: _prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, pid, PIDTYPE_PID, 0)
Share: _prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, pid, PIDTYPE_PID, 0)
Get: prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, pid, PIDTYPE_PID,
                    (unsigned long)&cookie);

--

dhiatt@s2r5node34:~/do_coresched$ cat set.sh
#!/bin/bash
# usage: set.sh pid

target=$1
echo target pid: $target
pids=`ps -eL | grep $target | awk '{print $2}' | sort -g`

echo "Setting cookies for $target"
./do_coresched -c -p $BASHPID
for i in $pids
do
./do_coresched -t -p $i
./do_coresched -g -p $i
done
---
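
One caveat with set.sh: `ps -eL | grep $target` matches any line that
contains the digits of the pid, so pid 123 would also pick up 1234 or
a thread 5123. A possibly more robust sketch reads /proc/<pid>/task
instead. The name list_tids is hypothetical and not part of
do_coresched.

```shell
#!/bin/bash
# Sketch: list the thread IDs (TIDs) of one process by reading
# /proc/<pid>/task, which holds one directory per thread.
# list_tids is a hypothetical helper name.
list_tids() {
    local pid=$1 dir
    # Fail if the process does not exist.
    [ -d "/proc/$pid/task" ] || return 1
    for dir in "/proc/$pid/task"/*; do
        if [ -d "$dir" ]; then
            echo "${dir##*/}"   # strip the path, keep the TID
        fi
    done | sort -n
}

# Possible usage inside set.sh (assumption):
#   pids=$(list_tids "$target")
```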

---[2]---
[ 8911.926989] watchdog: BUG: soft lockup - CPU#53 stuck for 22s! [CPU 1/KVM:19539]
[ 8911.935727] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
[ 8911.935791] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[ 8911.935908] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
[ 8911.935967] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13
[ 8911.936070] NMI watchdog: Watchdog detected hard LOCKUP on cpu 19
[ 8911.936145] NMI watchdog: Watchdog detected hard LOCKUP on cpu 21
[ 8911.936220] NMI watchdog: Watchdog detected hard LOCKUP on cpu 23
[ 8911.936361] NMI watchdog: Watchdog detected hard LOCKUP on cpu 31
[ 8911.936453] NMI watchdog: Watchdog detected hard LOCKUP on cpu 34
[ 8911.936567] NMI watchdog: Watchdog detected hard LOCKUP on cpu 42
[ 8911.936627] NMI watchdog: Watchdog detected hard LOCKUP on cpu 46
[ 8911.936712] NMI watchdog: Watchdog detected hard LOCKUP on cpu 49
[ 8911.936827] NMI watchdog: Watchdog detected hard LOCKUP on cpu 58
[ 8911.936887] NMI watchdog: Watchdog detected hard LOCKUP on cpu 60
[ 8911.936969] NMI watchdog: Watchdog detected hard LOCKUP on cpu 70
[ 8915.926847] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 8915.932904] rcu: 2-...!: (7 ticks this GP) idle=38e/1/0x4000000000000000 softirq=181538/181540 fqs=1178
[ 8915.942617] rcu: 14-...!: (2 GPs behind) idle=d0e/1/0x4000000000000000 softirq=44825/44825 fqs=1178
[ 8915.954034] rcu: rcu_sched kthread timer wakeup didn't happen for 12568 jiffies! g462469 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 8915.965775] rcu: Possible timer handling issue on cpu=6 timer-softirq=10747
[ 8915.973021] rcu: rcu_sched kthread starved for 12572 jiffies! g462469 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=6
[ 8915.983681] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 8915.992879] rcu: RCU grace-period kthread stack dump:
[ 8915.998211] rcu: Stack dump where RCU GP kthread last ran:
[ 8939.925995] watchdog: BUG: soft lockup - CPU#53 stuck for 22s! [CPU 1/KVM:19539]
[ 8939.935274] NMI watchdog: Watchdog detected hard LOCKUP on cpu 25
[ 8939.935351] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
[ 8939.935425] NMI watchdog: Watchdog detected hard LOCKUP on cpu 29
[ 8939.935653] NMI watchdog: Watchdog detected hard LOCKUP on cpu 44
[ 8939.935845] NMI watchdog: Watchdog detected hard LOCKUP on cpu 63
[ 8963.997140] watchdog: BUG: soft lockup - CPU#71 stuck for 22s! [SchedulerRunner:4405]

-----
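
To compare runs, it may help to reduce a log like [2] to the set of
CPUs that reported a hard lockup. A minimal sketch, assuming the
message format shown above (hard_lockup_cpus is a hypothetical helper
name):

```shell
#!/bin/bash
# Sketch: read kernel log text on stdin and print each CPU that
# reported a hard lockup once, in numeric order.
# hard_lockup_cpus is a hypothetical helper name.
hard_lockup_cpus() {
    grep -o 'hard LOCKUP on cpu [0-9]*' | awk '{print $NF}' | sort -n -u
}

# Possible usage (assumption): dmesg | hard_lockup_cpus
```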
