From: Dexuan Cui <decui@microsoft.com>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"bsegall@google.com" <bsegall@google.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"bristot@redhat.com" <bristot@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	Michael Kelley <mikelley@microsoft.com>
Subject: RE: v5.10: sched_cpu_dying() hits BUG_ON during hibernation: kernel BUG at kernel/sched/core.c:7596!
Date: Tue, 22 Dec 2020 21:44:37 +0000
Message-ID: <MW4PR21MB1857209BF0AB8C074FA4A5B6BFDF9@MW4PR21MB1857.namprd21.prod.outlook.com>
In-Reply-To: <jhjlfdqrmc6.mognet@arm.com>

> From: Valentin Schneider <valentin.schneider@arm.com>
> Sent: Tuesday, December 22, 2020 5:40 AM
> To: Dexuan Cui <decui@microsoft.com>
> Cc: mingo@redhat.com; peterz@infradead.org; juri.lelli@redhat.com;
> vincent.guittot@linaro.org; dietmar.eggemann@arm.com;
> rostedt@goodmis.org; bsegall@google.com; mgorman@suse.de;
> bristot@redhat.com; x86@kernel.org; linux-pm@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-hyperv@vger.kernel.org; Michael Kelley
> <mikelley@microsoft.com>
> Subject: Re: v5.10: sched_cpu_dying() hits BUG_ON during hibernation: kernel
> BUG at kernel/sched/core.c:7596!
> 
> 
> Hi,
> 
> On 22/12/20 09:13, Dexuan Cui wrote:
> > Hi,
> > I'm running a Linux VM with the recent mainline (48342fc07272, 12/20/2020)
> > on Hyper-V.
> > When I test hibernation, the VM can easily hit the below BUG_ON during the
> > resume procedure (I estimate this can repro about 1/5 of the time). BTW, my
> > VM has 40 vCPUs.
> >
> > I can't repro the BUG_ON with v5.9.0, so I suspect something in v5.10.0 may
> > be broken?
> >
> > In v5.10.0, when the BUG_ON happens, rq->nr_running==2, and
> > rq->nr_pinned==0:
> >
> > 7587 int sched_cpu_dying(unsigned int cpu)
> > 7588 {
> > 7589         struct rq *rq = cpu_rq(cpu);
> > 7590         struct rq_flags rf;
> > 7591
> > 7592         /* Handle pending wakeups and then migrate everything off */
> > 7593         sched_tick_stop(cpu);
> > 7594
> > 7595         rq_lock_irqsave(rq, &rf);
> > 7596         BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
> > 7597         rq_unlock_irqrestore(rq, &rf);
> > 7598
> > 7599         calc_load_migrate(rq);
> > 7600         update_max_interval();
> > 7601         nohz_balance_exit_idle(rq);
> > 7602         hrtick_clear(rq);
> > 7603         return 0;
> > 7604 }
> >
> > The last commit that touches the BUG_ON line is the commit
> > 3015ef4b98f5 ("sched/core: Make migrate disable and CPU hotplug cooperative")
> > but the commit looks good to me.
> >
> > Any idea?
> >
> 
> I'd wager this extra task is a kworker; could you give this series a try?
> 
> 
> https://lore.kernel.org/lkml/20201218170919.2950-1-jiangshanlai@gmail.com/

Thanks, Valentin! It looks like the patchset can fix the BUG_ON, though I see
a warning, which I reported here: https://lkml.org/lkml/2020/12/22/648

Thanks,
-- Dexuan
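
For reference, the check quoted above can be modeled in userspace with the values from the report. This is only a minimal sketch, not kernel code: struct rq_model and both helper names are hypothetical, and it assumes rq_has_pinned_tasks() simply reports whether rq->nr_pinned is non-zero.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two runqueue fields named in the report. */
struct rq_model {
	unsigned int nr_running;	/* runnable tasks left on the dying CPU */
	unsigned int nr_pinned;		/* tasks that cannot be migrated away */
};

/* Assumed behaviour of rq_has_pinned_tasks(): true if any task is pinned. */
static bool rq_model_has_pinned_tasks(const struct rq_model *rq)
{
	return rq->nr_pinned != 0;
}

/* True when the condition passed to BUG_ON() at core.c:7596 would be met. */
static bool dying_check_would_bug(const struct rq_model *rq)
{
	return rq->nr_running != 1 || rq_model_has_pinned_tasks(rq);
}
```

With the reported values (nr_running == 2, nr_pinned == 0), dying_check_would_bug() returns true because of the first half of the condition alone, which fits the suggestion of one extra, unpinned task still being on the runqueue.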
