All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: "mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"bsegall@google.com" <bsegall@google.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"bristot@redhat.com" <bristot@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	Michael Kelley <mikelley@microsoft.com>
Subject: v5.10: sched_cpu_dying() hits BUG_ON during hibernation: kernel BUG at kernel/sched/core.c:7596!
Date: Tue, 22 Dec 2020 09:13:46 +0000	[thread overview]
Message-ID: <MW4PR21MB1857BF96E59E75EF9CE406E2BFDF9@MW4PR21MB1857.namprd21.prod.outlook.com> (raw)

Hi,
I'm running a Linux VM with the recent mainline (48342fc07272, 12/20/2020) on Hyper-V.
When I test hibernation, the VM can easily hit the below BUG_ON during the resume
procedure (I estimate this can repro about 1/5 of the time). BTW, my VM has 40 vCPUs.

I can't repro the BUG_ON with v5.9.0, so I suspect something in v5.10.0 may be broken?

In v5.10.0, when the BUG_ON happens, rq->nr_running==2, and rq->nr_pinned==0:

7587 int sched_cpu_dying(unsigned int cpu)
7588 {
7589         struct rq *rq = cpu_rq(cpu);
7590         struct rq_flags rf;
7591
7592         /* Handle pending wakeups and then migrate everything off */
7593         sched_tick_stop(cpu);
7594
7595         rq_lock_irqsave(rq, &rf);
7596         BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
7597         rq_unlock_irqrestore(rq, &rf);
7598
7599         calc_load_migrate(rq);
7600         update_max_interval();
7601         nohz_balance_exit_idle(rq);
7602         hrtick_clear(rq);
7603         return 0;
7604 }

The last commit that touches the BUG_ON line is the commit
3015ef4b98f5 ("sched/core: Make migrate disable and CPU hotplug cooperative")
but the commit looks good to me.

Any idea?

Thanks,
-- Dexuan

[    5.383378] PM: Image signature found, resuming
[    5.388027] PM: hibernation: resume from hibernation
[    5.395794] Freezing user space processes ... (elapsed 0.001 seconds) done.
[    5.397058] OOM killer disabled.
[    5.397059] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[    5.413013] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
[    5.419331] PM: hibernation: Marking nosave pages: [mem 0x0009f000-0x000fffff]
[    5.425502] PM: hibernation: Marking nosave pages: [mem 0xf7ff0000-0xffffffff]
[    5.431706] PM: hibernation: Basic memory bitmaps created
[    5.465205] PM: Using 3 thread(s) for decompression
[    5.469590] PM: Loading and decompressing image data (505553 pages)...
[    5.492790] PM: Image loading progress:   0%
[    6.532641] PM: Image loading progress:  10%
[    6.899683] PM: Image loading progress:  20%
[    7.251672] PM: Image loading progress:  30%
[    7.598808] PM: Image loading progress:  40%
[    8.056153] PM: Image loading progress:  50%
[    8.534077] PM: Image loading progress:  60%
[    9.029886] PM: Image loading progress:  70%
[    9.542015] PM: Image loading progress:  80%
[   10.025326] PM: Image loading progress:  90%
[   10.525804] PM: Image loading progress: 100%
[   10.530241] PM: Image loading done
[   10.533295] PM: hibernation: Read 2022212 kbytes in 5.05 seconds (400.43 MB/s)
[   10.540827] PM: Image successfully loaded
[   10.599243] serial 00:04: disabled
[   10.619935] vmbus 242ff919-07db-4180-9c2e-b86cb68c8c55: parent VMBUS:01 should not be sleeping
[   10.646542] Disabling non-boot CPUs ...
[   10.694380] smpboot: CPU 1 is now offline
[   10.729044] smpboot: CPU 2 is now offline
[   10.756819] smpboot: CPU 3 is now offline
[   10.776784] smpboot: CPU 4 is now offline
[   10.800866] smpboot: CPU 5 is now offline
[   10.828923] smpboot: CPU 6 is now offline
[   10.849013] smpboot: CPU 7 is now offline
[   10.876722] smpboot: CPU 8 is now offline
[   10.909426] smpboot: CPU 9 is now offline
[   10.929360] smpboot: CPU 10 is now offline
[   10.957059] smpboot: CPU 11 is now offline
[   10.976542] smpboot: CPU 12 is now offline
[   11.008470] smpboot: CPU 13 is now offline
[   11.036356] smpboot: CPU 14 is now offline
[   11.072396] smpboot: CPU 15 is now offline
[   11.100229] smpboot: CPU 16 is now offline
[   11.128638] smpboot: CPU 17 is now offline
[   11.148479] smpboot: CPU 18 is now offline
[   11.173537] smpboot: CPU 19 is now offline
[   11.205680] smpboot: CPU 20 is now offline
[   11.231799] smpboot: CPU 21 is now offline
[   11.259562] smpboot: CPU 22 is now offline
[   11.288672] smpboot: CPU 23 is now offline
[   11.319691] smpboot: CPU 24 is now offline
[   11.355523] smpboot: CPU 25 is now offline
[   11.399469] smpboot: CPU 26 is now offline
[   11.435438] smpboot: CPU 27 is now offline
[   11.471402] smpboot: CPU 28 is now offline
[   11.515488] smpboot: CPU 29 is now offline
[   11.551355] smpboot: CPU 30 is now offline
[   11.591326] smpboot: CPU 31 is now offline
[   11.624004] smpboot: CPU 32 is now offline
[   11.659326] smpboot: CPU 33 is now offline
[   11.687478] smpboot: CPU 34 is now offline
[   11.719243] smpboot: CPU 35 is now offline
[   11.747252] smpboot: CPU 36 is now offline
[   11.771224] smpboot: CPU 37 is now offline
[   11.795089] ------------[ cut here ]------------
[   11.798052] kernel BUG at kernel/sched/core.c:7596!
[   11.798052] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   11.798052] CPU: 38 PID: 202 Comm: migration/38 Tainted: G            E     5.10.0+ #6
[   11.798052] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[   11.798052] Stopper: multi_cpu_stop+0x0/0xf0 <- 0x0
[   11.798052] RIP: 0010:sched_cpu_dying+0xa3/0xc0
[   11.798052] Code: 73 85 05 00 84 c0 75 12 48 83 c4 08 31 c0 5b c3 f0 48 01 05 af f4 7e ...
[   11.798052] RSP: 0000:ffffbb3c820bfde0 EFLAGS: 00010002
[   11.798052] RAX: 0000000000000082 RBX: ffff94f83acaac40 RCX: 0000000000000000
[   11.798052] RDX: 0000000000000001 RSI: 000000000000005a RDI: 0000000000000001
[   11.798052] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
[   11.798052] R10: 000000000000000b R11: ffffffff89cbed88 R12: ffffffff88aa7ed0
[   11.798052] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
[   11.798052] FS:  0000000000000000(0000) GS:ffff94f83ac80000(0000) knlGS:0000000000000000
[   11.798052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.798052] CR2: 0000000000000000 CR3: 00000001144fa002 CR4: 00000000003706e0
[   11.798052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   11.798052] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   11.798052] Call Trace:
[   11.798052]  ? rcutree_dying_cpu+0x12/0x30
[   11.798052]  cpuhp_invoke_callback+0x82/0x4a0
[   11.798052]  take_cpu_down+0x67/0xa0
[   11.798052]  multi_cpu_stop+0x64/0xf0
[   11.798052]  ? stop_machine_yield+0x10/0x10
[   11.798052]  cpu_stopper_thread+0x95/0x130
[   11.798052]  ? sort_range+0x20/0x20
[   11.798052]  smpboot_thread_fn+0x198/0x230
[   11.798052]  kthread+0x13d/0x160
[   11.798052]  ? kthread_create_on_node+0x60/0x60
[   11.798052]  ret_from_fork+0x22/0x30
[   11.798052] Modules linked in: hv_utils(E) cn(E) hid_generic(E) ...
[   11.798052] ---[ end trace 9f1a31ebcf9c45a1 ]---
[   11.798052] RIP: 0010:sched_cpu_dying+0xa3/0xc0
[   11.798052] Code: 73 85 05 00 84 c0 75 12 48 83 c4 08 31 c0 5b c3 f0 48 01 05 af f4 7e ...
[   11.798052] RSP: 0000:ffffbb3c820bfde0 EFLAGS: 00010002
[   11.798052] RAX: 0000000000000082 RBX: ffff94f83acaac40 RCX: 0000000000000000
[   11.798052] RDX: 0000000000000001 RSI: 000000000000005a RDI: 0000000000000001
[   11.798052] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
[   11.798052] R10: 000000000000000b R11: ffffffff89cbed88 R12: ffffffff88aa7ed0
[   11.798052] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
[   11.798052] FS:  0000000000000000(0000) GS:ffff94f83ac80000(0000) knlGS:0000000000000000
[   11.798052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.798052] CR2: 0000000000000000 CR3: 00000001144fa002 CR4: 00000000003706e0
[   11.798052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   11.798052] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   11.798052] note: migration/38[202] exited with preempt_count 2


             reply	other threads:[~2020-12-22  9:14 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-22  9:13 Dexuan Cui [this message]
2020-12-22 13:39 ` v5.10: sched_cpu_dying() hits BUG_ON during hibernation: kernel BUG at kernel/sched/core.c:7596! Valentin Schneider
2020-12-22 21:44   ` Dexuan Cui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MW4PR21MB1857BF96E59E75EF9CE406E2BFDF9@MW4PR21MB1857.namprd21.prod.outlook.com \
    --to=decui@microsoft.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mikelley@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.