* [PATCH 0/2] x86/hyperv: fix kexec/kdump hang on some VMs
From: Kairui Song @ 2020-10-14 9:24 UTC
To: linux-kernel
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ard Biesheuvel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bartlomiej Zolnierkiewicz, Dave Young, x86, linux-hyperv, kexec, Kairui Song

On some Hyper-V machines, if kexec_file_load is used to load the kexec kernel, the second kernel can hang with the following stack trace:

[ 0.591705] efifb: probing for efifb
[ 0.596869] efifb: framebuffer at 0xf8000000, using 3072k, total 3072k
[ 0.605894] efifb: mode is 1024x768x32, linelength=4096, pages=1
[ 0.617926] efifb: scrolling: redraw
[ 0.622715] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[ 28.039046] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
[ 28.039046] Modules linked in:
[ 28.039046] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-230.el8.x86_64 #1
[ 28.039046] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[ 28.039046] RIP: 0010:cfb_imageblit+0x450/0x4c0
[ 28.039046] Code: 89 f8 b9 08 00 00 00 48 89 04 24 eb 2d 41 0f be 30 29 e9 4c 8d 5f 04 d3 fe 44 21 ee 41 8b 04 b6 44 21 c8 89 c6 44 31 d6 89 37 <85> c9 75 09 49 83 c0 01 b9 08 00 00 00 4c 89 df 48 39 df 75 ce 83
[ 28.039046] RSP: 0018:ffffc90000087830 EFLAGS: 00010246 ORIG_RAX: ffffffffffffff12
[ 28.039046] RAX: 0000000000000000 RBX: ffffc90000542000 RCX: 0000000000000003
[ 28.039046] RDX: 000000000000000e RSI: 0000000000000000 RDI: ffffc90000541bf0
[ 28.039046] RBP: 0000000000000001 R08: ffff8880f555c8df R09: 0000000000aaaaaa
[ 28.039046] R10: 0000000000000000 R11: ffffc90000541bf4 R12: 0000000000001000
[ 28.039046] R13: 0000000000000001 R14: ffffffff81e9a460 R15: ffff8880f555c880
[ 28.039046] FS: 0000000000000000(0000) GS:ffff8880f1000000(0000) knlGS:0000000000000000
[ 28.039046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 28.039046] CR2: 00007f7b223b8000 CR3: 00000000f3a0a004 CR4: 00000000003606b0
[ 28.039046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 28.039046] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 28.039046] Call Trace:
[ 28.039046] bit_putcs+0x2a1/0x550
[ 28.039046] ? fbcon_switch+0x33e/0x5b0
[ 28.039046] ? bit_clear+0x120/0x120
[ 28.039046] fbcon_putcs+0xe7/0x100
[ 28.039046] do_update_region+0x154/0x1a0
[ 28.039046] redraw_screen+0x209/0x240
[ 28.039046] ? vc_do_resize+0x5c9/0x660
[ 28.039046] fbcon_prepare_logo+0x3b3/0x430
[ 28.039046] fbcon_init+0x436/0x630
[ 28.039046] visual_init+0xce/0x130
[ 28.039046] do_bind_con_driver+0x1df/0x2d0
[ 28.039046] do_take_over_console+0x113/0x180
[ 28.039046] do_fbcon_takeover+0x58/0xb0
[ 28.039046] register_framebuffer+0x225/0x2f0
[ 28.039046] efifb_probe.cold.5+0x51a/0x55d
[ 28.039046] platform_drv_probe+0x38/0x90
[ 28.039046] really_probe+0x212/0x440
[ 28.039046] driver_probe_device+0x49/0xc0
[ 28.039046] device_driver_attach+0x50/0x60
[ 28.039046] __driver_attach+0x61/0x130
[ 28.039046] ? device_driver_attach+0x60/0x60
[ 28.039046] bus_for_each_dev+0x77/0xc0
[ 28.039046] ? klist_add_tail+0x57/0x70
[ 28.039046] bus_add_driver+0x14d/0x1e0
[ 28.039046] ? vesafb_driver_init+0x13/0x13
[ 28.039046] ? do_early_param+0x91/0x91
[ 28.039046] driver_register+0x6b/0xb0
[ 28.039046] ? vesafb_driver_init+0x13/0x13
[ 28.039046] do_one_initcall+0x46/0x1c3
[ 28.039046] ? do_early_param+0x91/0x91
[ 28.039046] kernel_init_freeable+0x1b4/0x25d
[ 28.039046] ? rest_init+0xaa/0xaa
[ 28.039046] kernel_init+0xa/0xfa
[ 28.039046] ret_from_fork+0x35/0x40

The root cause is that the hyperv_fb driver relocates the framebuffer in the first kernel, but kexec_file_load simply reuses the old framebuffer info from boot_params, which is now invalid, so the second kernel writes to an invalid framebuffer address.

This series fixes the problem by:

1. Letting kexec_file_load use the up-to-date copy of screen_info.

   Instead of boot_params.screen_info, use the global screen_info variable (which on x86 is just a copy of boot_params.screen_info). This variable can be updated by arch-independent drivers, so keeping it up to date is a simple way to keep screen_info consistent across kexec.

2. Letting hyperv_fb clear the screen_info copy when the boot framebuffer is relocated outside the old framebuffer.

   After the relocation, the framebuffer is no longer a VGA framebuffer, so simply clearing the copy should be sufficient.

Kairui Song (2):
  x86/kexec: Use up-to-dated screen_info copy to fill boot params
  hyperv_fb: Update screen_info after removing old framebuffer

 arch/x86/kernel/kexec-bzimage64.c | 3 +--
 drivers/video/fbdev/hyperv_fb.c   | 8 ++++++++
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
2.28.0
* [PATCH 1/2] x86/kexec: Use up-to-dated screen_info copy to fill boot params
From: Kairui Song @ 2020-10-14 9:24 UTC
To: linux-kernel
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ard Biesheuvel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bartlomiej Zolnierkiewicz, Dave Young, x86, linux-hyperv, kexec, Kairui Song

kexec_file_load currently just reuses the old boot_params.screen_info.
But if drivers have changed the hardware state, boot_params.screen_info
may contain invalid info.

For example, the video type may no longer be VGA, or the framebuffer
address may have changed. If the kexec kernel keeps using the old
screen_info, the kexec'ed kernel may attempt to write to an invalid
framebuffer memory region.

There are two globally available copies of screen_info:
boot_params.screen_info and screen_info. The latter is a copy that
can be updated by drivers.

So let kexec_file_load use the updated copy.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 arch/x86/kernel/kexec-bzimage64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 57c2ecf43134..ce831f9448e7 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -200,8 +200,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params,
 	params->hdr.hardware_subarch = boot_params.hdr.hardware_subarch;
 
 	/* Copying screen_info will do? */
-	memcpy(&params->screen_info, &boot_params.screen_info,
-				sizeof(struct screen_info));
+	memcpy(&params->screen_info, &screen_info, sizeof(struct screen_info));
 
 	/* Fill in memsize later */
 	params->screen_info.ext_mem_k = 0;
-- 
2.28.0
* RE: [PATCH 1/2] x86/kexec: Use up-to-dated screen_info copy to fill boot params
From: Dexuan Cui @ 2020-11-17 3:39 UTC
To: Kairui Song, linux-kernel
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ard Biesheuvel, KY Srinivasan, Haiyang Zhang, Wei Liu, Bartlomiej Zolnierkiewicz, Dave Young, x86, linux-hyperv, kexec

> From: Kairui Song <kasong@redhat.com>
> Sent: Wednesday, October 14, 2020 2:24 AM
> To: linux-kernel@vger.kernel.org
>
> kexec_file_load currently just reuses the old boot_params.screen_info.
> But if drivers have changed the hardware state, boot_params.screen_info
> may contain invalid info.
>
> For example, the video type may no longer be VGA, or the framebuffer
> address may have changed. If the kexec kernel keeps using the old
> screen_info, the kexec'ed kernel may attempt to write to an invalid
> framebuffer memory region.
>
> There are two globally available copies of screen_info:
> boot_params.screen_info and screen_info. The latter is a copy that
> can be updated by drivers.
>
> So let kexec_file_load use the updated copy.
>
> Signed-off-by: Kairui Song <kasong@redhat.com>
> ---
>  arch/x86/kernel/kexec-bzimage64.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> index 57c2ecf43134..ce831f9448e7 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -200,8 +200,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params,
>  	params->hdr.hardware_subarch = boot_params.hdr.hardware_subarch;
>
>  	/* Copying screen_info will do? */
> -	memcpy(&params->screen_info, &boot_params.screen_info,
> -				sizeof(struct screen_info));
> +	memcpy(&params->screen_info, &screen_info, sizeof(struct screen_info));
>
>  	/* Fill in memsize later */
>  	params->screen_info.ext_mem_k = 0;
> --

Hi Kairui,
According to "man kexec", kdump/kexec can use two different syscalls to set up the kdump kernel:

    -s (--kexec-file-syscall)
           Specify that the new KEXEC_FILE_LOAD syscall should be used exclusively.

    -c (--kexec-syscall)
           Specify that the old KEXEC_LOAD syscall should be used exclusively (the default).

It looks like I can only reproduce the call trace (https://bugzilla.redhat.com/show_bug.cgi?id=1867887#c5) with KEXEC_FILE_LOAD: I did kdump tests in an Ubuntu 20.04 VM. By default the VM used the KEXEC_LOAD syscall and I couldn't reproduce the call trace; after I added the "-s" parameter to use the KEXEC_FILE_LOAD syscall, I could reproduce it, and I can confirm your patch eliminates the call trace because the "efifb" driver doesn't even load with your patch.

Your patch only covers the KEXEC_FILE_LOAD syscall, and I'm sure the patched code is not used in the KEXEC_LOAD syscall path.

So, in the case of the KEXEC_LOAD syscall, do you know how the *kexec* kernel's boot_params.screen_info.lfb_base is initialized? I haven't figured it out yet.

Thanks,
-- Dexuan
* RE: [PATCH 1/2] x86/kexec: Use up-to-dated screen_info copy to fill boot params
From: Dexuan Cui @ 2020-11-25 23:39 UTC
To: 'Kairui Song', 'linux-kernel@vger.kernel.org'
Cc: 'Thomas Gleixner', 'Ingo Molnar', 'Borislav Petkov', 'Ard Biesheuvel', KY Srinivasan, Haiyang Zhang, 'Wei Liu', 'Bartlomiej Zolnierkiewicz', 'Dave Young', 'x86@kernel.org', 'linux-hyperv@vger.kernel.org', 'kexec@lists.infradead.org', Michael Kelley

> From: Dexuan Cui
> Sent: Monday, November 16, 2020 7:40 PM
> > diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> > index 57c2ecf43134..ce831f9448e7 100644
> > --- a/arch/x86/kernel/kexec-bzimage64.c
> > +++ b/arch/x86/kernel/kexec-bzimage64.c
> > @@ -200,8 +200,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params,
> >  	params->hdr.hardware_subarch = boot_params.hdr.hardware_subarch;
> >
> >  	/* Copying screen_info will do? */
> > -	memcpy(&params->screen_info, &boot_params.screen_info,
> > -				sizeof(struct screen_info));
> > +	memcpy(&params->screen_info, &screen_info, sizeof(struct screen_info));
> >
> >  	/* Fill in memsize later */
> >  	params->screen_info.ext_mem_k = 0;
> > --
>
> Hi Kairui,
> According to "man kexec", kdump/kexec can use two different syscalls to set
> up the kdump kernel:
>
>     -s (--kexec-file-syscall)
>            Specify that the new KEXEC_FILE_LOAD syscall should be used exclusively.
>
>     -c (--kexec-syscall)
>            Specify that the old KEXEC_LOAD syscall should be used exclusively (the default).
>
> It looks like I can only reproduce the call trace
> (https://bugzilla.redhat.com/show_bug.cgi?id=1867887#c5) with KEXEC_FILE_LOAD:
> I did kdump tests in an Ubuntu 20.04 VM. By default the VM used the KEXEC_LOAD
> syscall and I couldn't reproduce the call trace; after I added the "-s"
> parameter to use the KEXEC_FILE_LOAD syscall, I could reproduce it, and I can
> confirm your patch eliminates the call trace because the "efifb" driver
> doesn't even load with your patch.
>
> Your patch only covers the KEXEC_FILE_LOAD syscall, and I'm sure the patched
> code is not used in the KEXEC_LOAD syscall path.
>
> So, in the case of the KEXEC_LOAD syscall, do you know how the *kexec*
> kernel's boot_params.screen_info.lfb_base is initialized? I haven't figured it
> out yet.

FYI: in the case of the KEXEC_LOAD syscall, I think the lfb_base of the kexec kernel is pre-set by the kexec tool (see the function setup_linux_vesafb()):
https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/i386/x86-linux-setup.c#n126

static int setup_linux_vesafb(struct x86_linux_param_header *real_mode)
{
	struct fb_fix_screeninfo fix;
	struct fb_var_screeninfo var;
	int fd;

	fd = open("/dev/fb0", O_RDONLY);
	if (-1 == fd)
		return -1;

	if (-1 == ioctl(fd, FBIOGET_FSCREENINFO, &fix))
		goto out;

	if (-1 == ioctl(fd, FBIOGET_VSCREENINFO, &var))
		goto out;

	if (0 == strcmp(fix.id, "VESA VGA")) {
		/* VIDEO_TYPE_VLFB */
		real_mode->orig_video_isVGA = 0x23;
	} else if (0 == strcmp(fix.id, "EFI VGA")) {
		/* VIDEO_TYPE_EFI */
		real_mode->orig_video_isVGA = 0x70;
	} else if (arch_options.reuse_video_type) {
		int err;
		off_t offset = offsetof(typeof(*real_mode), orig_video_isVGA);

		/* blindly try old boot time video type */
		err = get_bootparam(&real_mode->orig_video_isVGA, offset, 1);
		if (err)
			goto out;
	} else {
		real_mode->orig_video_isVGA = 0;
		close(fd);
		return 0;
	}

When an Ubuntu 20.10 VM (kexec-tools-2.0.20) runs on Hyper-V, we should fall into the last condition, i.e. setting "real_mode->orig_video_isVGA = 0;", so the "efifb" driver does not load in the kdump kernel.

Ubuntu 20.04 (kexec-tools-2.0.18) is a little old in that it does not have Kairui's patch https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=fb5a8792e6e4ee7de7ae3e06d193ea5beaaececc , so it reuses the VRAM location set up by the hyperv_fb driver. This is undesirable because the "efifb" driver doesn't know it's accessing an "incompatible" framebuffer. IMO this may be just a small issue, but anyway I hope Ubuntu 20.04's kexec-tools will pick up your patch.

So, if we use the latest kernel and the latest kexec-tools, all the combinations should now be covered, and the "efifb" driver in the kdump kernel doesn't load.

Thanks,
-- Dexuan
* [PATCH 2/2] hyperv_fb: Update screen_info after removing old framebuffer
From: Kairui Song @ 2020-10-14 9:24 UTC
To: linux-kernel
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ard Biesheuvel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bartlomiej Zolnierkiewicz, Dave Young, x86, linux-hyperv, kexec, Kairui Song

On a gen2 Hyper-V VM, hyperv_fb removes the old framebuffer; the newly
allocated framebuffer may be at a different location, and it is no
longer a VGA framebuffer.

Update screen_info so that after kexec, the kernel won't try to reuse
the old, invalid framebuffer address as VGA.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 drivers/video/fbdev/hyperv_fb.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index 02411d89cb46..e36fb1a0ecdb 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1114,8 +1114,15 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 getmem_done:
 	remove_conflicting_framebuffers(info->apertures,
 					KBUILD_MODNAME, false);
-	if (!gen2vm)
+
+	if (gen2vm) {
+		/* framebuffer is reallocated, clear screen_info to avoid misuse from kexec */
+		screen_info.lfb_size = 0;
+		screen_info.lfb_base = 0;
+		screen_info.orig_video_isVGA = 0;
+	} else {
 		pci_dev_put(pdev);
+	}
 	kfree(info->apertures);
 
 	return 0;
-- 
2.28.0