* [PATCH v3 0/1] xen: fix HVM kexec kernel panic
@ 2022-02-24 21:50 Dongli Zhang
  2022-02-24 21:50 ` [PATCH v3 1/1] xen: delay xen_hvm_init_time_ops() to xen_hvm_smp_prepare_boot_cpu() Dongli Zhang
  2022-02-25 22:39 ` [PATCH v3 0/1] xen: fix HVM kexec kernel panic Boris Ostrovsky
  0 siblings, 2 replies; 8+ messages in thread
From: Dongli Zhang @ 2022-02-24 21:50 UTC (permalink / raw)
  To: xen-devel, x86
  Cc: linux-kernel, boris.ostrovsky, jgross, sstabellini, tglx, mingo,
	bp, dave.hansen, hpa, joe.jin

This is v3 of the patch to fix the Xen kexec kernel panic that occurs when
kexec is triggered on a VCPU >= 32.

PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
[    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
[    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
... ...
[    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
[    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002
[    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020
[    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
[    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004
[    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.000000] Call Trace:
[    0.000000]  <TASK>
[    0.000000]  ? xen_clocksource_read+0x24/0x40
[    0.000000]  ? xen_init_time_common+0x5/0x49
[    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
[    0.000000]  ? xen_hvm_guest_init+0x221/0x25c
[    0.000000]  ? 0xffffffffa9600000
[    0.000000]  ? setup_arch+0x440/0xbd6
[    0.000000]  ? start_kernel+0x6c/0x695
[    0.000000]  ? secondary_startup_64_no_verify+0xd5/0xdb
[    0.000000]  </TASK>


Changed since v1:
  - Add commit message to explain why xen_hvm_init_time_ops() is delayed
    for any vcpu. (Suggested by Boris Ostrovsky)
  - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related
    code in xen_hvm_guest_init(). (Suggested by Juergen Gross)
Changed since v2:
  - Delay for all VCPUs. (Suggested by Boris Ostrovsky)
  - Explain in the commit message why PVM is not supported by this patch
  - Test whether kexec/kdump works with mainline xen (HVM and PVM)


I have delayed xen_hvm_init_time_ops() for all VCPUs. Unfortunately, now I am
able to reproduce the clock going backward, as shown below, on some old
versions of xen. I am not able to reproduce it on the most recent mainline xen.

[    0.359687] pcpu-alloc: [0] 16 17 18 19 20 21 22 23 [0] 24 25 26 27 28 29 30 31
[    0.359694] pcpu-alloc: [0] 32 33 34 35 36 37 38 39 [0] 40 41 42 43 44 45 46 47
[    0.359701] pcpu-alloc: [0] 48 49 50 51 52 53 54 55 [0] 56 57 58 59 60 61 62 63

... the clock goes backward after the clocksource is switched from native to xen ...

[    0.000004] Fallback order for Node 0: 0
[    0.002967] Built 1 zonelists, mobility grouping on.  Total pages: 3527744
[    0.007129] Policy zone: Normal
[    0.008937] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty root=UUID=2a5975ab-a059-4697-9aee-7a53ddfeea21 ro text console=ttyS0,115200n8 console=tty1 earlyprintk=ttyS0,115200n8 loglevel=7 no_timer_check reboot=s32 splash crashkernel=512M-:192M vt.handoff=1
[    0.023880] Unknown kernel command line parameters "text splash BOOT_IMAGE=/boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty", will be passed to user space.
[    0.032647] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.036828] printk: log_buf_len total cpu_extra contributions: 258048 bytes
[    0.041049] printk: log_buf_len min size: 262144 bytes
[    0.044481] printk: log_buf_len: 524288 bytes


Since I am now able to reproduce the clock going backward on old xen versions,
please let me know if I should re-use v2 of this patch, as it has been running
well in our environment for a very long time.

https://lore.kernel.org/all/20211028012543.8776-1-dongli.zhang@oracle.com/


BTW, I have tested that 'soft_reset' does not work with mainline xen, even
when I directly trigger kexec with the commands below.

# kexec -l /boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty \
        --initrd=/boot/initrd.img-5.17.0-rc4xen-00054-gf71077a4d84b-dirty \
        --reuse-cmdline
# kexec -e


Thank you very much!

Dongli Zhang




* [PATCH v3 1/1] xen: delay xen_hvm_init_time_ops() to xen_hvm_smp_prepare_boot_cpu()
  2022-02-24 21:50 [PATCH v3 0/1] xen: fix HVM kexec kernel panic Dongli Zhang
@ 2022-02-24 21:50 ` Dongli Zhang
  2022-02-25 22:39 ` [PATCH v3 0/1] xen: fix HVM kexec kernel panic Boris Ostrovsky
  1 sibling, 0 replies; 8+ messages in thread
From: Dongli Zhang @ 2022-02-24 21:50 UTC (permalink / raw)
  To: xen-devel, x86
  Cc: linux-kernel, boris.ostrovsky, jgross, sstabellini, tglx, mingo,
	bp, dave.hansen, hpa, joe.jin

sched_clock() can be used very early since commit 857baa87b642
("sched/clock: Enable sched clock early"). In addition, with commit
38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), the kdump
kernel in a Xen HVM guest may panic at a very early stage when accessing
&__this_cpu_read(xen_vcpu)->time, as shown below:

setup_arch()
 -> init_hypervisor_platform()
     -> x86_init.hyper.init_platform = xen_hvm_guest_init()
         -> xen_hvm_init_time_ops()
             -> xen_clocksource_read()
                 -> src = &__this_cpu_read(xen_vcpu)->time;

This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
structures embedded inside 'shared_info' during the early stage, until
xen_vcpu_setup() is used to allocate/relocate the 'vcpu_info' for the boot
cpu at an arbitrary address.

However, when a Xen HVM guest panics on a vcpu >= 32, xen_vcpu_info_reset(0)
sets per_cpu(xen_vcpu, cpu) = NULL for that vcpu, so xen_clocksource_read()
on a vcpu >= 32 panics.
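
The reason is visible in the early 'vcpu_info' selection logic, sketched
below in simplified form (not a verbatim copy of the upstream helper):

void xen_vcpu_info_reset(int cpu)
{
	if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS)
		/* Early boot: use the vcpu_info embedded in shared_info. */
		per_cpu(xen_vcpu, cpu) =
			&HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)];
	else
		/* No embedded slot for this vcpu; any early access faults. */
		per_cpu(xen_vcpu, cpu) = NULL;
}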

This patch always delays xen_hvm_init_time_ops() until
xen_hvm_smp_prepare_boot_cpu(), after the 'vcpu_info' for the boot vcpu has
been registered.

This issue can be reproduced on purpose with the below command on the guest
side when kdump/kexec is enabled:

"taskset -c 33 echo c > /proc/sysrq-trigger"

Unfortunately, 'soft_reset' (kexec) does not work with the mainline xen
version, so I can test this patch only with an HVM guest on an old xen
hypervisor where 'soft_reset' works. The bugfix for PVM is not implemented
due to the lack of a test environment.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
  - Add commit message to explain why xen_hvm_init_time_ops() is delayed
    for any vcpu. (Suggested by Boris Ostrovsky)
  - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related
    code in xen_hvm_guest_init(). (Suggested by Juergen Gross)
Changed since v2:
  - Delay for all VCPUs. (Suggested by Boris Ostrovsky)
  - Explain in the commit message why PVM is not supported by this patch
  - Test whether kexec/kdump works with mainline xen (HVM and PVM)

 arch/x86/xen/enlighten_hvm.c |  1 -
 arch/x86/xen/smp_hvm.c       | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 517a9d8d8f94..53f306ec1d3b 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -216,7 +216,6 @@ static void __init xen_hvm_guest_init(void)
 	WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
 	xen_unplug_emulated_devices();
 	x86_init.irqs.intr_init = xen_init_IRQ;
-	xen_hvm_init_time_ops();
 	xen_hvm_init_mmu_ops();
 
 #ifdef CONFIG_KEXEC_CORE
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
index 6ff3c887e0b9..9a5efc1a1633 100644
--- a/arch/x86/xen/smp_hvm.c
+++ b/arch/x86/xen/smp_hvm.c
@@ -19,6 +19,23 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
 	 */
 	xen_vcpu_setup(0);
 
+	/*
+	 * xen_hvm_init_time_ops() used to be called at a very early stage
+	 * by xen_hvm_guest_init(). Since only MAX_VIRT_CPUS 'vcpu_info'
+	 * structures are embedded inside 'shared_info', the VM uses those
+	 * until xen_vcpu_setup() allocates/relocates them at an arbitrary
+	 * address.
+	 *
+	 * However, when a Xen HVM guest boots on a vcpu >= MAX_VIRT_CPUS
+	 * (e.g., a kexec kernel), per_cpu(xen_vcpu, cpu) is still NULL at
+	 * that early stage. Accessing per_cpu(xen_vcpu, cpu) via
+	 * xen_clocksource_read() would panic the kernel.
+	 *
+	 * Therefore we always delay xen_hvm_init_time_ops() to
+	 * xen_hvm_smp_prepare_boot_cpu() to avoid the panic.
+	 */
+	xen_hvm_init_time_ops();
+
 	/*
 	 * The alternative logic (which patches the unlock/lock) runs before
 	 * the smp bootup up code is activated. Hence we need to set this up
-- 
2.17.1



* Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic
  2022-02-24 21:50 [PATCH v3 0/1] xen: fix HVM kexec kernel panic Dongli Zhang
  2022-02-24 21:50 ` [PATCH v3 1/1] xen: delay xen_hvm_init_time_ops() to xen_hvm_smp_prepare_boot_cpu() Dongli Zhang
@ 2022-02-25 22:39 ` Boris Ostrovsky
  2022-02-26  1:17   ` Dongli Zhang
  1 sibling, 1 reply; 8+ messages in thread
From: Boris Ostrovsky @ 2022-02-25 22:39 UTC (permalink / raw)
  To: Dongli Zhang, xen-devel, x86
  Cc: linux-kernel, jgross, sstabellini, tglx, mingo, bp, dave.hansen,
	hpa, joe.jin


On 2/24/22 4:50 PM, Dongli Zhang wrote:
> This is the v3 of the patch to fix xen kexec kernel panic issue when the
> kexec is triggered on VCPU >= 32.
>
> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
> ... ...
> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002
> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020
> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004
> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000
> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000) knlGS:0000000000000000
> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0
> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    0.000000] Call Trace:
> [    0.000000]  <TASK>
> [    0.000000]  ? xen_clocksource_read+0x24/0x40


This is done to set xen_sched_clock_offset, which I think will not be used for a while, until sched_clock is called (and the other two uses are for suspend/resume).


Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until after all vcpu areas are properly set? Or are there other uses of xen_clocksource_read() before that?


-boris


> [    0.000000]  ? xen_init_time_common+0x5/0x49
> [    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
> [    0.000000]  ? xen_hvm_guest_init+0x221/0x25c
> [    0.000000]  ? 0xffffffffa9600000
> [    0.000000]  ? setup_arch+0x440/0xbd6
> [    0.000000]  ? start_kernel+0x6c/0x695
> [    0.000000]  ? secondary_startup_64_no_verify+0xd5/0xdb
> [    0.000000]  </TASK>
>
>
> Changed since v1:
>    - Add commit message to explain why xen_hvm_init_time_ops() is delayed
>      for any vcpus. (Suggested by Boris Ostrovsky)
>    - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related
>      code in xen_hvm_guest_init(). (suggested by Juergen Gross)
> Changed since v2:
>    - Delay for all VCPUs. (Suggested by Boris Ostrovsky)
>    - Add commit message that why PVM is not supported by this patch
>    - Test if kexec/kdump works with mainline xen (HVM and PVM)
>
>
> I have delayed the xen_hvm_init_time_ops() for all VCPUs. Unfortunately,
> now I am able to reproduce the clock backward as shown below on some old
> versions of xen. I am not able to reproduce on most recent mainline xen.
>
> [    0.359687] pcpu-alloc: [0] 16 17 18 19 20 21 22 23 [0] 24 25 26 27 28 29 30 31
> [    0.359694] pcpu-alloc: [0] 32 33 34 35 36 37 38 39 [0] 40 41 42 43 44 45 46 47
> [    0.359701] pcpu-alloc: [0] 48 49 50 51 52 53 54 55 [0] 56 57 58 59 60 61 62 63
>
> ... clock backward after the clocksource is switched from native to xen...
>
> [    0.000004] Fallback order for Node 0: 0
> [    0.002967] Built 1 zonelists, mobility grouping on.  Total pages: 3527744
> [    0.007129] Policy zone: Normal
> [    0.008937] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty root=UUID=2a5975ab-a059-4697-9aee-7a53ddfeea21 ro text console=ttyS0,115200n8 console=tty1 earlyprintk=tty    S0,115200n8 loglevel=7 no_timer_check reboot=s32 splash crashkernel=512M-:192M vt.handoff=1
> [    0.023880] Unknown kernel command line parameters "text splash BOOT_IMAGE=/boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty", will be passed to user space.
> [    0.032647] printk: log_buf_len individual max cpu contribution: 4096 bytes
> [    0.036828] printk: log_buf_len total cpu_extra contributions: 258048 bytes
> [    0.041049] printk: log_buf_len min size: 262144 bytes
> [    0.044481] printk: log_buf_len: 524288 bytes
>
>
> Since now I am able to reproduce the clock backward on old xen version,
> please let me know if I should re-use the v2 of this patch, as it has been
> running well in our env well for very long time.
>
> https://lore.kernel.org/all/20211028012543.8776-1-dongli.zhang@oracle.com/
>
>
> BTW, I have tested that 'soft_reset' does not work with mainline xen, even
> when I directly trigger kexec with below commands.
>
> # kexec -l /boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty \
>          --initrd=/boot/initrd.img-5.17.0-rc4xen-00054-gf71077a4d84b-dirty \
>          --reuse-cmdline
> # kexec -e
>
>
> Thank you very much!
>
> Dongli Zhang
>
>


* Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic
  2022-02-25 22:39 ` [PATCH v3 0/1] xen: fix HVM kexec kernel panic Boris Ostrovsky
@ 2022-02-26  1:17   ` Dongli Zhang
  2022-02-28 20:45     ` Boris Ostrovsky
  0 siblings, 1 reply; 8+ messages in thread
From: Dongli Zhang @ 2022-02-26  1:17 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel, x86
  Cc: linux-kernel, jgross, sstabellini, tglx, mingo, bp, dave.hansen,
	hpa, joe.jin

Hi Boris,

On 2/25/22 2:39 PM, Boris Ostrovsky wrote:
> 
> On 2/24/22 4:50 PM, Dongli Zhang wrote:
>> This is the v3 of the patch to fix xen kexec kernel panic issue when the
>> kexec is triggered on VCPU >= 32.
>>
>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
>> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>> ... ...
>> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX:
>> 0000000000000000
>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002
>> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020
>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
>> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004
>> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000
>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000)
>> knlGS:0000000000000000
>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0
>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    0.000000] Call Trace:
>> [    0.000000]  <TASK>
>> [    0.000000]  ? xen_clocksource_read+0x24/0x40
> 
> 
> This is done to set xen_sched_clock_offset which I think will not be used for a
> while, until sched_clock is called (and the other two uses are for suspend/resume)
> 
> 
> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until
> after all vcpu areas are properly set? Or are there other uses of
> xen_clocksource_read() before ?
> 

I have tested that the below patch still panics the kdump kernel.

diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
index 6ff3c887e0b9..6a0c99941ae1 100644
--- a/arch/x86/xen/smp_hvm.c
+++ b/arch/x86/xen/smp_hvm.c
@@ -19,6 +19,8 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
         */
        xen_vcpu_setup(0);

+       xen_init_sched_clock_offset();
+
        /*
         * The alternative logic (which patches the unlock/lock) runs before
         * the smp bootup up code is activated. Hence we need to set this up
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index d9c945ee1100..8a2eafa0c215 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -520,9 +520,14 @@ static void __init xen_time_init(void)
                pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier);
 }

-static void __init xen_init_time_common(void)
+void xen_init_sched_clock_offset(void)
 {
        xen_sched_clock_offset = xen_clocksource_read();
+}
+
+static void __init xen_init_time_common(void)
+{
+       xen_sched_clock_offset = 0;
        static_call_update(pv_steal_clock, xen_steal_clock);
        paravirt_set_sched_clock(xen_sched_clock);

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index fd0fec6e92f4..9f7656214dfb 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -69,6 +69,7 @@ void xen_teardown_timer(int cpu);
 void xen_setup_cpu_clockevents(void);
 void xen_save_time_memory_area(void);
 void xen_restore_time_memory_area(void);
+void xen_init_sched_clock_offset(void);
 void xen_init_time_ops(void);
 void xen_hvm_init_time_ops(void);



Unfortunately, I am not able to obtain the panic call stack from the kdump
kernel this time. I only have the output below.

PANIC: early exception 0x0e IP 10:ffffffffa6c679b6 error 0 cr2 0x20
PANIC: early exception 0x0e IP 10:ffffffffa6c679b6 error 0 cr2 0x20



sched_clock() can be used very early since commit 857baa87b642
("sched/clock: Enable sched clock early"), and any printk uses sched_clock()
to obtain its timestamp:

vprintk_store()
-> local_clock()
   -> sched_clock()
      -> paravirt_sched_clock()
         -> xen_sched_clock()
            -> xen_clocksource_read()
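
In other words, even with the offset assignment deferred, every early printk
still reaches the Xen clocksource through the pv sched_clock hook. A minimal
sketch of the relevant behavior (simplified; the upstream code inlines the
per-cpu read):

static u64 xen_sched_clock(void)
{
	/*
	 * xen_clocksource_read() dereferences __this_cpu_read(xen_vcpu)->time,
	 * which is still NULL when the boot vcpu is >= MAX_VIRT_CPUS.
	 */
	return xen_clocksource_read() - xen_sched_clock_offset;
}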


AFAIR, we started to encounter the issue with commit 857baa87b642
("sched/clock: Enable sched clock early"); kdump used to work well before
that commit.

Thank you very much!

Dongli Zhang


* Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic
  2022-02-26  1:17   ` Dongli Zhang
@ 2022-02-28 20:45     ` Boris Ostrovsky
  2022-03-01  1:18       ` Dongli Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Boris Ostrovsky @ 2022-02-28 20:45 UTC (permalink / raw)
  To: Dongli Zhang, xen-devel, x86
  Cc: linux-kernel, jgross, sstabellini, tglx, mingo, bp, dave.hansen,
	hpa, joe.jin



On 2/25/22 8:17 PM, Dongli Zhang wrote:
> Hi Boris,
> 
> On 2/25/22 2:39 PM, Boris Ostrovsky wrote:
>>
>> On 2/24/22 4:50 PM, Dongli Zhang wrote:
>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the
>>> kexec is triggered on VCPU >= 32.
>>>
>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
>>> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
>>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>>> ... ...
>>> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX:
>>> 0000000000000000
>>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002
>>> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020
>>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
>>> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004
>>> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000
>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000)
>>> knlGS:0000000000000000
>>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0
>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [    0.000000] Call Trace:
>>> [    0.000000]  <TASK>
>>> [    0.000000]  ? xen_clocksource_read+0x24/0x40
>>
>>
>> This is done to set xen_sched_clock_offset which I think will not be used for a
>> while, until sched_clock is called (and the other two uses are for suspend/resume)
>>
>>
>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until
>> after all vcpu areas are properly set? Or are there other uses of
>> xen_clocksource_read() before ?
>>
> 
> I have tested that below patch will panic kdump kernel.
> 



Oh well, so much for that then. Yes, sched_clock() is at least called from the printk path.


I guess we will have to go with v2 then; we don't want to start seeing time going backward, even if only with older hypervisors. The only thing I might ask is that you roll the logic into xen_hvm_init_time_ops(). Something like:


xen_hvm_init_time_ops()
{
	/*
	 * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen
	 * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu)
	 */
	if (__this_cpu_read(xen_vcpu_nr(0)) == NULL)
		return;

	...

-boris


* Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic
  2022-02-28 20:45     ` Boris Ostrovsky
@ 2022-03-01  1:18       ` Dongli Zhang
  2022-03-01  4:56         ` Dongli Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Dongli Zhang @ 2022-03-01  1:18 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel, x86
  Cc: linux-kernel, jgross, sstabellini, tglx, mingo, bp, dave.hansen,
	hpa, joe.jin

Hi Boris,

On 2/28/22 12:45 PM, Boris Ostrovsky wrote:
> 
> 
> On 2/25/22 8:17 PM, Dongli Zhang wrote:
>> Hi Boris,
>>
>> On 2/25/22 2:39 PM, Boris Ostrovsky wrote:
>>>
>>> On 2/24/22 4:50 PM, Dongli Zhang wrote:
>>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the
>>>> kexec is triggered on VCPU >= 32.
>>>>
>>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
>>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
>>>> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
>>>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>>>> ... ...
>>>> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX:
>>>> 0000000000000000
>>>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX:
>>>> 0000000000000002
>>>> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI:
>>>> 0000000000000020
>>>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09:
>>>> 0000000000000001
>>>> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12:
>>>> 0000000040000004
>>>> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15:
>>>> 0000000000000000
>>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000)
>>>> knlGS:0000000000000000
>>>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4:
>>>> 00000000000406a0
>>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>>> 0000000000000000
>>>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>>>> 0000000000000400
>>>> [    0.000000] Call Trace:
>>>> [    0.000000]  <TASK>
>>>> [    0.000000]  ? xen_clocksource_read+0x24/0x40
>>>
>>>
>>> This is done to set xen_sched_clock_offset which I think will not be used for a
>>> while, until sched_clock is called (and the other two uses are for
>>> suspend/resume)
>>>
>>>
>>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until
>>> after all vcpu areas are properly set? Or are there other uses of
>>> xen_clocksource_read() before ?
>>>
>>
>> I have tested that below patch will panic kdump kernel.
>>
> 
> 
> 
> Oh well, so much for that then. Yes, sched_clock() is at least called from
> printk path.
> 
> 
> I guess we will have to go with v2 then, we don't want to start seeing time
> going back, even if only with older hypervisors. The only thing I might ask is
> that you roll the logic inside xen_hvm_init_time_ops(). Something like
> 
> 
> xen_hvm_init_time_ops()
> {
>     /*
>      * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen
>      * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu)
>      */
>     if (__this_cpu_read(xen_vcpu_nr(0)) == NULL)
>         return;
> 

I think you meant __this_cpu_read(xen_vcpu).

I will call xen_hvm_init_time_ops() in both places and move the logic into
xen_hvm_init_time_ops().

Thank you very much!

Dongli Zhang


* Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic
  2022-03-01  1:18       ` Dongli Zhang
@ 2022-03-01  4:56         ` Dongli Zhang
  2022-03-01 17:21           ` Boris Ostrovsky
  0 siblings, 1 reply; 8+ messages in thread
From: Dongli Zhang @ 2022-03-01  4:56 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, jgross, sstabellini, tglx, mingo, bp, dave.hansen,
	hpa, joe.jin, xen-devel, x86

Hi Boris,

On 2/28/22 5:18 PM, Dongli Zhang wrote:
> Hi Boris,
> 
> On 2/28/22 12:45 PM, Boris Ostrovsky wrote:
>>
>>
>> On 2/25/22 8:17 PM, Dongli Zhang wrote:
>>> Hi Boris,
>>>
>>> On 2/25/22 2:39 PM, Boris Ostrovsky wrote:
>>>>
>>>> On 2/24/22 4:50 PM, Dongli Zhang wrote:
>>>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the
>>>>> kexec is triggered on VCPU >= 32.
>>>>>
>>>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
>>>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>>>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
>>>>> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
>>>>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>>>>> ... ...
>>>>> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX:
>>>>> 0000000000000000
>>>>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX:
>>>>> 0000000000000002
>>>>> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI:
>>>>> 0000000000000020
>>>>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09:
>>>>> 0000000000000001
>>>>> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12:
>>>>> 0000000040000004
>>>>> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15:
>>>>> 0000000000000000
>>>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000)
>>>>> knlGS:0000000000000000
>>>>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4:
>>>>> 00000000000406a0
>>>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>>>> 0000000000000000
>>>>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>>>>> 0000000000000400
>>>>> [    0.000000] Call Trace:
>>>>> [    0.000000]  <TASK>
>>>>> [    0.000000]  ? xen_clocksource_read+0x24/0x40
>>>>
>>>>
>>>> This is done to set xen_sched_clock_offset which I think will not be used for a
>>>> while, until sched_clock is called (and the other two uses are for
>>>> suspend/resume)
>>>>
>>>>
>>>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until
>>>> after all vcpu areas are properly set? Or are there other uses of
>>>> xen_clocksource_read() before ?
>>>>
>>>
>>> I have tested that below patch will panic kdump kernel.
>>>
>>
>>
>>
>> Oh well, so much for that then. Yes, sched_clock() is at least called from
>> printk path.
>>
>>
>> I guess we will have to go with v2 then, we don't want to start seeing time
>> going back, even if only with older hypervisors. The only thing I might ask is
>> that you roll the logic inside xen_hvm_init_time_ops(). Something like
>>
>>
>> xen_hvm_init_time_ops()
>> {
>>     /*
>>      * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen
>>      * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu)
>>      */
>>     if (__this_cpu_read(xen_vcpu_nr(0)) == NULL)
>>         return;
>>
> 
> I think you meant __this_cpu_read(xen_vcpu).
> 
> I will call xen_hvm_init_time_ops() at both places, and move the logic into
> xen_hvm_init_time_ops().
> 
> Thank you very much!
> 
> Dongli Zhang
> 


How about we do not move the logic into xen_hvm_init_time_ops()?

Suppose we move the logic into xen_hvm_init_time_ops() at line 573; then line
570 might be printed twice.


559 void __init xen_hvm_init_time_ops(void)
560 {
561         /*
562          * vector callback is needed otherwise we cannot receive interrupts
563          * on cpu > 0 and at this point we don't know how many cpus are
564          * available.
565          */
566         if (!xen_have_vector_callback)
567                 return;
568
569         if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
570                 pr_info("Xen doesn't support pvclock on HVM, disable pv timer");
571                 return;
572         }
573
574         xen_init_time_common();
575
576         x86_init.timers.setup_percpu_clockev = xen_time_init;
577         x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
578
579         x86_platform.set_wallclock = xen_set_wallclock;
580 }

I feel the code looks better if we keep the logic on the caller side. Would
you mind letting me know your feedback?

Thank you very much!

Dongli Zhang


* Re: [PATCH v3 0/1] xen: fix HVM kexec kernel panic
  2022-03-01  4:56         ` Dongli Zhang
@ 2022-03-01 17:21           ` Boris Ostrovsky
  0 siblings, 0 replies; 8+ messages in thread
From: Boris Ostrovsky @ 2022-03-01 17:21 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: linux-kernel, jgross, sstabellini, tglx, mingo, bp, dave.hansen,
	hpa, joe.jin, xen-devel, x86


On 2/28/22 11:56 PM, Dongli Zhang wrote:
> Hi Boris,
>
> On 2/28/22 5:18 PM, Dongli Zhang wrote:
>> Hi Boris,
>>
>> On 2/28/22 12:45 PM, Boris Ostrovsky wrote:
>>>
>>> On 2/25/22 8:17 PM, Dongli Zhang wrote:
>>>> Hi Boris,
>>>>
>>>> On 2/25/22 2:39 PM, Boris Ostrovsky wrote:
>>>>> On 2/24/22 4:50 PM, Dongli Zhang wrote:
>>>>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the
>>>>>> kexec is triggered on VCPU >= 32.
>>>>>>
>>>>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20
>>>>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>>>>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1
>>>>>> [    0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020
>>>>>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>>>>>> ... ...
>>>>>> [    0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX:
>>>>>> 0000000000000000
>>>>>> [    0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX:
>>>>>> 0000000000000002
>>>>>> [    0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI:
>>>>>> 0000000000000020
>>>>>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09:
>>>>>> 0000000000000001
>>>>>> [    0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12:
>>>>>> 0000000040000004
>>>>>> [    0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15:
>>>>>> 0000000000000000
>>>>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffab588000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4:
>>>>>> 00000000000406a0
>>>>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>>>>> 0000000000000000
>>>>>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>>>>>> 0000000000000400
>>>>>> [    0.000000] Call Trace:
>>>>>> [    0.000000]  <TASK>
>>>>>> [    0.000000]  ? xen_clocksource_read+0x24/0x40
>>>>>
>>>>> This is done to set xen_sched_clock_offset which I think will not be used for a
>>>>> while, until sched_clock is called (and the other two uses are for
>>>>> suspend/resume)
>>>>>
>>>>>
>>>>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until
>>>>> after all vcpu areas are properly set? Or are there other uses of
>>>>> xen_clocksource_read() before ?
>>>>>
>>>> I have tested that below patch will panic kdump kernel.
>>>>
>>>
>>>
>>> Oh well, so much for that then. Yes, sched_clock() is at least called from
>>> printk path.
>>>
>>>
>>> I guess we will have to go with v2 then, we don't want to start seeing time
>>> going back, even if only with older hypervisors. The only thing I might ask is
>>> that you roll the logic inside xen_hvm_init_time_ops(). Something like
>>>
>>>
>>> xen_hvm_init_time_ops()
>>> {
>>>      /*
>>>       * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen
>>>       * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu)
>>>       */
>>>      if (__this_cpu_read(xen_vcpu_nr(0)) == NULL)
>>>          return;
>>>
>> I think you meant __this_cpu_read(xen_vcpu).
>>
>> I will call xen_hvm_init_time_ops() at both places, and move the logic into
>> xen_hvm_init_time_ops().
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>
> How about we do not move the logic into xen_hvm_init_time_ops()?
>
> Suppose we move the logic into xen_hvm_init_time_ops() line 573, the line line
> 570 might be printed twice.


You would need to make sure the routine is executed only once, so something like a local static variable would be needed.


>
>
> 559 void __init xen_hvm_init_time_ops(void)
> 560 {
> 561         /*
> 562          * vector callback is needed otherwise we cannot receive interrupts
> 563          * on cpu > 0 and at this point we don't know how many cpus are
> 564          * available.
> 565          */
> 566         if (!xen_have_vector_callback)
> 567                 return;
> 568
> 569         if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
> 570                 pr_info("Xen doesn't support pvclock on HVM, disable pv timer");
> 571                 return;
> 572         }
> 573
> 574         xen_init_time_common();
> 575
> 576         x86_init.timers.setup_percpu_clockev = xen_time_init;
> 577         x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
> 578
> 579         x86_platform.set_wallclock = xen_set_wallclock;
> 580 }
>
> I feel the code looks better if we keep the logic at caller side. Would you mind
> letting me know your feedback?


My preference is to keep logic concentrated in one place whenever possible.


-boris
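
Taken together, the direction discussed above (re-check __this_cpu_read(xen_vcpu)
inside xen_hvm_init_time_ops(), call it again from
xen_hvm_smp_prepare_boot_cpu(), and use a local static variable so the body
only runs once) would look roughly like the sketch below. This is an
illustration assembled from the snippets quoted in this thread, not the final
merged patch:

void __init xen_hvm_init_time_ops(void)
{
	static bool hvm_time_initialized;

	if (hvm_time_initialized)
		return;

	/*
	 * Vector callback is needed, otherwise we cannot receive interrupts
	 * on cpu > 0 and at this point we don't know how many cpus are
	 * available.
	 */
	if (!xen_have_vector_callback)
		return;

	if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
		pr_info("Xen doesn't support pvclock on HVM, disable pv timer");
		return;
	}

	/*
	 * Wait until per_cpu(xen_vcpu, 0) is registered. When the kdump
	 * kernel boots on a vcpu >= MAX_VIRT_CPUS it is still NULL here,
	 * and xen_hvm_smp_prepare_boot_cpu() is expected to call this
	 * function again after xen_vcpu_setup(0).
	 */
	if (!__this_cpu_read(xen_vcpu))
		return;

	xen_init_time_common();

	x86_init.timers.setup_percpu_clockev = xen_time_init;
	x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;

	x86_platform.set_wallclock = xen_set_wallclock;

	hvm_time_initialized = true;
}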



