linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue
@ 2021-10-12  7:24 Dongli Zhang
  2021-10-12  7:24 ` [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 Dongli Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dongli Zhang @ 2021-10-12  7:24 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, x86, boris.ostrovsky, jgross, sstabellini, tglx,
	mingo, bp, hpa, andrew.cooper3, george.dunlap, iwj, jbeulich,
	julien, wl, joe.jin

When kdump/kexec is enabled in an HVM guest, a kernel panic traps to Xen
with reason=soft_reset. As a result, Xen reboots the VM into the kdump
kernel.

Unfortunately, when the VM is panicked with the command line below ...

"taskset -c 33 echo c > /proc/sysrq-trigger"

... the kdump kernel itself panics at an early stage ...

PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
[    0.000000] Hardware name: Xen HVM domU
[    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
... ...
[    0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
[    0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
[    0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
[    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
[    0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
[    0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffffaa95e000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.000000] Call Trace:
[    0.000000]  ? xen_init_time_common+0x11/0x55
[    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
[    0.000000]  ? xen_hvm_guest_init+0x214/0x251
[    0.000000]  ? 0xffffffffa8c00000
[    0.000000]  ? setup_arch+0x440/0xbd6
[    0.000000]  ? start_kernel+0x6a/0x689
[    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb

This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
structures embedded inside 'shared_info' during the early boot stage, until
xen_vcpu_setup() is used to allocate/relocate the 'vcpu_info' for the boot
cpu at an arbitrary address.
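
The faulting address in the trace (CR2 = 0x20, which is also the value in
RDI) is consistent with taking the address of the 'time' member through a
NULL 'vcpu_info' pointer. A minimal sketch, assuming the x86-64 guest ABI
layout; the sketch_ types are hypothetical mirrors written for illustration,
not the kernel's own definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 32-byte pvclock time record (assumed layout). */
struct sketch_pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
};

/* Hypothetical mirror of the x86-64 arch part: two 8-byte words. */
struct sketch_arch_vcpu_info {
	uint64_t cr2;
	uint64_t pad;
};

/* Hypothetical mirror of 'vcpu_info' as laid out on x86-64. */
struct sketch_vcpu_info {
	uint8_t  evtchn_upcall_pending;
	uint8_t  evtchn_upcall_mask;
	uint64_t evtchn_pending_sel;
	struct sketch_arch_vcpu_info arch;
	struct sketch_pvclock_vcpu_time_info time;
};

/*
 * The address that &vcpu->time evaluates to when vcpu is NULL is simply
 * the offset of 'time' inside the structure.
 */
static size_t sketch_time_offset(void)
{
	return offsetof(struct sketch_vcpu_info, time);
}
```

On an LP64 target sketch_time_offset() evaluates to 0x20, so a read through
a NULL per-cpu pointer would fault at exactly the address reported in CR2.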


The 1st patch fixes the issue on the guest kernel side. However, we may
still observe clock drift in the guest due to an issue on the Xen hypervisor
side: the pv vcpu_time_info is not updated on VCPUOP_register_vcpu_info.

The 2nd patch calls force_update_vcpu_system_time() on the Xen side on
VCPUOP_register_vcpu_info, to avoid guest clock drift during kdump kernel
boot.


I tested the fix by backporting the 2nd patch to an older Xen version,
because I am not able to use soft_reset successfully with mainline Xen. I
encountered the error below when testing soft_reset with mainline Xen.
Please let me know if there is any known issue/solution.

# xl -v create -F vm.cfg
... ...
... ...
Domain 1 has shut down, reason code 5 0x5
Action for shutdown reason code 5 is soft-reset
Done. Rebooting now
xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:488:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:53:libxl__domain_type: unable to get domain type for domid=1, assuming HVM


Thank you very much!

Dongli Zhang




* [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
  2021-10-12  7:24 [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue Dongli Zhang
@ 2021-10-12  7:24 ` Dongli Zhang
  2021-10-12  8:48   ` Juergen Gross
  2021-10-12 17:17   ` Boris Ostrovsky
  2021-10-12  7:24 ` [PATCH xen 2/2] xen: update system time immediately when VCPUOP_register_vcpu_info Dongli Zhang
  2021-10-12  8:47 ` [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue Juergen Gross
  2 siblings, 2 replies; 8+ messages in thread
From: Dongli Zhang @ 2021-10-12  7:24 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, x86, boris.ostrovsky, jgross, sstabellini, tglx,
	mingo, bp, hpa, andrew.cooper3, george.dunlap, iwj, jbeulich,
	julien, wl, joe.jin

sched_clock() can be used very early since upstream
commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
with upstream commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock
time from 0"), the kdump kernel in a Xen HVM guest may panic at a very early
stage when accessing &__this_cpu_read(xen_vcpu)->time, as shown below:

setup_arch()
 -> init_hypervisor_platform()
     -> x86_init.hyper.init_platform = xen_hvm_guest_init()
         -> xen_hvm_init_time_ops()
             -> xen_clocksource_read()
                 -> src = &__this_cpu_read(xen_vcpu)->time;

This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
structures embedded inside 'shared_info' during the early boot stage, until
xen_vcpu_setup() is used to allocate/relocate the 'vcpu_info' for the boot
cpu at an arbitrary address.

However, when the Xen HVM guest panics on vcpu >= 32, xen_vcpu_info_reset(0)
sets per_cpu(xen_vcpu, cpu) = NULL for that vcpu, so xen_clocksource_read()
on vcpu >= 32 panics.
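
The selection described above can be modelled in user space. This is only a
sketch under stated assumptions: the sketch_ names are hypothetical, and the
function mirrors the described behaviour (embedded slot below MAX_VIRT_CPUS,
NULL otherwise) rather than reproducing the kernel source:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VIRT_CPUS 32	/* value quoted in this message */

/* Hypothetical stand-ins for 'vcpu_info' and 'shared_info'. */
struct sketch_slot {
	int unused;
};

struct sketch_shared {
	struct sketch_slot vcpu_info[SKETCH_MAX_VIRT_CPUS];
};

static struct sketch_shared sketch_shared_page;

/*
 * Model of the early reset: a vcpu id below the limit gets the slot
 * embedded in the shared page, anything above gets NULL until a
 * dedicated area is registered later.
 */
static struct sketch_slot *sketch_vcpu_info_reset(unsigned int xen_vcpu_id)
{
	if (xen_vcpu_id < SKETCH_MAX_VIRT_CPUS)
		return &sketch_shared_page.vcpu_info[xen_vcpu_id];
	return NULL;
}

/* The time ops may only be initialised early if the pointer is usable. */
static int sketch_can_init_time_early(unsigned int boot_vcpu_id)
{
	return sketch_vcpu_info_reset(boot_vcpu_id) != NULL;
}
```

With the reproducer below, the kdump kernel boots on vcpu 33, for which
sketch_can_init_time_early() returns 0 in this model.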

This patch delays xen_hvm_init_time_ops() until
xen_hvm_smp_prepare_boot_cpu(), after the 'vcpu_info' for the boot vcpu is
registered, when the boot vcpu is >= 32.

This issue can be reproduced on purpose with the command below on the guest
side when kdump/kexec is enabled:

"taskset -c 33 echo c > /proc/sysrq-trigger"

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 arch/x86/xen/enlighten_hvm.c | 20 +++++++++++++++++++-
 arch/x86/xen/smp_hvm.c       |  3 +++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index e68ea5f4ad1c..152279416d9a 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -216,7 +216,25 @@ static void __init xen_hvm_guest_init(void)
 	WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
 	xen_unplug_emulated_devices();
 	x86_init.irqs.intr_init = xen_init_IRQ;
-	xen_hvm_init_time_ops();
+
+	/*
+	 * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
+	 * and the VM would use them until xen_vcpu_setup() is used to
+	 * allocate/relocate them at arbitrary address.
+	 *
+	 * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
+	 * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
+	 * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
+	 *
+	 * Therefore we delay xen_hvm_init_time_ops() to
+	 * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
+	 */
+	if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
+		pr_info("Delay xen_hvm_init_time_ops() as kernel is running on vcpu=%d\n",
+			xen_vcpu_nr(0));
+	else
+		xen_hvm_init_time_ops();
+
 	xen_hvm_init_mmu_ops();
 
 #ifdef CONFIG_KEXEC_CORE
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
index 6ff3c887e0b9..60cd4fafd188 100644
--- a/arch/x86/xen/smp_hvm.c
+++ b/arch/x86/xen/smp_hvm.c
@@ -19,6 +19,9 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
 	 */
 	xen_vcpu_setup(0);
 
+	if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
+		xen_hvm_init_time_ops();
+
 	/*
 	 * The alternative logic (which patches the unlock/lock) runs before
 	 * the smp bootup up code is activated. Hence we need to set this up
-- 
2.17.1



* [PATCH xen 2/2] xen: update system time immediately when VCPUOP_register_vcpu_info
  2021-10-12  7:24 [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue Dongli Zhang
  2021-10-12  7:24 ` [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 Dongli Zhang
@ 2021-10-12  7:24 ` Dongli Zhang
  2021-10-12  8:47 ` [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue Juergen Gross
  2 siblings, 0 replies; 8+ messages in thread
From: Dongli Zhang @ 2021-10-12  7:24 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, x86, boris.ostrovsky, jgross, sstabellini, tglx,
	mingo, bp, hpa, andrew.cooper3, george.dunlap, iwj, jbeulich,
	julien, wl, joe.jin

The guest may access the pv vcpu_time_info immediately after
VCPUOP_register_vcpu_info. This patch borrows the idea from
VCPUOP_register_vcpu_time_memory_area, where
force_update_vcpu_system_time() is called immediately when the new memory
area is registered.

Otherwise, we may observe clock drift at the VM side if the VM accesses
the clocksource immediately after VCPUOP_register_vcpu_info().

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 xen/common/domain.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 40d67ec342..c879f6723b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1695,6 +1695,8 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = map_vcpu_info(v, info.mfn, info.offset);
         domain_unlock(d);
 
+        force_update_vcpu_system_time(v);
+
         break;
     }
 
-- 
2.17.1



* Re: [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue
  2021-10-12  7:24 [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue Dongli Zhang
  2021-10-12  7:24 ` [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 Dongli Zhang
  2021-10-12  7:24 ` [PATCH xen 2/2] xen: update system time immediately when VCPUOP_register_vcpu_info Dongli Zhang
@ 2021-10-12  8:47 ` Juergen Gross
  2021-10-12 15:50   ` Dongli Zhang
  2 siblings, 1 reply; 8+ messages in thread
From: Juergen Gross @ 2021-10-12  8:47 UTC (permalink / raw)
  To: Dongli Zhang, xen-devel
  Cc: linux-kernel, x86, boris.ostrovsky, sstabellini, tglx, mingo, bp,
	hpa, andrew.cooper3, george.dunlap, iwj, jbeulich, julien, wl,
	joe.jin



On 12.10.21 09:24, Dongli Zhang wrote:
> When the kdump/kexec is enabled at HVM VM side, to panic kernel will trap
> to xen side with reason=soft_reset. As a result, the xen will reboot the VM
> with the kdump kernel.
> 
> Unfortunately, when the VM is panic with below command line ...
> 
> "taskset -c 33 echo c > /proc/sysrq-trigger"
> 
> ... the kdump kernel is panic at early stage ...
> 
> PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
> [    0.000000] Hardware name: Xen HVM domU
> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
> ... ...
> [    0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
> [    0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
> [    0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
> [    0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
> [    0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffaa95e000(0000) knlGS:0000000000000000
> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    0.000000] Call Trace:
> [    0.000000]  ? xen_init_time_common+0x11/0x55
> [    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
> [    0.000000]  ? xen_hvm_guest_init+0x214/0x251
> [    0.000000]  ? 0xffffffffa8c00000
> [    0.000000]  ? setup_arch+0x440/0xbd6
> [    0.000000]  ? start_kernel+0x6a/0x689
> [    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb
> 
> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
> 
> 
> The 1st patch is to fix the issue at VM kernel side. However, we may
> observe clock drift at VM side due to the issue at xen hypervisor side.
> This is because the pv vcpu_time_info is not updated when
> VCPUOP_register_vcpu_info.
> 
> The 2nd patch is to force_update_vcpu_system_time() at xen side when
> VCPUOP_register_vcpu_info, to avoid the VM clock drift during kdump kernel
> boot.

Please don't mix patches for multiple projects in one series.

In cases like this it is fine to mention the other project's patch
verbally instead.


Juergen



* Re: [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
  2021-10-12  7:24 ` [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 Dongli Zhang
@ 2021-10-12  8:48   ` Juergen Gross
  2021-10-12 17:17   ` Boris Ostrovsky
  1 sibling, 0 replies; 8+ messages in thread
From: Juergen Gross @ 2021-10-12  8:48 UTC (permalink / raw)
  To: Dongli Zhang, xen-devel
  Cc: linux-kernel, x86, boris.ostrovsky, sstabellini, tglx, mingo, bp,
	hpa, andrew.cooper3, george.dunlap, iwj, jbeulich, julien, wl,
	joe.jin



On 12.10.21 09:24, Dongli Zhang wrote:
> The sched_clock() can be used very early since upstream
> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
> with upstream commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock
> time from 0"), kdump kernel in Xen HVM guest may panic at very early stage
> when accessing &__this_cpu_read(xen_vcpu)->time as in below:
> 
> setup_arch()
>   -> init_hypervisor_platform()
>       -> x86_init.hyper.init_platform = xen_hvm_guest_init()
>           -> xen_hvm_init_time_ops()
>               -> xen_clocksource_read()
>                   -> src = &__this_cpu_read(xen_vcpu)->time;
> 
> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
> 
> However, when Xen HVM guest panic on vcpu >= 32, since
> xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
> vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
> 
> This patch delays xen_hvm_init_time_ops() to later in
> xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
> registered when the boot vcpu is >= 32.
> 
> This issue can be reproduced on purpose via below command at the guest
> side when kdump/kexec is enabled:
> 
> "taskset -c 33 echo c > /proc/sysrq-trigger"
> 
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>   arch/x86/xen/enlighten_hvm.c | 20 +++++++++++++++++++-
>   arch/x86/xen/smp_hvm.c       |  3 +++
>   2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
> index e68ea5f4ad1c..152279416d9a 100644
> --- a/arch/x86/xen/enlighten_hvm.c
> +++ b/arch/x86/xen/enlighten_hvm.c
> @@ -216,7 +216,25 @@ static void __init xen_hvm_guest_init(void)
>   	WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
>   	xen_unplug_emulated_devices();
>   	x86_init.irqs.intr_init = xen_init_IRQ;
> -	xen_hvm_init_time_ops();
> +
> +	/*
> +	 * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
> +	 * and the VM would use them until xen_vcpu_setup() is used to
> +	 * allocate/relocate them at arbitrary address.
> +	 *
> +	 * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
> +	 * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
> +	 * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
> +	 *
> +	 * Therefore we delay xen_hvm_init_time_ops() to
> +	 * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
> +	 */
> +	if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
> +		pr_info("Delay xen_hvm_init_time_ops() as kernel is running on vcpu=%d\n",
> +			xen_vcpu_nr(0));
> +	else
> +		xen_hvm_init_time_ops();
> +
>   	xen_hvm_init_mmu_ops();
>   
>   #ifdef CONFIG_KEXEC_CORE
> diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
> index 6ff3c887e0b9..60cd4fafd188 100644
> --- a/arch/x86/xen/smp_hvm.c
> +++ b/arch/x86/xen/smp_hvm.c
> @@ -19,6 +19,9 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
>   	 */
>   	xen_vcpu_setup(0);
>   
> +	if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
> +		xen_hvm_init_time_ops();
> +

Please add a comment referencing the related code in
xen_hvm_guest_init().


Juergen



* Re: [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue
  2021-10-12  8:47 ` [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue Juergen Gross
@ 2021-10-12 15:50   ` Dongli Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Dongli Zhang @ 2021-10-12 15:50 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: linux-kernel, x86, boris.ostrovsky, sstabellini, tglx, mingo, bp,
	hpa, andrew.cooper3, george.dunlap, iwj, jbeulich, julien, wl,
	joe.jin

Hi Juergen,

On 10/12/21 1:47 AM, Juergen Gross wrote:
> On 12.10.21 09:24, Dongli Zhang wrote:
>> When the kdump/kexec is enabled at HVM VM side, to panic kernel will trap
>> to xen side with reason=soft_reset. As a result, the xen will reboot the VM
>> with the kdump kernel.
>>
>> Unfortunately, when the VM is panic with below command line ...
>>
>> "taskset -c 33 echo c > /proc/sysrq-trigger"
>>
>> ... the kdump kernel is panic at early stage ...
>>
>> PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
>> [    0.000000] Hardware name: Xen HVM domU
>> [    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
>> ... ...
>> [    0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
>> [    0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
>> [    0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
>> [    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
>> [    0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
>> [    0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffffaa95e000(0000) knlGS:0000000000000000
>> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    0.000000] Call Trace:
>> [    0.000000]  ? xen_init_time_common+0x11/0x55
>> [    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
>> [    0.000000]  ? xen_hvm_guest_init+0x214/0x251
>> [    0.000000]  ? 0xffffffffa8c00000
>> [    0.000000]  ? setup_arch+0x440/0xbd6
>> [    0.000000]  ? start_kernel+0x6a/0x689
>> [    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb
>>
>> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
>> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
>> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
>>
>>
>> The 1st patch is to fix the issue at VM kernel side. However, we may
>> observe clock drift at VM side due to the issue at xen hypervisor side.
>> This is because the pv vcpu_time_info is not updated when
>> VCPUOP_register_vcpu_info.
>>
>> The 2nd patch is to force_update_vcpu_system_time() at xen side when
>> VCPUOP_register_vcpu_info, to avoid the VM clock drift during kdump kernel
>> boot.
> 
> Please don't mix patches for multiple projects in one series.
> 
> In cases like this it is fine to mention the other project's patch
> verbally instead.
> 

I will split the patchset in v2 and send each part to its own project.

The core ideas of this combined patchset are:

1. Fix at HVM domU side (kdump kernel panic)

2. Fix at Xen hypervisor side (clock drift issue in kdump kernel)

3. Report (or seek help with) the fact that soft_reset does not work with
mainline Xen, so I am not able to test my patchset with the most recent
mainline Xen.

Thank you very much!

Dongli Zhang


* Re: [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
  2021-10-12  7:24 ` [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 Dongli Zhang
  2021-10-12  8:48   ` Juergen Gross
@ 2021-10-12 17:17   ` Boris Ostrovsky
  2021-10-25  5:20     ` Dongli Zhang
  1 sibling, 1 reply; 8+ messages in thread
From: Boris Ostrovsky @ 2021-10-12 17:17 UTC (permalink / raw)
  To: Dongli Zhang, xen-devel
  Cc: linux-kernel, x86, jgross, sstabellini, tglx, mingo, bp, hpa,
	andrew.cooper3, george.dunlap, iwj, jbeulich, julien, wl,
	joe.jin


On 10/12/21 3:24 AM, Dongli Zhang wrote:
> The sched_clock() can be used very early since upstream
> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
> with upstream commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock
> time from 0"), kdump kernel in Xen HVM guest may panic at very early stage
> when accessing &__this_cpu_read(xen_vcpu)->time as in below:


Please drop "upstream". It's always upstream here.


> +
> +	/*
> +	 * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
> +	 * and the VM would use them until xen_vcpu_setup() is used to
> +	 * allocate/relocate them at arbitrary address.
> +	 *
> +	 * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
> +	 * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
> +	 * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
> +	 *
> +	 * Therefore we delay xen_hvm_init_time_ops() to
> +	 * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
> +	 */
> +	if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)


What about always deferring this when panicking? Would that work?


Deciding whether to defer based on cpu number feels a bit awkward.


-boris


> +		pr_info("Delay xen_hvm_init_time_ops() as kernel is running on vcpu=%d\n",
> +			xen_vcpu_nr(0));
> +	else
> +		xen_hvm_init_time_ops();
> +
>   	xen_hvm_init_mmu_ops();
>   


* Re: [PATCH linux 1/2] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
  2021-10-12 17:17   ` Boris Ostrovsky
@ 2021-10-25  5:20     ` Dongli Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Dongli Zhang @ 2021-10-25  5:20 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: linux-kernel, x86, jgross, sstabellini, tglx, mingo, bp, hpa,
	andrew.cooper3, george.dunlap, iwj, jbeulich, julien, wl,
	joe.jin

Hi Boris,

On 10/12/21 10:17 AM, Boris Ostrovsky wrote:
> 
> On 10/12/21 3:24 AM, Dongli Zhang wrote:
>> The sched_clock() can be used very early since upstream
>> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
>> with upstream commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock
>> time from 0"), kdump kernel in Xen HVM guest may panic at very early stage
>> when accessing &__this_cpu_read(xen_vcpu)->time as in below:
> 
> 
> Please drop "upstream". It's always upstream here.
> 
> 
>> +
>> +    /*
>> +     * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
>> +     * and the VM would use them until xen_vcpu_setup() is used to
>> +     * allocate/relocate them at arbitrary address.
>> +     *
>> +     * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
>> +     * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
>> +     * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
>> +     *
>> +     * Therefore we delay xen_hvm_init_time_ops() to
>> +     * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
>> +     */
>> +    if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
> 
> 
> What about always deferring this when panicing? Would that work?
> 
> 
> Deciding whether to defer based on cpu number feels a bit awkward.
> 
> 
> -boris
> 

I did some tests and I do not think this works well. I prefer to delay the
initialization only for VCPU >= 32.

This is the syslog if we always delay xen_hvm_init_time_ops(), regardless
of whether VCPU >= 32.

[    0.032372] Booting paravirtualized kernel on Xen HVM
[    0.032376] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.037683] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:64 nr_node_ids:2
[    0.041876] percpu: Embedded 49 pages/cpu s162968 r8192 d29544 u262144

--> The clock goes backwards here, from 0.041876 to 0.000010.

[    0.000010] Built 2 zonelists, mobility grouping on.  Total pages: 2015744
[    0.000012] Policy zone: Normal
[    0.000014] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-rc6xen+ root=UUID=2a5975ab-a059-4697-9aee-7a53ddfeea21 ro text console=ttyS0,115200n8 console=tty1 crashkernel=512M-:192M


This is because the initial pv_sched_clock is native_sched_clock(), and it
switches to xen_sched_clock() in xen_hvm_init_time_ops(). Is it fine for a
non-kdump kernel to always see the clock go backwards like this?

To avoid the backwards jump, we could register a dummy clocksource that
always returns 0 before xen_hvm_init_time_ops() runs. I do not think this is
reasonable.
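
The backwards jump can be illustrated with a small user-space model. This is
a sketch under stated assumptions, not the kernel code: all sketch_ names
are hypothetical, both clocks are fed from one raw nanosecond counter for
simplicity, and the Xen-style clock is assumed to subtract an offset
captured at initialization, in line with "Output xen sched_clock time from
0":

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*sketch_clock_fn)(uint64_t raw_ns);

/* Offset captured when the Xen-style clock is installed. */
static uint64_t sketch_offset_ns;

/* Early clock: reports the raw counter unchanged. */
static uint64_t sketch_native_clock(uint64_t raw_ns)
{
	return raw_ns;
}

/* Xen-style clock: reports time relative to its own initialization. */
static uint64_t sketch_xen_clock(uint64_t raw_ns)
{
	return raw_ns - sketch_offset_ns;
}

/* Currently installed clock, starting with the native one. */
static sketch_clock_fn sketch_sched_clock = sketch_native_clock;

/* Model of the time ops init: record the offset, then switch clocks. */
static void sketch_init_time_ops(uint64_t raw_ns)
{
	sketch_offset_ns = raw_ns;
	sketch_sched_clock = sketch_xen_clock;
}
```

In this model, if the switch happens at raw time 41876000 ns, a reading
taken 10000 ns later reports 10000 ns, which is smaller than the last value
reported before the switch. The later the switch, the larger the jump, which
is why deferring it unconditionally is visible in every boot log.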

Thank you very much!

Dongli Zhang

