Re: [PATCH v2 1/1] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32

From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Dongli Zhang <dongli.zhang@oracle.com>,
	xen-devel@lists.xenproject.org, x86@kernel.org
Cc: jgross@suse.com, sstabellini@kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, joe.jin@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/1] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
Date: Fri, 5 Nov 2021 16:41:16 -0400	[thread overview]
Message-ID: <a891e4f6-9832-8335-2cf8-e494f8703fb7@oracle.com> (raw)
In-Reply-To: <fe6d7aaf-97b5-dfa2-75e5-1a072f4aac92@oracle.com>

On 11/4/21 1:59 PM, Dongli Zhang wrote:
> Hi Boris,
>
> On 11/1/21 10:34 AM, Boris Ostrovsky wrote:
>> On 10/27/21 9:25 PM, Dongli Zhang wrote:
>>> The sched_clock() can be used very early since
>>> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
>>> with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from
>>> 0"), kdump kernel in Xen HVM guest may panic at very early stage when
>>> accessing &__this_cpu_read(xen_vcpu)->time as in below:
>>>
>>> setup_arch()
>>>    -> init_hypervisor_platform()
>>>        -> x86_init.hyper.init_platform = xen_hvm_guest_init()
>>>            -> xen_hvm_init_time_ops()
>>>                -> xen_clocksource_read()
>>>                    -> src = &__this_cpu_read(xen_vcpu)->time;
>>>
>>> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
>>> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
>>> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
>>>
>>> However, when Xen HVM guest panic on vcpu >= 32, since
>>> xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
>>> vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
>>>
>>> This patch delays xen_hvm_init_time_ops() to later in
>>> xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
>>> registered when the boot vcpu is >= 32.
>>>
>>> Another option is to always delay xen_hvm_init_time_ops() for any vcpus
>>> (including vcpu=0). Since to delay xen_hvm_init_time_ops() may lead to
>>> clock backward issue,
>>
>> This is referring to
>> https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg01516.html I
>> assume?
> No.
>
> So far there are clock forward (e.g., from 0 to 65345) issue and clock backward
> issue (e.g., from 2.432 to 0).
>
> The clock forward issue can be resolved by above link to enforce clock update
> during vcpu info registration. However, so far I am only able to reproduce clock
> forward when "taskset -c 33 echo c > /proc/sysrq-trigger".
>
> That is, I am not able to see any clock forward drift during regular boot (on
> CPU=0), without the fix at hypervisor side.
>
> The clock backward issue is because the native clock source is used if we delay
> the initialization of xen clock source. We will see a backward when the source
> is switched from native to xen.
>
>>
>>>    it is preferred to avoid that for regular boot (The
>>> pv_sched_clock=native_sched_clock() is used at the very beginning until
>>> xen_sched_clock() is registered). That requires to adjust
>>> xen_sched_clock_offset. That's why we only delay xen_hvm_init_time_ops()
>>> for vcpu>=32.
>>
>> We delay only on VCPU>=32 because we want to avoid the clock going backwards due
>> to hypervisor problem pointed to be the link above, not because we need to
>> adjust xen_sched_clock_offset (which we could if we wanted).
> I will add that.
>
> Just to clarify that so far I am not able to reproduce the clock backward issue
> during regular boot (on CPU=0), when the fix is not available at hypervisor side.

FTR, Dongli and I had a chat and he is going to provide another version of the patch where time initialization is deferred for all vcpus.

-boris