All of lore.kernel.org
 help / color / mirror / Atom feed
* warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
@ 2022-02-08 14:59 Maxim Levitsky
  2022-02-08 15:15 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Maxim Levitsky @ 2022-02-08 14:59 UTC (permalink / raw)
  To: kvm; +Cc: Vitaly Kuznetsov





[102140.117649] WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
[102140.118063] Modules linked in: kvm_amd(O) ccp rng_core kvm(O) irqbypass tun xt_conntrack ip6table_filter ip6_tables tps6598x wmi_bmof snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn regmap_i2c
snd_soc_core ftdi_sio snd_ctl_led bfq snd_hda_codec_realtek iwlmvm btusb snd_hda_codec_generic snd_hda_codec_hdmi btrtl psmouse btbcm pcspkr btintel k10temp snd_hda_intel snd_intel_dspcfg tpm_crb
snd_hda_codec snd_hwdep snd_hda_core snd_pcm iwlwifi snd_rn_pci_acp3x i2c_piix4 tpm_tis tpm_tis_core thinkpad_acpi wmi platform_profile rtc_cmos i2c_multi_instantiate i2c_designware_platform
i2c_designware_core sch_fq_codel dm_crypt mmc_block hid_generic usbhid rtsx_pci_sdmmc mmc_core atkbd libps2 amdgpu drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea
cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea nvme xhci_pci drm sp5100_tco nvme_core rtsx_pci ucsi_acpi xhci_hcd drm_panel_orientation_quirks mfd_core t10_pi typec_ucsi typec i8042
pinctrl_amd r8169 realtek 8250_pci
[102140.118133]  usbmon nbd fuse autofs4 [last unloaded: rng_core]
[102140.120708] CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G        W  O      5.16.0.stable #20
[102140.120971] Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021
[102140.121206] RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm]
[102140.121441] Code: 21 00 00 0f bf b6 04 01 00 00 c1 e1 10 48 89 e5 09 ce e8 17 aa 00 00 5d c3 c3 48 8b 86 c0 00 00 00 48 63 d2 f0 48 0f ab 10 c3 <0f> 0b c3 0f 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00
0f 1f 00 0f 1f
[102140.121990] RSP: 0018:ffffc9000078fcb8 EFLAGS: 00010246
[102140.122152] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
[102140.122363] RDX: 0000000000108607 RSI: ffff88810a264600 RDI: ffffc90001e55000
[102140.122571] RBP: ffffc9000078fd08 R08: 0000000000000000 R09: 0000000000000000
[102140.122789] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810a264600
[102140.123001] R13: 0000000000000004 R14: 0000000000108608 R15: ffffc90001e5e8cc
[102140.123210] FS:  00007ff0d9519d80(0000) GS:ffff8883ff880000(0000) knlGS:0000000000000000
[102140.123447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[102140.123618] CR2: 000055e6c099a0d8 CR3: 000000010a7f5000 CR4: 0000000000350ee0
[102140.123834] Call Trace:
[102140.123910]  <TASK>
[102140.123976]  ? kvm_write_guest+0x114/0x120 [kvm]
[102140.124183]  kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
[102140.124396]  kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
[102140.124585]  ? schedule+0x4e/0xc0
[102140.124701]  ? __cond_resched+0x1a/0x50
[102140.124826]  ? futex_wait+0x166/0x250
[102140.124946]  ? __send_signal+0x1f1/0x3d0
[102140.125072]  kvm_vm_ioctl+0x747/0xda0 [kvm]
[102140.125264]  ? do_futex+0xa7/0x160
[102140.125370]  ? __x64_sys_futex+0x74/0x1b0
[102140.125493]  __x64_sys_ioctl+0x8e/0xc0
[102140.125612]  do_syscall_64+0x35/0x80
[102140.125738]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[102140.125893] RIP: 0033:0x7ff0dd6b72bb
[102140.126001] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 2b 0f 00 f7
d8 64 89 01 48
[102140.126548] RSP: 002b:00007ffe28d41158 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[102140.126770] RAX: ffffffffffffffda RBX: 000055984440d610 RCX: 00007ff0dd6b72bb
[102140.126975] RDX: 00007ffe28d412a0 RSI: 000000004030ae7b RDI: 000000000000001e
[102140.127175] RBP: 00007ffe28d41250 R08: 0000000000000000 R09: 00000000ffffffff
[102140.127375] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ff0dec1c3a0
[102140.127574] R13: 0000559841ba2847 R14: 00005598442b03a0 R15: 0000559844077b10
[102140.127792]  </TASK>
[102140.127863] ---[ end trace b8a3ae7a8a53d467 ]---


This happens because kvm_hv_invalidate_tsc_page is called from kvm_vm_ioctl_set_clock
which is a VM wide ioctl and thus doesn't have to be called with an active vCPU.
 
But as I see the warring states that guest memory writes must alway be done
while a vCPU is active to allow the write to be recorded in its dirty track ring.
 
I _think_ it might be OK to drop this invalidation,
and rely on the fact that kvm_hv_setup_tsc_page will update it,
and it is called when vCPU0 processes KVM_REQ_CLOCK_UPDATE which is raised in the
kvm_vm_ioctl_set_clock eventually.
 
Vitaly, any thoughts on this?
 
For reference those are my HV flags:
 
    $cpu_flags: |
        $cpu_flags,
        hv_relaxed,hv_spinlocks=0x1fff,hv_vpindex,     # General HV features
        hv_runtime,hv_time,hv-frequencies,             # Clock stuff        
        hv_synic,hv_stimer,hv-stimer-direct,#hv-vapic, # APIC extensions
        #hv-tlbflush,hv-ipi,                           # IPI extensions
        hv-reenlightenment,                            # nested stuff
 
 
 
PS: unrelated question:
 
Vitaly, do you know why hv-evmcs needs hv-vapic?
 
 
I know that they stuffed the eVMCS pointer to HV_X64_MSR_VP_ASSIST_PAGE,
 
But can't we set HV_APIC_ACCESS_AVAILABLE but not HV_APIC_ACCESS_RECOMMENDED
so that guest would hopefully still know that HV assist page is available,
but should not use it for APIC acceleration?
 
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-08 14:59 warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context Maxim Levitsky
@ 2022-02-08 15:15 ` Vitaly Kuznetsov
  2022-02-08 15:34   ` Maxim Levitsky
  0 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-02-08 15:15 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: kvm

Maxim Levitsky <mlevitsk@redhat.com> writes:

> [102140.117649] WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
...
> [102140.123834] Call Trace:
> [102140.123910]  <TASK>
> [102140.123976]  ? kvm_write_guest+0x114/0x120 [kvm]
> [102140.124183]  kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
> [102140.124396]  kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]

...

>
> This happens because kvm_hv_invalidate_tsc_page is called from kvm_vm_ioctl_set_clock
> which is a VM wide ioctl and thus doesn't have to be called with an active vCPU.
>  
> But as I see the warring states that guest memory writes must alway be done
> while a vCPU is active to allow the write to be recorded in its dirty track ring.
>  
> I _think_ it might be OK to drop this invalidation,
> and rely on the fact that kvm_hv_setup_tsc_page will update it,
> and it is called when vCPU0 processes KVM_REQ_CLOCK_UPDATE which is raised in the
> kvm_vm_ioctl_set_clock eventually.
>  
> Vitaly, any thoughts on this?
>  


TSC page (as well as SynIC pages) are supposed to be "overlay" pages
which are mapped over guest's memory but KVM doesn't do that and just
writes to guest's memory. This kind of works as Windows/Hyper-V guests
never disable these features and expecting the memory behind them to
stay intact.

Dirty tracking for active TSC page can be omited, I belive. Let me take
a look at this.

> For reference those are my HV flags:
>  
>     $cpu_flags: |
>         $cpu_flags,
>         hv_relaxed,hv_spinlocks=0x1fff,hv_vpindex,     # General HV features
>         hv_runtime,hv_time,hv-frequencies,             # Clock stuff        
>         hv_synic,hv_stimer,hv-stimer-direct,#hv-vapic, # APIC extensions
>         #hv-tlbflush,hv-ipi,                           # IPI extensions
>         hv-reenlightenment,                            # nested stuff
>  
>  
>  
> PS: unrelated question:
>  
> Vitaly, do you know why hv-evmcs needs hv-vapic?
>  
>  
> I know that they stuffed the eVMCS pointer to HV_X64_MSR_VP_ASSIST_PAGE,
>  
> But can't we set HV_APIC_ACCESS_AVAILABLE but not HV_APIC_ACCESS_RECOMMENDED
> so that guest would hopefully still know that HV assist page is available,
> but should not use it for APIC acceleration?

Yes,

"hv-vapic" enables so-called "VP Assist" page and Enlightened VMCS GPA
sits there, it is used instead of VMPTRLD (which becomes unsupported)

Take a look at the newly introduced "hv-apicv"/"hv-avic" (the same
thing) in QEMU: 

commit e1f9a8e8c90ae54387922e33e5ac4fd759747d01
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Thu Sep 2 11:35:28 2021 +0200

    i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment

when enabled, HV_APIC_ACCESS_RECOMMENDED is not set even with "hv-vapic"
(but HV_APIC_ACCESS_AVAILABLE remains). 

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-08 15:15 ` Vitaly Kuznetsov
@ 2022-02-08 15:34   ` Maxim Levitsky
  2022-02-08 16:01     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Maxim Levitsky @ 2022-02-08 15:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: kvm

On Tue, 2022-02-08 at 16:15 +0100, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> 
> > [102140.117649] WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
> ...
> > [102140.123834] Call Trace:
> > [102140.123910]  <TASK>
> > [102140.123976]  ? kvm_write_guest+0x114/0x120 [kvm]
> > [102140.124183]  kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
> > [102140.124396]  kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
> 
> ...
> 
> > This happens because kvm_hv_invalidate_tsc_page is called from kvm_vm_ioctl_set_clock
> > which is a VM wide ioctl and thus doesn't have to be called with an active vCPU.
> >  
> > But as I see the warring states that guest memory writes must alway be done
> > while a vCPU is active to allow the write to be recorded in its dirty track ring.
> >  
> > I _think_ it might be OK to drop this invalidation,
> > and rely on the fact that kvm_hv_setup_tsc_page will update it,
> > and it is called when vCPU0 processes KVM_REQ_CLOCK_UPDATE which is raised in the
> > kvm_vm_ioctl_set_clock eventually.
> >  
> > Vitaly, any thoughts on this?
> >  
> 
> TSC page (as well as SynIC pages) are supposed to be "overlay" pages
> which are mapped over guest's memory but KVM doesn't do that and just
> writes to guest's memory. This kind of works as Windows/Hyper-V guests
> never disable these features and expecting the memory behind them to
> stay intact.
> 
> Dirty tracking for active TSC page can be omited, I belive. Let me take
> a look at this.
> 
> > For reference those are my HV flags:
> >  
> >     $cpu_flags: |
> >         $cpu_flags,
> >         hv_relaxed,hv_spinlocks=0x1fff,hv_vpindex,     # General HV features
> >         hv_runtime,hv_time,hv-frequencies,             # Clock stuff        
> >         hv_synic,hv_stimer,hv-stimer-direct,#hv-vapic, # APIC extensions
> >         #hv-tlbflush,hv-ipi,                           # IPI extensions
> >         hv-reenlightenment,                            # nested stuff
> >  
> >  
> >  
> > PS: unrelated question:
> >  
> > Vitaly, do you know why hv-evmcs needs hv-vapic?
> >  
> >  
> > I know that they stuffed the eVMCS pointer to HV_X64_MSR_VP_ASSIST_PAGE,
> >  
> > But can't we set HV_APIC_ACCESS_AVAILABLE but not HV_APIC_ACCESS_RECOMMENDED
> > so that guest would hopefully still know that HV assist page is available,
> > but should not use it for APIC acceleration?
> 
> Yes,
> 
> "hv-vapic" enables so-called "VP Assist" page and Enlightened VMCS GPA
> sits there, it is used instead of VMPTRLD (which becomes unsupported)
> 
> Take a look at the newly introduced "hv-apicv"/"hv-avic" (the same
> thing) in QEMU: 
> 
> commit e1f9a8e8c90ae54387922e33e5ac4fd759747d01
> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date:   Thu Sep 2 11:35:28 2021 +0200
> 
>     i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment
> 
> when enabled, HV_APIC_ACCESS_RECOMMENDED is not set even with "hv-vapic"
> (but HV_APIC_ACCESS_AVAILABLE remains). 
> 

Cool, I didn't expect this. I thought that hv-vapic only enables the AutoEOI
deprecation bit.

This needs to be updated in hyperv.txt in qemu - it currently states that
hv-evmcs disables posted interrupts (that is APICv) and hv-avic
only mentions AutoEOI feature.

Thanks!
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-08 15:34   ` Maxim Levitsky
@ 2022-02-08 16:01     ` Vitaly Kuznetsov
  2022-02-08 17:06       ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-02-08 16:01 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: kvm

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Tue, 2022-02-08 at 16:15 +0100, Vitaly Kuznetsov wrote:
>> 
>> "hv-vapic" enables so-called "VP Assist" page and Enlightened VMCS GPA
>> sits there, it is used instead of VMPTRLD (which becomes unsupported)
>> 
>> Take a look at the newly introduced "hv-apicv"/"hv-avic" (the same
>> thing) in QEMU: 
>> 
>> commit e1f9a8e8c90ae54387922e33e5ac4fd759747d01
>> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Date:   Thu Sep 2 11:35:28 2021 +0200
>> 
>>     i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment
>> 
>> when enabled, HV_APIC_ACCESS_RECOMMENDED is not set even with "hv-vapic"
>> (but HV_APIC_ACCESS_AVAILABLE remains). 
>> 
>
> Cool, I didn't expect this. I thought that hv-vapic only enables the AutoEOI
> deprecation bit.
>
> This needs to be updated in hyperv.txt in qemu - it currently states that
> hv-evmcs disables posted interrupts (that is APICv) 

EVMCS disables (not only) posted interrupts feature for nested
guests. To be precise, it disables the following:

#define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \
                                    PIN_BASED_VMX_PREEMPTION_TIMER)
#define EVMCS1_UNSUPPORTED_2NDEXEC                                      \
        (SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |                         \
         SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |                      \
         SECONDARY_EXEC_APIC_REGISTER_VIRT |                            \
         SECONDARY_EXEC_ENABLE_PML |                                    \
         SECONDARY_EXEC_ENABLE_VMFUNC |                                 \
         SECONDARY_EXEC_SHADOW_VMCS |                                   \
         SECONDARY_EXEC_TSC_SCALING |                                   \
         SECONDARY_EXEC_PAUSE_LOOP_EXITING)
#define EVMCS1_UNSUPPORTED_VMEXIT_CTRL                                  \
        (VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |                           \
         VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
#define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
#define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)

> and hv-avic only mentions AutoEOI feature.

True, this is hidden in "The enlightenment allows to use Hyper-V SynIC
with hardware APICv/AVIC enabled". Any suggestions on how to improve
this are more than welcome!.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-08 16:01     ` Vitaly Kuznetsov
@ 2022-02-08 17:06       ` Sean Christopherson
  2022-02-09 12:44         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-02-08 17:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: Maxim Levitsky, kvm

On Tue, Feb 08, 2022, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> > and hv-avic only mentions AutoEOI feature.
> 
> True, this is hidden in "The enlightenment allows to use Hyper-V SynIC
> with hardware APICv/AVIC enabled". Any suggestions on how to improve
> this are more than welcome!.

Specifically for the WARN, does this approach makes sense?

https://lore.kernel.org/all/YcTpJ369cRBN4W93@google.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-08 17:06       ` Sean Christopherson
@ 2022-02-09 12:44         ` Vitaly Kuznetsov
  2022-02-09 16:22           ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-02-09 12:44 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Maxim Levitsky, kvm

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Feb 08, 2022, Vitaly Kuznetsov wrote:
>> Maxim Levitsky <mlevitsk@redhat.com> writes:
>> > and hv-avic only mentions AutoEOI feature.
>> 
>> True, this is hidden in "The enlightenment allows to use Hyper-V SynIC
>> with hardware APICv/AVIC enabled". Any suggestions on how to improve
>> this are more than welcome!.
>
> Specifically for the WARN, does this approach makes sense?
>
> https://lore.kernel.org/all/YcTpJ369cRBN4W93@google.com

(Sorry for missing this dicsussion back in December)

It probably does but the patch just introduces
HV_TSC_PAGE_UPDATE_REQUIRED flag and drops kvm_write_guest() completely,
the flag is never reset and nothing ever gets written to guest's
memory. I suppose you've forgotten to commit a hunk :-) This is good
starting poing though, I'll take a look.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-09 12:44         ` Vitaly Kuznetsov
@ 2022-02-09 16:22           ` Sean Christopherson
  2022-02-09 17:33             ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-02-09 16:22 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: Maxim Levitsky, kvm

On Wed, Feb 09, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Tue, Feb 08, 2022, Vitaly Kuznetsov wrote:
> >> Maxim Levitsky <mlevitsk@redhat.com> writes:
> >> > and hv-avic only mentions AutoEOI feature.
> >> 
> >> True, this is hidden in "The enlightenment allows to use Hyper-V SynIC
> >> with hardware APICv/AVIC enabled". Any suggestions on how to improve
> >> this are more than welcome!.
> >
> > Specifically for the WARN, does this approach makes sense?
> >
> > https://lore.kernel.org/all/YcTpJ369cRBN4W93@google.com
> 
> (Sorry for missing this dicsussion back in December)
> 
> It probably does but the patch just introduces
> HV_TSC_PAGE_UPDATE_REQUIRED flag and drops kvm_write_guest() completely,
> the flag is never reset and nothing ever gets written to guest's
> memory. I suppose you've forgotten to commit a hunk :-)

I don't think so, the idea is that kvm_hv_setup_tsc_page() handles the write.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c194a8cbd25f..c1adc9efea28 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2848,7 +2848,7 @@ static void kvm_end_pvclock_update(struct kvm *kvm)
>
>  static void kvm_update_masterclock(struct kvm *kvm)
>  {
> -	kvm_hv_invalidate_tsc_page(kvm);
> +	kvm_hv_request_tsc_page_update(kvm);
>  	kvm_start_pvclock_update(kvm);
>  	pvclock_update_vm_gtod_copy(kvm);
>  	kvm_end_pvclock_update(kvm);
> @@ -3060,8 +3060,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  				       offsetof(struct compat_vcpu_info, time));
>  	if (vcpu->xen.vcpu_time_info_set)
>  		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
> -	if (!v->vcpu_idx)
> -		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
> +	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);

This change sends all vCPUs through the helper, not just vCPU 0.  Then the common
helper checks HV_TSC_PAGE_UPDATE_REQUIRED under lock.

	if (!(hv->hv_tsc_page_status & HV_TSC_PAGE_UPDATE_REQUIRED))
		goto out_unlock;


	--- error checking ---

	/* Write the struct entirely before the non-zero sequence.  */
	smp_wmb();

	hv->tsc_ref.tsc_sequence = tsc_seq;
	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
		goto out_err;

	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
	goto out_unlock;

out_err:
	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
out_unlock:
	mutex_unlock(&hv->hv_lock);


If there are no errors, the kvm_write_guest() goes through and the status is
"reset".  If there are errors, the status is set to BROKEN.

Should I send an RFC to facilitate discussion?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-09 16:22           ` Sean Christopherson
@ 2022-02-09 17:33             ` Vitaly Kuznetsov
  2022-02-10 13:44               ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-02-09 17:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Maxim Levitsky, kvm

Sean Christopherson <seanjc@google.com> writes:

> On Wed, Feb 09, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Tue, Feb 08, 2022, Vitaly Kuznetsov wrote:
>> >> Maxim Levitsky <mlevitsk@redhat.com> writes:
>> >> > and hv-avic only mentions AutoEOI feature.
>> >> 
>> >> True, this is hidden in "The enlightenment allows to use Hyper-V SynIC
>> >> with hardware APICv/AVIC enabled". Any suggestions on how to improve
>> >> this are more than welcome!.
>> >
>> > Specifically for the WARN, does this approach makes sense?
>> >
>> > https://lore.kernel.org/all/YcTpJ369cRBN4W93@google.com
>> 
>> (Sorry for missing this dicsussion back in December)
>> 
>> It probably does but the patch just introduces
>> HV_TSC_PAGE_UPDATE_REQUIRED flag and drops kvm_write_guest() completely,
>> the flag is never reset and nothing ever gets written to guest's
>> memory. I suppose you've forgotten to commit a hunk :-)
>
> I don't think so, the idea is that kvm_hv_setup_tsc_page() handles the write.
>

Oh, sorry, missed that. Patches always look weird in the browser :-)

>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c194a8cbd25f..c1adc9efea28 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2848,7 +2848,7 @@ static void kvm_end_pvclock_update(struct kvm *kvm)
>>
>>  static void kvm_update_masterclock(struct kvm *kvm)
>>  {
>> -	kvm_hv_invalidate_tsc_page(kvm);
>> +	kvm_hv_request_tsc_page_update(kvm);
>>  	kvm_start_pvclock_update(kvm);
>>  	pvclock_update_vm_gtod_copy(kvm);
>>  	kvm_end_pvclock_update(kvm);
>> @@ -3060,8 +3060,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>  				       offsetof(struct compat_vcpu_info, time));
>>  	if (vcpu->xen.vcpu_time_info_set)
>>  		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
>> -	if (!v->vcpu_idx)
>> -		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
>> +	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
>
> This change sends all vCPUs through the helper, not just vCPU 0.  Then the common
> helper checks HV_TSC_PAGE_UPDATE_REQUIRED under lock.
>
> 	if (!(hv->hv_tsc_page_status & HV_TSC_PAGE_UPDATE_REQUIRED))
> 		goto out_unlock;
>
>
> 	--- error checking ---
>
> 	/* Write the struct entirely before the non-zero sequence.  */
> 	smp_wmb();
>
> 	hv->tsc_ref.tsc_sequence = tsc_seq;
> 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
> 			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
> 		goto out_err;
>
> 	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
> 	goto out_unlock;
>
> out_err:
> 	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
> out_unlock:
> 	mutex_unlock(&hv->hv_lock);
>
>
> If there are no errors, the kvm_write_guest() goes through and the status is
> "reset".  If there are errors, the status is set to BROKEN.
>
> Should I send an RFC to facilitate discussion?
>

Sure, please go ahead. There are some basic selftests for TSC page
since:

commit 2c7f76b4c42bd5d953bc821e151644434865f999
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Thu Mar 18 15:09:49 2021 +0100

    selftests: kvm: Add basic Hyper-V clocksources tests

but I'll have to refresh my memory on the problematic migration scenario
when kvm_hv_invalidate_tsc_page() got introduced.

Thanks!

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-09 17:33             ` Vitaly Kuznetsov
@ 2022-02-10 13:44               ` Vitaly Kuznetsov
  2022-02-14 14:22                 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-02-10 13:44 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Maxim Levitsky, kvm

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Sean Christopherson <seanjc@google.com> writes:
>
>> On Wed, Feb 09, 2022, Vitaly Kuznetsov wrote:
>>> Sean Christopherson <seanjc@google.com> writes:
>>> 
>>> > On Tue, Feb 08, 2022, Vitaly Kuznetsov wrote:
>>> >> Maxim Levitsky <mlevitsk@redhat.com> writes:
>>> >> > and hv-avic only mentions AutoEOI feature.
>>> >> 
>>> >> True, this is hidden in "The enlightenment allows to use Hyper-V SynIC
>>> >> with hardware APICv/AVIC enabled". Any suggestions on how to improve
>>> >> this are more than welcome!.
>>> >
>>> > Specifically for the WARN, does this approach makes sense?
>>> >
>>> > https://lore.kernel.org/all/YcTpJ369cRBN4W93@google.com
>>> 
>>> (Sorry for missing this dicsussion back in December)
>>> 
>>> It probably does but the patch just introduces
>>> HV_TSC_PAGE_UPDATE_REQUIRED flag and drops kvm_write_guest() completely,
>>> the flag is never reset and nothing ever gets written to guest's
>>> memory. I suppose you've forgotten to commit a hunk :-)
>>
>> I don't think so, the idea is that kvm_hv_setup_tsc_page() handles the write.
>>

...

>
> but I'll have to refresh my memory on the problematic migration scenario
> when kvm_hv_invalidate_tsc_page() got introduced.

I've smoke-tested your patch with both selftests and Win10+WSL2
migration and nothing blew up. I, however, don't quite like the idea to
make HV_TSC_PAGE_UPDATE_REQUIRED a bit flag which is orthogonal to all
other HV_TSC_PAGE_ state machine states. E.g. we have the following in
get_time_ref_counter():

  if (hv->hv_tsc_page_status != HV_TSC_PAGE_SET)
          return div_u64(get_kvmclock_ns(kvm), 100);

the following in tsc_page_update_unsafe():

  return (hv->hv_tsc_page_status != HV_TSC_PAGE_GUEST_CHANGED) &&
          hv->hv_tsc_emulation_control;

and the followin in what's now kvm_hv_request_tsc_page_update():

  if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
      hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
      tsc_page_update_unsafe(hv))
          goto out_unlock;

and while I don't see how HV_TSC_PAGE_UPDATE_REQUIRED breaks any of
these, it cetainly takes an extra effort to understand these checks as
we're now comparing something more than just a state machine's state.
Same goes to all assignments to hv->hv_tsc_page_status:
HV_TSC_PAGE_UPDATE_REQUIRED gets implicitly overwritten (i.e. we're not
just switching from state A to state B, we're also clearing the
flag). Again, I don't see how is this incorrect, just unnecessary
complicated (and that's what get me confused when I said you're missing
something in your patch!). 

In case making HV_TSC_PAGE_UPDATE_REQUIRED a real state (or, actually,
several new states) is too cumbersome I'd suggest to explore two
options:
- adding helpers to set/check hv->hv_tsc_page_status and making
HV_TSC_PAGE_UPDATE_REQUIRED clearing/masking explicit.
- adding a boolean (e.g. "tsc_page_update_required") to 'struct kvm_hv'.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context
  2022-02-10 13:44               ` Vitaly Kuznetsov
@ 2022-02-14 14:22                 ` Vitaly Kuznetsov
  0 siblings, 0 replies; 10+ messages in thread
From: Vitaly Kuznetsov @ 2022-02-14 14:22 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Maxim Levitsky, kvm

[-- Attachment #1: Type: text/plain, Size: 341 bytes --]

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> In case making HV_TSC_PAGE_UPDATE_REQUIRED a real state (...) is too
> cumbersome ...

Turns out that it's not and/or I was able to convince myself that the
attached modification of your patch is equally good. Lightly tested.
I can test it a bit more and send it out if needed.

-- 
Vitaly


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-KVM-x86-hyper-v-Avoid-writing-to-TSC-page-without-an.patch --]
[-- Type: text/x-patch, Size: 6231 bytes --]

From 51982d597bcdb497a34ce30b8fb792492d0080dc Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 10 Feb 2022 10:31:36 +0100
Subject: [PATCH RFC] KVM: x86: hyper-v: Avoid writing to TSC page without an
 active vCPU

The following WARN is triggered from kvm_vm_ioctl_set_clock():
 WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
 ...
 CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G        W  O      5.16.0.stable #20
 Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021
 RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm]
 ...
 Call Trace:
  <TASK>
  ? kvm_write_guest+0x114/0x120 [kvm]
  kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
  kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
  ? schedule+0x4e/0xc0
  ? __cond_resched+0x1a/0x50
  ? futex_wait+0x166/0x250
  ? __send_signal+0x1f1/0x3d0
  kvm_vm_ioctl+0x747/0xda0 [kvm]
  ...

The WARN was introduced by 03c0304a86bc ("KVM: Warn if mark_page_dirty()
is called without an active vCPU") but the change seems to be correct
(unlike Hyper-V TSC page update mechanism). In fact, there's no real need
to write to guest memory to invalidate TSC page, this can be done on
the first vCPU which goes through kvm_guest_time_update().

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/hyperv.c           | 40 +++++++--------------------------
 arch/x86/kvm/hyperv.h           |  2 +-
 arch/x86/kvm/x86.c              |  7 +++---
 4 files changed, 14 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e7c545bc7ee..8334148cfc90 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -971,10 +971,10 @@ enum hv_tsc_page_status {
 	HV_TSC_PAGE_GUEST_CHANGED,
 	/* TSC page MSR was written by KVM userspace, update pending */
 	HV_TSC_PAGE_HOST_CHANGED,
+	/* TSC page needs to be updated due to internal KVM changes */
+	HV_TSC_PAGE_KVM_CHANGED,
 	/* TSC page was properly set up and is currently active  */
 	HV_TSC_PAGE_SET,
-	/* TSC page is currently being updated and therefore is inactive */
-	HV_TSC_PAGE_UPDATING,
 	/* TSC page was set up with an inaccessible GPA */
 	HV_TSC_PAGE_BROKEN,
 };
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index c74e7ebb973d..1f6675f3f084 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1123,11 +1123,13 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
 	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);
 
+	mutex_lock(&hv->hv_lock);
+
 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
+	    hv->hv_tsc_page_status == HV_TSC_PAGE_SET ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET)
-		return;
+		goto out_unlock;
 
-	mutex_lock(&hv->hv_lock);
 	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
 		goto out_unlock;
 
@@ -1189,45 +1191,19 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	mutex_unlock(&hv->hv_lock);
 }
 
-void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
+void kvm_hv_request_tsc_page_update(struct kvm *kvm)
 {
 	struct kvm_hv *hv = to_kvm_hv(kvm);
-	u64 gfn;
-	int idx;
-
-	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
-	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
-	    tsc_page_update_unsafe(hv))
-		return;
 
 	mutex_lock(&hv->hv_lock);
 
-	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
-		goto out_unlock;
-
-	/* Preserve HV_TSC_PAGE_GUEST_CHANGED/HV_TSC_PAGE_HOST_CHANGED states */
-	if (hv->hv_tsc_page_status == HV_TSC_PAGE_SET)
-		hv->hv_tsc_page_status = HV_TSC_PAGE_UPDATING;
+	if (hv->hv_tsc_page_status == HV_TSC_PAGE_SET &&
+	    !tsc_page_update_unsafe(hv))
+		hv->hv_tsc_page_status = HV_TSC_PAGE_KVM_CHANGED;
 
-	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
-
-	hv->tsc_ref.tsc_sequence = 0;
-
-	/*
-	 * Take the srcu lock as memslots will be accessed to check the gfn
-	 * cache generation against the memslots generation.
-	 */
-	idx = srcu_read_lock(&kvm->srcu);
-	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
-			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
-		hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
-	srcu_read_unlock(&kvm->srcu, idx);
-
-out_unlock:
 	mutex_unlock(&hv->hv_lock);
 }
 
-
 static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
 {
 	if (!hv_vcpu->enforce_cpuid)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index e19c00ee9ab3..da2737f2a956 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -137,7 +137,7 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu);
 
 void kvm_hv_setup_tsc_page(struct kvm *kvm,
 			   struct pvclock_vcpu_time_info *hv_clock);
-void kvm_hv_invalidate_tsc_page(struct kvm *kvm);
+void kvm_hv_request_tsc_page_update(struct kvm *kvm);
 
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74b53a16f38a..7c7ad7fe611e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2859,7 +2859,7 @@ static void kvm_end_pvclock_update(struct kvm *kvm)
 
 static void kvm_update_masterclock(struct kvm *kvm)
 {
-	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_hv_request_tsc_page_update(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);
 	kvm_end_pvclock_update(kvm);
@@ -3071,8 +3071,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 				       offsetof(struct compat_vcpu_info, time));
 	if (vcpu->xen.vcpu_time_info_set)
 		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
-	if (!v->vcpu_idx)
-		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
 	return 0;
 }
 
@@ -6176,7 +6175,7 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	if (data.flags & ~KVM_CLOCK_VALID_FLAGS)
 		return -EINVAL;
 
-	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_hv_request_tsc_page_update(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-02-14 14:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08 14:59 warning in kvm_hv_invalidate_tsc_page due to writes to guest memory from VM ioctl context Maxim Levitsky
2022-02-08 15:15 ` Vitaly Kuznetsov
2022-02-08 15:34   ` Maxim Levitsky
2022-02-08 16:01     ` Vitaly Kuznetsov
2022-02-08 17:06       ` Sean Christopherson
2022-02-09 12:44         ` Vitaly Kuznetsov
2022-02-09 16:22           ` Sean Christopherson
2022-02-09 17:33             ` Vitaly Kuznetsov
2022-02-10 13:44               ` Vitaly Kuznetsov
2022-02-14 14:22                 ` Vitaly Kuznetsov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.