linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Thoughts of AMX KVM support based on latest kernel
@ 2021-11-10 13:01 Liu, Jing2
  2021-11-16 12:18 ` Thomas Gleixner
  2021-11-16 19:20 ` Thomas Gleixner
  0 siblings, 2 replies; 23+ messages in thread
From: Liu, Jing2 @ 2021-11-10 13:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini
  Cc: LKML, x86, kvm, Nakajima, Jun, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Liu, Jing2, Bae, Chang Seok

Hi Thomas and Paolo, 

Thanks for your thoughts and suggestions. After reading the emails
and looking at the code, we'd like to explain our thoughts of AMX 
KVM support based on latest kernel and the code from git: 
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu-kvm

AMX support based on existing design concepts 

One of our objectives is to have a simple and clean KVM implementation by
utilizing the new dynamic extended-features handling in the FPU core.  

Dynamic reallocation and "lazy passthrough" 

The new code allows us to implement "lazy passthrough" of the XFD MSRs
by coupling a buffer reallocation request, which is indirectly made by vcpus 
(VM exit). With "lazy passthrough" of the XFD MSR, we can avoid unnecessary
save/restore of the MSR and allocation of the extended features until the
guest really requires and is allowed to use. Until that point, the XFD MSR is
virtual, and thus we do not need to save/restore the actual MSR at VM 
entry/exit time. And the vcpu does not have an extended state until that point.  

Once the guest starts using the XFD feature (e.g. AMX) and it is permitted to
use it, we allow the guest to directly modify the MSR (passthrough) to avoid
(potentially frequent) VM exits. 

Triggering of a reallocation request and error handling 

First, we want to avoid weird guest failures at runtime due to (more likely) 
permission failures of a reallocation request, checking the permissions of the
vcpu (for the extend features) at kvm_vcpu_ioctl_set_cpuid2() time, when
QEMU wants to advertise the extended features (e.g. AMX) for the first time.
We have no idea at vcpu_create() time whether QEMU wants to enable AMX
or not at that time. If kvm_vcpu_ioctl_set_cpuid2() succeeds, then there is 
no need to further check permission in reallocation path.

Upon detection (interception) of an attempt by a vcpu to write to XCR0 (XSETBV)
and XFD (WRMSR), we check if the write is valid, and we start passthrough of 
the XFD MSRs if the dynamic feature[i] meets the condition
XCR0[i]=1 && XFD[i]=0. And we make a reallocation request to the FPU core.  

We simplify the KVM implementation by assuming that the reallocation 
request was successful when the vcpu comes back to KVM. For such VM exit
handling that requires a buffer-reallocation request, we don't resume the
guest immediately. Instead, we go back to the userspace, to rely on the 
userspace VMM (e.g. QEMU) for handling error cases. The actual reallocation
happens when control is transferred from KVM to the kernel (FPU core). If 
no error, QEMU will come back to KVM by repeating vcpu_ioctl_run(). 

Potential failures there are due to lack of memory. But this would not be
interesting cases; the host should have more resource problems at that 
time if that is the case.  

Additional KVM-specific or and virtualization requirements 

KVM needs to virtualize the XFD features, and we have additional
requirements. 

XFD reset value 
The XFD reset value needs to be 0.  

KVM-specific XFD handling in XSAVES/XRSTORS 

Once we start passthrough the XFD MSR, we need to save/restore
them at VM exit/entry time. If we immediately resume the guest
without enabling interrupts/preemptions (exit fast-path), we have no
issues. We don't need to save the MSR. The question is how the host
XFD MSR is restored while control is in KVM.  

The XSAVE(S) instruction saves the (guest) state component[x] as 0 or
doesn't save when XFD[x] != 0. Accordingly, XRSTOR(S) cannot restore
that (guest state). And it is possible that XFD != 0 and the guest is using
extended feature at VM exit; we can check the XINUSE state-component
bitmap by XGETBV(1). By adding more meaning to the existing field: 
fpstate->in_use, it can be useful for KVM to set the XINUSE value. 

The usual VM exit handling in KVM, however, is done with 
interrupt/preemption enabled. If a guest has a non-zero XFD and AMX
is in use at VM exit, the host and KVM need to maintain the guest state.
There are two cases where the host and KVM may lose the state:  

a). KVM is scheduled out and kernel context switch does XSAVES, 
b). KVM is interrupted and the softirq path calls
kernel_fpu_begin_mask(), which may execute XSAVES.  

One crude way (Option 1) would be clear XFD temporarily at VM exit
time if the extended feature (AMX) is in use (XINUSE). It also causes 
unnecessary overhead because interrupt/preemption may not always
happen. 

Given the new unified handling of the XFD state management and 
guest awareness in the FPU core, we think it might be better to defer
this to the host (Option 2): 

a). Before the host kernel executes XSAVES, it clears XFD by checking if 
this is a KVM guest fpu and if guest AMX is in use (XINUSE). KVM can
convey the condition by using fpstate->is_guest and fpstate->in_use,
for example. We need to add more meaning (and code changes) to
those fields. 
b). Same for XRSTORS.  

One of potential drawbacks of the Option 2 might be additional 
checks in the host, although we can minimize the impact by having
CONFIG_KVM_TBD. We believe that the case
"XFD != 0 and XINUSE != 0" should be very infrequent. 

Propagation of reallocation errors  

As noted above, a reallocation request can fail, and we need to
propagate the error code to the userspace (e.g. QEMU) so that
it can handle the failure properly. Since we do not want to 
terminate the guest after running due to permission errors 
("weird failure"), we think we should check the permission at
set_cpuid2 time, return failure if no permission.  

Looking forward to your comments.

Thanks,
Jing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-10 13:01 Thoughts of AMX KVM support based on latest kernel Liu, Jing2
@ 2021-11-16 12:18 ` Thomas Gleixner
  2021-11-16 16:05   ` Sean Christopherson
  2021-11-16 19:20 ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-16 12:18 UTC (permalink / raw)
  To: Liu, Jing2, Paolo Bonzini
  Cc: LKML, x86, kvm, Nakajima, Jun, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Liu, Jing2, Bae, Chang Seok

Jing,

On Wed, Nov 10 2021 at 13:01, Jing2 Liu wrote:
> Triggering of a reallocation request and error handling 
>
> First, we want to avoid weird guest failures at runtime due to (more likely) 
> permission failures of a reallocation request, checking the permissions of the
> vcpu (for the extend features) at kvm_vcpu_ioctl_set_cpuid2() time, when
> QEMU wants to advertise the extended features (e.g. AMX) for the first
> time.

That's the right thing to do. If there is no permission for the guest
granted via the prctl() extension I suggested then exposing AMX should
be rejected.

> We have no idea at vcpu_create() time whether QEMU wants to enable AMX
> or not at that time. If kvm_vcpu_ioctl_set_cpuid2() succeeds, then there is 
> no need to further check permission in reallocation path.

That's correct.

> Upon detection (interception) of an attempt by a vcpu to write to XCR0 (XSETBV)
> and XFD (WRMSR), we check if the write is valid, and we start passthrough of 
> the XFD MSRs if the dynamic feature[i] meets the condition
> XCR0[i]=1 && XFD[i]=0. And we make a reallocation request to the FPU core.  
>
> We simplify the KVM implementation by assuming that the reallocation 
> request was successful when the vcpu comes back to KVM. For such VM exit
> handling that requires a buffer-reallocation request, we don't resume the
> guest immediately. Instead, we go back to the userspace, to rely on the 
> userspace VMM (e.g. QEMU) for handling error cases. The actual reallocation
> happens when control is transferred from KVM to the kernel (FPU core). If 
> no error, QEMU will come back to KVM by repeating vcpu_ioctl_run(). 
>
> Potential failures there are due to lack of memory. But this would not be
> interesting cases; the host should have more resource problems at that 
> time if that is the case.

Indeed.

> One of potential drawbacks of the Option 2 might be additional 
> checks in the host, although we can minimize the impact by having
> CONFIG_KVM_TBD. We believe that the case
> "XFD != 0 and XINUSE != 0" should be very infrequent.

I really don't like the idea of having an extra check in switch_to().

Can we start simple and do something like the uncompiled below and see
how much overhead it creates?

Thanks,

        tglx
---
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 0f8b90ab18c9..6175a78e0be8 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -122,4 +122,12 @@ static __always_inline __pure bool fpu_state_size_dynamic(void)
 }
 #endif
 
+void fpu_update_guest_xfd_state(void);
+
+static inline void kvm_update_guest_xfd_state(void)
+{
+	if (fpu_state_size_dynamic())
+		fpu_update_guest_xfd_state();
+}
+
 #endif
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8ea306b1bf8e..161db48c9052 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -199,6 +199,17 @@ void fpu_reset_from_exception_fixup(void)
 }
 
 #if IS_ENABLED(CONFIG_KVM)
+void fpu_update_guest_xfd_state(void)
+{
+	u64 xfd;
+
+	/* FIXME: Add debug */
+	rdmsrl(MSR_IA32_XFD, xfd);
+	current->thread.fpu.fpstate->xfd = xfd;
+	__this_cpu_write(xfd_state, xfd);
+}
+EXPORT_SYMBOL_GPL(fpu_update_guest_xfd_state);
+
 static void __fpstate_reset(struct fpstate *fpstate);
 
 bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2686f2edb47c..9425fdbb4806 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9576,6 +9576,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
 	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 
+	kvm_update_guest_xfd_state();
+
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();
 



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 12:18 ` Thomas Gleixner
@ 2021-11-16 16:05   ` Sean Christopherson
  2021-11-16 18:55     ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2021-11-16 16:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Liu, Jing2, Paolo Bonzini, LKML, x86, kvm, Nakajima, Jun,
	Dave Hansen, Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae,
	Chang Seok

On Tue, Nov 16, 2021, Thomas Gleixner wrote:
> > One of potential drawbacks of the Option 2 might be additional 
> > checks in the host, although we can minimize the impact by having
> > CONFIG_KVM_TBD. We believe that the case
> > "XFD != 0 and XINUSE != 0" should be very infrequent.
> 
> I really don't like the idea of having an extra check in switch_to().
> 
> Can we start simple and do something like the uncompiled below and see
> how much overhead it creates?
> 
> Thanks,
> 
>         tglx
> ---

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2686f2edb47c..9425fdbb4806 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9576,6 +9576,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
>  	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>  
> +	kvm_update_guest_xfd_state();

Is there a reason the XFD switch can't key off TIF_NEED_FPU_LOAD a la the other
FPU stuff?  I.e. piggyback this snippet in vcpu_enter_guest():

	if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return();
> +
>  	vcpu->mode = OUTSIDE_GUEST_MODE;
>  	smp_wmb();
>  
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 16:05   ` Sean Christopherson
@ 2021-11-16 18:55     ` Thomas Gleixner
  2021-11-16 19:49       ` Paolo Bonzini
  2021-11-16 20:12       ` Sean Christopherson
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-16 18:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Liu, Jing2, Paolo Bonzini, LKML, x86, kvm, Nakajima, Jun,
	Dave Hansen, Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae,
	Chang Seok

Sean,

On Tue, Nov 16 2021 at 16:05, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Thomas Gleixner wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2686f2edb47c..9425fdbb4806 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9576,6 +9576,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
>>  	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>  
>> +	kvm_update_guest_xfd_state();
>
> Is there a reason the XFD switch can't key off TIF_NEED_FPU_LOAD a la the other
> FPU stuff?  I.e. piggyback this snippet in vcpu_enter_guest():

TIF_NEED_FPU_LOAD is not set here.

> 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> 		switch_fpu_return();

Assume guest has control of XFD and XFD writes are not trapped. That
means on vmexit the XFD state of the guest is unknown.

vcpu_run()
        kvm_load_guest_fpu()
            wrmsrl(XFD, guest_fpstate->xfd);
            XRSTORS
          
        do {

           local_irq_disable();

           // Covers the case of softirq usage and preemption
           if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return()
                  wrmsrl(XFD, guest_fpstate->xfd);

           do {
                vmenter();              // Guest modifies XFD
           } while (reenter);

           local_irq_enable();     <- Problem starts here

           preempt_enable();	   <- Becomes wider here

        } while (!breakout);

        kvm_put_guest_fpu();            // Switch back to user FPU state

So we have the following cases:

guest_fpstate.xfd        XFD at vmexit

       0                      0         // consistent state
       1                      0         // inconsistent state
       0                      1         // inconsistent state
       1                      1         // consistent state

Now assume that after reenabling interrupts a interrupt/softirq happens
which uses FPU. It will save the correct state because XFD is still
guest state, but the subsequent restore will operate on the stale
guest_fpstate.xfd value.

Same problem vs schedule after reenabling preemption or if not preempted
in kvm_put_guest_fpu()

Now you could argue that the interrupt/softirq XSAVES should also read
the XFD MSR and save it in guest_fpstate.xfd. Same in schedule()
and kvm_put_guest_fpu(), i.e:

      XSAVES
      if (fpstate->is_guest) {
            rdmsrl(XFD, xfd);
            fpstate->xfd = xfd;
            __this_cpu_write(..., xfd);
      }

We can do that, but I'm unhappy about this conditional in schedule(). So
I was asking for doing a simple KVM only solution first:

vcpu_run()
        kvm_load_guest_fpu()
            wrmsrl(XFD, guest_fpstate->xfd);
            XRSTORS
          
        do {

           local_irq_disable();

           if (test_thread_flag(TIF_NEED_FPU_LOAD))
		switch_fpu_return()
                  wrmsrl(XFD, guest_fpstate->xfd);

           do {
                vmenter();              // Guest modifies XFD
           } while (reenter);

           update_xfd_state();          // Restore consistency

           local_irq_enable();

and check how bad that is for KVM in terms of overhead on AMX systems.

If it really matters we can look at the conditional in XSAVES path.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-10 13:01 Thoughts of AMX KVM support based on latest kernel Liu, Jing2
  2021-11-16 12:18 ` Thomas Gleixner
@ 2021-11-16 19:20 ` Thomas Gleixner
  2021-11-17  4:52   ` Nakajima, Jun
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-16 19:20 UTC (permalink / raw)
  To: Liu, Jing2, Paolo Bonzini
  Cc: LKML, x86, kvm, Nakajima, Jun, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Liu, Jing2, Bae, Chang Seok

Jing,

On Wed, Nov 10 2021 at 13:01, Liu, Jing2 wrote:

more thoughts.

> Once we start passthrough the XFD MSR, we need to save/restore
> them at VM exit/entry time. If we immediately resume the guest
> without enabling interrupts/preemptions (exit fast-path), we have no
> issues. We don't need to save the MSR.

Correct.

> The question is how the host XFD MSR is restored while control is in
> KVM.
>
> The XSAVE(S) instruction saves the (guest) state component[x] as 0 or
> doesn't save when XFD[x] != 0. Accordingly, XRSTOR(S) cannot restore
> that (guest state). And it is possible that XFD != 0 and the guest is using
> extended feature at VM exit;

You mean on creative guests which just keep AMX state alive and set
XFD[AMX] = 1 to later restore it to XFD[AMX] = 0?

> we can check the XINUSE state-component bitmap by XGETBV(1). By adding
> more meaning to the existing field: fpstate->in_use, it can be useful
> for KVM to set the XINUSE value.

As I pointed out to Sean, the problem is inconsistent state. Checking
XGETBV(1) cannot make that go away. And I have no idea how you want to
abuse fpstate->in_use for anything related to the XINUSE bitmap. It's a
single bit for a particular purpose and has absolutely nothing to do
with XINUSE. Trying to overload that is just wrong.

If XFD is not trapped then you have exactly three options:

   1) Make it an autosave MSR and grab the guest XFD value from that
      memory to update fpstate and the shadow memory before reenabling
      interrupts

   2) Do the MSR read how I suggested before reenabling interrupts

   3) Conditionally post XSAVES when fpstate->is_guest == true

Anything else wont work.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 18:55     ` Thomas Gleixner
@ 2021-11-16 19:49       ` Paolo Bonzini
  2021-11-16 20:14         ` Sean Christopherson
  2021-11-16 20:36         ` Thomas Gleixner
  2021-11-16 20:12       ` Sean Christopherson
  1 sibling, 2 replies; 23+ messages in thread
From: Paolo Bonzini @ 2021-11-16 19:49 UTC (permalink / raw)
  To: Thomas Gleixner, Sean Christopherson
  Cc: Liu, Jing2, LKML, x86, kvm, Nakajima, Jun, Dave Hansen,
	Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae, Chang Seok

On 11/16/21 19:55, Thomas Gleixner wrote:
> We can do that, but I'm unhappy about this conditional in schedule(). So
> I was asking for doing a simple KVM only solution first:
> 
> vcpu_run()
>          kvm_load_guest_fpu()
>              wrmsrl(XFD, guest_fpstate->xfd);
>              XRSTORS
>            
>          do {
> 
>             local_irq_disable();
> 
>             if (test_thread_flag(TIF_NEED_FPU_LOAD))
> 		switch_fpu_return()
>                    wrmsrl(XFD, guest_fpstate->xfd);
> 
>             do {
>                  vmenter();              // Guest modifies XFD
>             } while (reenter);
> 
>             update_xfd_state();          // Restore consistency
> 
>             local_irq_enable();
> 
> and check how bad that is for KVM in terms of overhead on AMX systems.

I agree, this is how we handle SPEC_CTRL for example and it can be 
extended to XFD.  We should first do that, then switch to the MSR lists. 
  Hacking into schedule() should really be the last resort.

>            local_irq_enable();     <- Problem starts here
> 
>            preempt_enable();	   <- Becomes wider here

It doesn't become that much wider because there's always preempt 
notifiers.  So if it's okay to save XFD in the XSAVES wrapper and in 
kvm_arch_vcpu_put(), that might be already remove the need to do it 
schedule().

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 18:55     ` Thomas Gleixner
  2021-11-16 19:49       ` Paolo Bonzini
@ 2021-11-16 20:12       ` Sean Christopherson
  2021-11-16 23:35         ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Sean Christopherson @ 2021-11-16 20:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Liu, Jing2, Paolo Bonzini, LKML, x86, kvm, Nakajima, Jun,
	Dave Hansen, Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae,
	Chang Seok

On Tue, Nov 16, 2021, Thomas Gleixner wrote:
> Now you could argue that the interrupt/softirq XSAVES should also read
> the XFD MSR and save it in guest_fpstate.xfd. Same in schedule()
> and kvm_put_guest_fpu(), i.e:
> 
>       XSAVES
>       if (fpstate->is_guest) {
>             rdmsrl(XFD, xfd);
>             fpstate->xfd = xfd;
>             __this_cpu_write(..., xfd);
>       }
> 
> We can do that, but I'm unhappy about this conditional in schedule(). So
> I was asking for doing a simple KVM only solution first:

Ah, the schedule() conditional is the part I was missing.  Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 19:49       ` Paolo Bonzini
@ 2021-11-16 20:14         ` Sean Christopherson
  2021-11-16 20:36         ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2021-11-16 20:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Thomas Gleixner, Liu, Jing2, LKML, x86, kvm, Nakajima, Jun,
	Dave Hansen, Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae,
	Chang Seok

On Tue, Nov 16, 2021, Paolo Bonzini wrote:
> On 11/16/21 19:55, Thomas Gleixner wrote:
> > We can do that, but I'm unhappy about this conditional in schedule(). So
> > I was asking for doing a simple KVM only solution first:
> > 
> > vcpu_run()
> >          kvm_load_guest_fpu()
> >              wrmsrl(XFD, guest_fpstate->xfd);
> >              XRSTORS
> >          do {
> > 
> >             local_irq_disable();
> > 
> >             if (test_thread_flag(TIF_NEED_FPU_LOAD))
> > 		switch_fpu_return()
> >                    wrmsrl(XFD, guest_fpstate->xfd);
> > 
> >             do {
> >                  vmenter();              // Guest modifies XFD
> >             } while (reenter);
> > 
> >             update_xfd_state();          // Restore consistency
> > 
> >             local_irq_enable();
> > 
> > and check how bad that is for KVM in terms of overhead on AMX systems.
> 
> I agree, this is how we handle SPEC_CTRL for example and it can be extended
> to XFD.  We should first do that, then switch to the MSR lists.  Hacking
> into schedule() should really be the last resort.

Agreed as well.

> >            local_irq_enable();     <- Problem starts here
> > 
> >            preempt_enable();	   <- Becomes wider here
> 
> It doesn't become that much wider because there's always preempt notifiers.
> So if it's okay to save XFD in the XSAVES wrapper and in
> kvm_arch_vcpu_put(), that might be already remove the need to do it
> schedule().

Assuming AMX can be accessed from (soft) IRQ context, hooking the preempt notifiers
isn't sufficient.  That's also why KVM waits until IRQs are disabled before
handling TIF_NEED_FPU_LOAD.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 19:49       ` Paolo Bonzini
  2021-11-16 20:14         ` Sean Christopherson
@ 2021-11-16 20:36         ` Thomas Gleixner
  2021-11-16 22:11           ` Nakajima, Jun
  2021-11-17  7:39           ` Paolo Bonzini
  1 sibling, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-16 20:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Liu, Jing2, LKML, x86, kvm, Nakajima, Jun, Dave Hansen,
	Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae, Chang Seok

Paolo,

On Tue, Nov 16 2021 at 20:49, Paolo Bonzini wrote:
> On 11/16/21 19:55, Thomas Gleixner wrote:
>> We can do that, but I'm unhappy about this conditional in schedule(). So
>> I was asking for doing a simple KVM only solution first:
>> 
>> vcpu_run()
>>          kvm_load_guest_fpu()
>>              wrmsrl(XFD, guest_fpstate->xfd);
>>              XRSTORS
>>            
>>          do {
>> 
>>             local_irq_disable();
>> 
>>             if (test_thread_flag(TIF_NEED_FPU_LOAD))
>> 		switch_fpu_return()
>>                    wrmsrl(XFD, guest_fpstate->xfd);
>> 
>>             do {
>>                  vmenter();              // Guest modifies XFD
>>             } while (reenter);
>> 
>>             update_xfd_state();          // Restore consistency
>> 
>>             local_irq_enable();
>> 
>> and check how bad that is for KVM in terms of overhead on AMX systems.
>
> I agree, this is how we handle SPEC_CTRL for example and it can be 
> extended to XFD.

SPEC_CTRL is different because it's done right after each VMEXIT.

XFD can be done lazy when breaking out of the exit fastpath loop before
enabling interrupts.

> We should first do that, then switch to the MSR lists. 
>   Hacking into schedule() should really be the last resort.
>
>>            local_irq_enable();     <- Problem starts here
>> 
>>            preempt_enable();	   <- Becomes wider here
>
> It doesn't become that much wider because there's always preempt 
> notifiers.  So if it's okay to save XFD in the XSAVES wrapper and in 
> kvm_arch_vcpu_put(), that might be already remove the need to do it 
> schedule().

Did not think about preemption notifiers. Probably because I hate
notifiers with a passion since I had to deal with the CPU hotplug
notifier trainwreck.

But yes that would work. So the places to do that would be:

1) kvm_sched_out() -> kvm_arch_vcpu_put()
2) kernel_fpu_begin_mask()
3) kvm_put_guest_fpu()

But I really would start with the trivial version I suggested because
that's already in the slow path and not at every VMEXIT.

I'd be really surprised if that RDMSR is truly noticeable within all the
other crud this path is doing.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 20:36         ` Thomas Gleixner
@ 2021-11-16 22:11           ` Nakajima, Jun
  2021-11-17  7:39           ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Nakajima, Jun @ 2021-11-16 22:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Sean Christopherson, Liu, Jing2, LKML, x86, kvm,
	Dave Hansen, Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae,
	Chang Seok

> 
> On Nov 16, 2021, at 12:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Paolo,
> 
> On Tue, Nov 16 2021 at 20:49, Paolo Bonzini wrote:
>> On 11/16/21 19:55, Thomas Gleixner wrote:
>>> We can do that, but I'm unhappy about this conditional in schedule(). So
>>> I was asking for doing a simple KVM only solution first:
>>> 
>>> vcpu_run()
>>>         kvm_load_guest_fpu()
>>>             wrmsrl(XFD, guest_fpstate->xfd);
>>>             XRSTORS
>>> 
>>>         do {
>>> 
>>>            local_irq_disable();
>>> 
>>>            if (test_thread_flag(TIF_NEED_FPU_LOAD))
>>> 		switch_fpu_return()
>>>                   wrmsrl(XFD, guest_fpstate->xfd);
>>> 
>>>            do {
>>>                 vmenter();              // Guest modifies XFD
>>>            } while (reenter);
>>> 
>>>            update_xfd_state();          // Restore consistency
>>> 
>>>            local_irq_enable();
>>> 
>>> and check how bad that is for KVM in terms of overhead on AMX systems.
>> 
>> I agree, this is how we handle SPEC_CTRL for example and it can be 
>> extended to XFD.
> 
> SPEC_CTRL is different because it's done right after each VMEXIT.
> 
> XFD can be done lazy when breaking out of the exit fastpath loop before
> enabling interrupts.

I agree. The XFD features are for user-space.


> 
>> We should first do that, then switch to the MSR lists. 
>>  Hacking into schedule() should really be the last resort.
>> 
>>>           local_irq_enable();     <- Problem starts here
>>> 
>>>           preempt_enable();	   <- Becomes wider here
>> 
>> It doesn't become that much wider because there's always preempt 
>> notifiers.  So if it's okay to save XFD in the XSAVES wrapper and in 
>> kvm_arch_vcpu_put(), that might be already remove the need to do it 
>> schedule().
> 
> Did not think about preemption notifiers. Probably because I hate
> notifiers with a passion since I had to deal with the CPU hotplug
> notifier trainwreck.
> 
> But yes that would work. So the places to do that would be:
> 
> 1) kvm_sched_out() -> kvm_arch_vcpu_put()
> 2) kernel_fpu_begin_mask()
> 3) kvm_put_guest_fpu()
> 
> But I really would start with the trivial version I suggested because
> that's already in the slow path and not at every VMEXIT.
> 
> I'd be really surprised if that RDMSR is truly noticeable within all the
> other crud this path is doing.
> 

I also agree here, and we’ll measure the effect to double-check.

We don’t want to complicate or optimize the system for very rare cases.
I like your "trivial version" because all the things KVM needs to do is just restore the consistent state.


--- 
Jun



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 20:12       ` Sean Christopherson
@ 2021-11-16 23:35         ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-16 23:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Liu, Jing2, Paolo Bonzini, LKML, x86, kvm, Nakajima, Jun,
	Dave Hansen, Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae,
	Chang Seok

On Tue, Nov 16 2021 at 20:12, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Thomas Gleixner wrote:
>> Now you could argue that the interrupt/softirq XSAVES should also read
>> the XFD MSR and save it in guest_fpstate.xfd. Same in schedule()
>> and kvm_put_guest_fpu(), i.e:
>> 
>>       XSAVES
>>       if (fpstate->is_guest) {
>>             rdmsrl(XFD, xfd);
>>             fpstate->xfd = xfd;
>>             __this_cpu_write(..., xfd);
>>       }
>> 
>> We can do that, but I'm unhappy about this conditional in schedule(). So
>> I was asking for doing a simple KVM only solution first:
>
> Ah, the schedule() conditional is the part I was missing.  Thanks!

As I was missing the preempt notifier... which makes it a different
story, but I still think that the simple version is good enough at least
for a start.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 19:20 ` Thomas Gleixner
@ 2021-11-17  4:52   ` Nakajima, Jun
  2021-11-17  7:31     ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Nakajima, Jun @ 2021-11-17  4:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Liu, Jing2, Paolo Bonzini, LKML, x86, kvm, Dave Hansen,
	Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew, Bae,
	Chang Seok

> 
> On Nov 16, 2021, at 11:20 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Jing,
> 
> On Wed, Nov 10 2021 at 13:01, Liu, Jing2 wrote:
> 
> more thoughts.
> 
>> Once we start passthrough the XFD MSR, we need to save/restore
>> them at VM exit/entry time. If we immediately resume the guest
>> without enabling interrupts/preemptions (exit fast-path), we have no
>> issues. We don't need to save the MSR.
> 
> Correct.
> 
>> The question is how the host XFD MSR is restored while control is in
>> KVM.
>> 
>> The XSAVE(S) instruction saves the (guest) state component[x] as 0 or
>> doesn't save when XFD[x] != 0. Accordingly, XRSTOR(S) cannot restore
>> that (guest state). And it is possible that XFD != 0 and the guest is using
>> extended feature at VM exit;
> 
> You mean on creative guests which just keep AMX state alive and set
> XFD[AMX] = 1 to later restore it to XFD[AMX] = 0?


Typically a (usual) guest saves the AMX state for the previous process and sets XFD[AMX] = 1 for the next at context switch time, and a VM exit can happen anytime, e.g. right after XFD[AMX] = 1. 
But this case is okay because the state is already saved by the guest.

If a (creative) guest wants to set XFD[AMX] = 1 for fun while keeping AMX state alive without saving the AXM state, it may lose the state after VM exit/entry. I think the right thing to do is to avoid such programming in the first place. Let me find out if we can add such notes in the programming references.


--- 
Jun



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-17  4:52   ` Nakajima, Jun
@ 2021-11-17  7:31     ` Paolo Bonzini
  2021-11-17 10:15       ` Tian, Kevin
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2021-11-17  7:31 UTC (permalink / raw)
  To: Nakajima, Jun, Thomas Gleixner
  Cc: Liu, Jing2, LKML, x86, kvm, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Bae, Chang Seok

On 11/17/21 05:52, Nakajima, Jun wrote:
> If a (creative) guest wants to set XFD[AMX] = 1 for fun while keeping
> AMX state alive without saving the AXM state, it may lose the state
> after VM exit/entry.

I think this should not happen, unless you also document that other 
random events (hypothetically, it could be some other core using AMX?) 
can cause the loss of XTILEDATA if XFD[AMX]=1.  Virtualization should 
not be special, I'd prefer that the guest has the same behavior as bare 
metal in this respect.

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-16 20:36         ` Thomas Gleixner
  2021-11-16 22:11           ` Nakajima, Jun
@ 2021-11-17  7:39           ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2021-11-17  7:39 UTC (permalink / raw)
  To: Thomas Gleixner, Sean Christopherson
  Cc: Liu, Jing2, LKML, x86, kvm, Nakajima, Jun, Dave Hansen,
	Arjan van de Ven, Jing Liu, Cooper, Andrew, Bae, Chang Seok

On 11/16/21 21:36, Thomas Gleixner wrote:
>            local_irq_enable();     <- Problem starts here
> 
>            preempt_enable();	   <- Becomes wider here
>
>> It doesn't become that much wider because there's always preempt
>> notifiers.  So if it's okay to save XFD in the XSAVES wrapper and in
>> kvm_arch_vcpu_put(), that might be already remove the need to do it
>> schedule().
>
> Did not think about preemption notifiers. Probably because I hate
> notifiers with a passion since I had to deal with the CPU hotplug
> notifier trainwreck.
> 
> But yes that would work. So the places to do that would be:
> 
> 1) kvm_sched_out() -> kvm_arch_vcpu_put() > 2) kernel_fpu_begin_mask()

... which calls save_fpregs_to_fpstate

> 3) kvm_put_guest_fpu()

... which calls save_fpregs_to_fpstate via fpu_swap_kvm_fpstate

So perhaps it could be done in save_fpregs_to_fpstate (for the sched out 
path, it would be called by switch_fpu_prepare()).  But for now I would 
also start with the trivial version.

> I'd be really surprised if that RDMSR is truly noticeable within all the
> other crud this path is doing.

I agree.

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Thoughts of AMX KVM support based on latest kernel
  2021-11-17  7:31     ` Paolo Bonzini
@ 2021-11-17 10:15       ` Tian, Kevin
  2021-11-17 12:24         ` Thomas Gleixner
  2021-11-17 12:53         ` Paolo Bonzini
  0 siblings, 2 replies; 23+ messages in thread
From: Tian, Kevin @ 2021-11-17 10:15 UTC (permalink / raw)
  To: Paolo Bonzini, Nakajima, Jun, Thomas Gleixner
  Cc: Liu, Jing2, LKML, x86, kvm, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Bae, Chang Seok

> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Wednesday, November 17, 2021 3:31 PM
> 
> On 11/17/21 05:52, Nakajima, Jun wrote:
> > If a (creative) guest wants to set XFD[AMX] = 1 for fun while keeping
> > AMX state alive without saving the AXM state, it may lose the state
> > after VM exit/entry.
> 
> I think this should not happen, unless you also document that other
> random events (hypothetically, it could be some other core using AMX?)
> can cause the loss of XTILEDATA if XFD[AMX]=1.  Virtualization should
> not be special, I'd prefer that the guest has the same behavior as bare
> metal in this respect.
> 

The state may be lost with the simple version suggested by Thomas,
because with XFD[AMX]=1 the AMX state won't be stored to 
guest_fpstate when the vcpu thread is preempted and then get lost
when restored. To emulate the hardware behavior this will need
additional trick to force XFD[AMX]=0 if XGETBV(1) is true (ignoring 
guest XFD) in fpu_update_guest_xfd_state(void). It also implies 
that the actual guest XFD needs to be saved in a place before forcing 
XFD[AMX] to 0 and recovered (after interrupt is disabled) before 
entering the guest.

We are not sure whether such trick is worthwhile, since a sane
guest shouldn't set XFD[AMX]=1 before storing the AMX state. This
is why we want to seek SDM change to mark out that the software
should not assume XTILEDATA is still valid when XFD[AMX]=1. 

Then KVM can make a simple assumption that once XFD[AMX]=1
it implies the guest doesn't care about the AMX state. Then the
simple version from Thomas can work perfectly.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Thoughts of AMX KVM support based on latest kernel
  2021-11-17 10:15       ` Tian, Kevin
@ 2021-11-17 12:24         ` Thomas Gleixner
  2021-11-17 12:53         ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-17 12:24 UTC (permalink / raw)
  To: Tian, Kevin, Paolo Bonzini, Nakajima, Jun
  Cc: Liu, Jing2, LKML, x86, kvm, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Bae, Chang Seok

Tian,

On Wed, Nov 17 2021 at 10:15, Tian, Kevin wrote:
> We are not sure whether such trick is worthwhile, since a sane
> guest shouldn't set XFD[AMX]=1 before storing the AMX state. This
> is why we want to seek SDM change to mark out that the software
> should not assume XTILEDATA is still valid when XFD[AMX]=1.

Yes, please. Anything else is just causing too much hackery for no
value.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-17 10:15       ` Tian, Kevin
  2021-11-17 12:24         ` Thomas Gleixner
@ 2021-11-17 12:53         ` Paolo Bonzini
  2021-11-18 23:17           ` Nakajima, Jun
  1 sibling, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2021-11-17 12:53 UTC (permalink / raw)
  To: Tian, Kevin, Nakajima, Jun, Thomas Gleixner
  Cc: Liu, Jing2, LKML, x86, kvm, Dave Hansen, Arjan van de Ven,
	Jing Liu, seanjc, Cooper, Andrew, Bae, Chang Seok

On 11/17/21 11:15, Tian, Kevin wrote:
> We are not sure whether such trick is worthwhile, since a sane
> guest shouldn't set XFD[AMX]=1 before storing the AMX state. This
> is why we want to seek SDM change to mark out that the software
> should not assume XTILEDATA is still valid when XFD[AMX]=1.

Okay, I just don't want it to be called out as virtualization specific.

It doesn't have to happen in current processors, but it should be 
architecturally valid behavior to clear the processor's state as soon as 
a bit in XFD is set to 1.

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-17 12:53         ` Paolo Bonzini
@ 2021-11-18 23:17           ` Nakajima, Jun
  2021-11-19 10:13             ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Nakajima, Jun @ 2021-11-18 23:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tian, Kevin, Thomas Gleixner, Liu, Jing2, LKML, x86, kvm,
	Dave Hansen, Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew,
	Bae, Chang Seok

On Nov 17, 2021, at 4:53 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 11/17/21 11:15, Tian, Kevin wrote:
>> We are not sure whether such trick is worthwhile, since a sane
>> guest shouldn't set XFD[AMX]=1 before storing the AMX state. This
>> is why we want to seek SDM change to mark out that the software
>> should not assume XTILEDATA is still valid when XFD[AMX]=1.
> 
> Okay, I just don't want it to be called out as virtualization specific.
> 
> It doesn't have to happen in current processors, but it should be architecturally valid behavior to clear the processor's state as soon as a bit in XFD is set to 1.
> 
> Paolo
> 

We recommend that "system software initialize AMX state _before_ doing so" (below). Also, I think what the “creative” guest is doing is "lazy restore”, and "This approach will not operate correctly for a variety of reasons."

https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf


3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE

System software may disable use of Intel AMX by clearing XCR0[18:17], by clearing CR4.OSXSAVE, or by setting
IA32_XFD[18]. It is recommended that system software initialize AMX state (e.g., by executing TILERELEASE)
before doing so. This is because maintaining AMX state in a non-initialized state may have negative power and performance implications.

System software should not use XFD to implement a “lazy restore” approach to management of the XTILEDATA
state component. This approach will not operate correctly for a variety of reasons. One is that the LDTILECFG and TILERELEASE instructions initialize XTILEDATA and do not cause an #NM exception. Another is that an execution of XSAVE by a user thread will save XTILEDATA as initialized instead of the data expected by the user thread.

--- 
Jun






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-18 23:17           ` Nakajima, Jun
@ 2021-11-19 10:13             ` Thomas Gleixner
  2021-11-19 15:41               ` Nakajima, Jun
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-19 10:13 UTC (permalink / raw)
  To: Nakajima, Jun, Paolo Bonzini
  Cc: Tian, Kevin, Liu, Jing2, LKML, x86, kvm, Dave Hansen,
	Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew, Bae,
	Chang Seok

Jun,

On Thu, Nov 18 2021 at 23:17, Jun Nakajima wrote:
> On Nov 17, 2021, at 4:53 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> It doesn't have to happen in current processors, but it should be
>> architecturally valid behavior to clear the processor's state as soon
>> as a bit in XFD is set to 1.
>
> 3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE
>
> System software may disable use of Intel AMX by clearing XCR0[18:17],
> by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. It is recommended
> that system software initialize AMX state (e.g., by executing
> TILERELEASE) before doing so. This is because maintaining AMX state in
> a non-initialized state may have negative power and performance
> implications.
>
> System software should not use XFD to implement a “lazy restore”
> approach to management of the XTILEDATA state component. This approach
> will not operate correctly for a variety of reasons. One is that the
> LDTILECFG and TILERELEASE instructions initialize XTILEDATA and do not
> cause an #NM exception. Another is that an execution of XSAVE by a
> user thread will save XTILEDATA as initialized instead of the data
> expected by the user thread.

Can this pretty please be reworded so that it says:

  When setting IA32_XFD[18] the AMX register state is not guaranteed to
  be preserved. The resulting register state depends on the
  implementation.

Also it's a real design disaster that component 17 cannot be fenced off
via XFD. That's really inconsistent and leads exactly to this half
defined state.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Thoughts of AMX KVM support based on latest kernel
  2021-11-19 10:13             ` Thomas Gleixner
@ 2021-11-19 15:41               ` Nakajima, Jun
  2021-12-08  0:50                 ` Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel) Nakajima, Jun
  0 siblings, 1 reply; 23+ messages in thread
From: Nakajima, Jun @ 2021-11-19 15:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Tian, Kevin, Liu, Jing2, LKML, x86, kvm,
	Dave Hansen, Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew,
	Bae, Chang Seok

Hi Thomas,

> On Nov 19, 2021, at 2:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Jun,
> 
> On Thu, Nov 18 2021 at 23:17, Jun Nakajima wrote:
>> On Nov 17, 2021, at 4:53 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> It doesn't have to happen in current processors, but it should be
>>> architecturally valid behavior to clear the processor's state as soon
>>> as a bit in XFD is set to 1.
>> 
>> 3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE
>> 
>> System software may disable use of Intel AMX by clearing XCR0[18:17],
>> by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. It is recommended
>> that system software initialize AMX state (e.g., by executing
>> TILERELEASE) before doing so. This is because maintaining AMX state in
>> a non-initialized state may have negative power and performance
>> implications.
>> 
>> System software should not use XFD to implement a “lazy restore”
>> approach to management of the XTILEDATA state component. This approach
>> will not operate correctly for a variety of reasons. One is that the
>> LDTILECFG and TILERELEASE instructions initialize XTILEDATA and do not
>> cause an #NM exception. Another is that an execution of XSAVE by a
>> user thread will save XTILEDATA as initialized instead of the data
>> expected by the user thread.
> 
> Can this pretty please be reworded so that it says:
> 
>  When setting IA32_XFD[18] the AMX register state is not guaranteed to
>  be preserved. The resulting register state depends on the
>  implementation.
> 
> Also it's a real design disaster that component 17 cannot be fenced off
> via XFD. That's really inconsistent and leads exactly to this half
> defined state.
> 

I’ll work with the H/W team on those (i.e. rewording and the component 17 issue).

Thanks,
---
Jun







^ permalink raw reply	[flat|nested] 23+ messages in thread

* Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel)
  2021-11-19 15:41               ` Nakajima, Jun
@ 2021-12-08  0:50                 ` Nakajima, Jun
  2021-12-08 13:57                   ` Paolo Bonzini
  2021-12-10 21:53                   ` Thomas Gleixner
  0 siblings, 2 replies; 23+ messages in thread
From: Nakajima, Jun @ 2021-12-08  0:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Tian, Kevin, Liu, Jing2, LKML, x86, kvm,
	Dave Hansen, Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew,
	Bae, Chang Seok



> On Nov 19, 2021, at 7:41 AM, Nakajima, Jun <jun.nakajima@intel.com> wrote:
> 
> Hi Thomas,
> 
>> On Nov 19, 2021, at 2:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> Jun,
>> 
>> On Thu, Nov 18 2021 at 23:17, Jun Nakajima wrote:
>>> On Nov 17, 2021, at 4:53 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> 
>>>> It doesn't have to happen in current processors, but it should be
>>>> architecturally valid behavior to clear the processor's state as soon
>>>> as a bit in XFD is set to 1.
>>> 
>>> 3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE
>>> 
>>> System software may disable use of Intel AMX by clearing XCR0[18:17],
>>> by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. It is recommended
>>> that system software initialize AMX state (e.g., by executing
>>> TILERELEASE) before doing so. This is because maintaining AMX state in
>>> a non-initialized state may have negative power and performance
>>> implications.
>>> 
>>> System software should not use XFD to implement a “lazy restore”
>>> approach to management of the XTILEDATA state component. This approach
>>> will not operate correctly for a variety of reasons. One is that the
>>> LDTILECFG and TILERELEASE instructions initialize XTILEDATA and do not
>>> cause an #NM exception. Another is that an execution of XSAVE by a
>>> user thread will save XTILEDATA as initialized instead of the data
>>> expected by the user thread.
>> 
>> Can this pretty please be reworded so that it says:
>> 
>> When setting IA32_XFD[18] the AMX register state is not guaranteed to
>> be preserved. The resulting register state depends on the
>> implementation.
>> 
>> Also it's a real design disaster that component 17 cannot be fenced off
>> via XFD. That's really inconsistent and leads exactly to this half
>> defined state.
>> 
> 
> I’ll work with the H/W team on those (i.e. rewording and the component 17 issue).
> 

The following is a possible documentation update that may convey the rewording you requested.
Does this (the last sentence, “In addition, “) work for you?


3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE

System software may disable use of Intel AMX by clearing XCR0[18:17], by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. System software should initialize AMX state (e.g., by executing TILERELEASE) when doing so because maintaining AMX state in a non-initialized state may have negative power and performance implications. In addition, software should not rely on the state of the tile data after setting IA32_XFD[18]; software should always reload or reinitialize the tile data after clearing IA32_XFD[18].

Thanks,
--- 
Jun




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel)
  2021-12-08  0:50                 ` Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel) Nakajima, Jun
@ 2021-12-08 13:57                   ` Paolo Bonzini
  2021-12-10 21:53                   ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2021-12-08 13:57 UTC (permalink / raw)
  To: Nakajima, Jun, Thomas Gleixner
  Cc: Tian, Kevin, Liu, Jing2, LKML, x86, kvm, Dave Hansen,
	Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew, Bae,
	Chang Seok

On 12/8/21 01:50, Nakajima, Jun wrote:
> 3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE
> 
> System software may disable use of Intel AMX by clearing XCR0[18:17],
> by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. System software
> should initialize AMX state (e.g., by executing TILERELEASE) when
> doing so because maintaining AMX state in a non-initialized state may
> have negative power and performance implications. In addition,
> software should not rely on the state of the tile data after setting

I would change this to "must not rely", otherwise looks good.  Thanks!

Paolo

> IA32_XFD[18]; software should always reload or reinitialize the tile
> data after clearing IA32_XFD[18].



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel)
  2021-12-08  0:50                 ` Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel) Nakajima, Jun
  2021-12-08 13:57                   ` Paolo Bonzini
@ 2021-12-10 21:53                   ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2021-12-10 21:53 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: Paolo Bonzini, Tian, Kevin, Liu, Jing2, LKML, x86, kvm,
	Dave Hansen, Arjan van de Ven, Jing Liu, seanjc, Cooper, Andrew,
	Bae, Chang Seok

Jun,

On Wed, Dec 08 2021 at 00:50, Jun Nakajima wrote:
>> On Nov 19, 2021, at 7:41 AM, Nakajima, Jun <jun.nakajima@intel.com> wrote:
>> I’ll work with the H/W team on those (i.e. rewording and the component 17 issue).
>> 
>
> The following is a possible documentation update that may convey the
> rewording you requested.  Does this (the last sentence, “In addition,
> “) work for you?
>
> 3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE
>
> System software may disable use of Intel AMX by clearing XCR0[18:17],
> by clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. System software
> should initialize AMX state (e.g., by executing TILERELEASE) when
> doing so because maintaining AMX state in a non-initialized state may
> have negative power and performance implications. In addition,
> software should not rely on the state of the tile data after setting
> IA32_XFD[18]; software should always reload or reinitialize the tile
> data after clearing IA32_XFD[18].

looks good to me.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-12-10 21:53 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-10 13:01 Thoughts of AMX KVM support based on latest kernel Liu, Jing2
2021-11-16 12:18 ` Thomas Gleixner
2021-11-16 16:05   ` Sean Christopherson
2021-11-16 18:55     ` Thomas Gleixner
2021-11-16 19:49       ` Paolo Bonzini
2021-11-16 20:14         ` Sean Christopherson
2021-11-16 20:36         ` Thomas Gleixner
2021-11-16 22:11           ` Nakajima, Jun
2021-11-17  7:39           ` Paolo Bonzini
2021-11-16 20:12       ` Sean Christopherson
2021-11-16 23:35         ` Thomas Gleixner
2021-11-16 19:20 ` Thomas Gleixner
2021-11-17  4:52   ` Nakajima, Jun
2021-11-17  7:31     ` Paolo Bonzini
2021-11-17 10:15       ` Tian, Kevin
2021-11-17 12:24         ` Thomas Gleixner
2021-11-17 12:53         ` Paolo Bonzini
2021-11-18 23:17           ` Nakajima, Jun
2021-11-19 10:13             ` Thomas Gleixner
2021-11-19 15:41               ` Nakajima, Jun
2021-12-08  0:50                 ` Rewording of Setting IA32_XFD[18] (Re: Thoughts of AMX KVM support based on latest kernel) Nakajima, Jun
2021-12-08 13:57                   ` Paolo Bonzini
2021-12-10 21:53                   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).