From: Thomas Gleixner <tglx@linutronix.de>
To: Paolo Bonzini <pbonzini@redhat.com>,
	"Liu, Jing2" <jing2.liu@intel.com>,
	LKML <linux-kernel@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"Bae, Chang Seok" <chang.seok.bae@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	Jing Liu <jing2.liu@linux.intel.com>,
	"seanjc@google.com" <seanjc@google.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
Date: Wed, 13 Oct 2021 16:06:46 +0200
Message-ID: <871r4p9fyh.ffs@tglx>
In-Reply-To: <da47ba42-b61e-d236-2c1c-9c5504e48091@redhat.com>

Paolo,

On Wed, Oct 13 2021 at 10:42, Paolo Bonzini wrote:
> On 13/10/21 09:46, Liu, Jing2 wrote:
>>> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
>>> See how KVM already handles SPEC_CTRL.
>> 
>> I'm trying to understand why qemu's XFD is zero after kernel supports AMX.
>
> There are three copies of XFD:
>
> - the guest value stored in vcpu->arch.
>
> - the "QEMU" value attached to host_fpu.  This one only becomes zero if 
> QEMU requires AMX (which shouldn't happen).

I don't think that makes sense.

First of all, if QEMU wants to expose AMX to guests, then it has to ask
for permission to do so like any other user space process. We're not
going to make that special just because.

The guest configuration will have to have a 'needs AMX' flag set. So
QEMU knows that it is required upfront.

Which also means that a guest configuration which does not have it set
will never get AMX passed through.

That tells me that we should not bother at all with on-demand buffer
reallocations for that case and should just keep things simple.

On-demand buffer allocation makes sense from the general OS point of
view because there it really matters whether we allocate $N kilobytes
per thread or not.

But does it matter for the QEMU process and its vCPU threads when the
guest is allowed to use AMX? I don't think so. It's an academic exercise
IMO and just makes the handling of this way more complex than required.

So the logic should be:

   qemu()
     read_config()
     if (dynamic_features_passthrough())
     	request_permission(feature)

     create_vcpu_threads()
       ....

       vcpu_thread()
         kvm_ioctl(ENABLE_DYN_FEATURE, feature)
           reallocate_buffers()
             realloc(tsk->fpu.fpstate, feature)
             realloc(guest_fpu.fpstate, feature)
             realloc(host_fpu.fpstate, feature)

             All of them will have

             fpstate.xfd = default_xfd & ~feature
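
The last step of the logic above can be sketched in plain C. This is a
minimal user space model, not kernel code: struct fpstate_model,
xfd_after_enable() and enable_dyn_feature() are hypothetical names made
up for illustration. The only assumption taken from the architecture is
that AMX tile data is xfeature bit 18.

```c
#include <assert.h>
#include <stdint.h>

/* AMX tile data is xfeature bit 18. */
#define XFEATURE_MASK_XTILE_DATA (UINT64_C(1) << 18)

/* Hypothetical stand-in for an fpstate; only the XFD value is modelled. */
struct fpstate_model {
    uint64_t xfd;
};

/* fpstate.xfd = default_xfd & ~feature */
static inline uint64_t xfd_after_enable(uint64_t default_xfd, uint64_t feature)
{
    return default_xfd & ~feature;
}

/*
 * Model of the tail of reallocate_buffers(): the task, guest and host
 * fpstates all end up with the same XFD value. The buffer reallocation
 * itself is not modelled.
 */
void enable_dyn_feature(struct fpstate_model *tsk,
                        struct fpstate_model *guest,
                        struct fpstate_model *host,
                        uint64_t default_xfd, uint64_t feature)
{
    uint64_t xfd = xfd_after_enable(default_xfd, feature);

    assert(tsk && guest && host);
    tsk->xfd = xfd;
    guest->xfd = xfd;
    host->xfd = xfd;
}
```

With default_xfd covering only bit 18, enabling that feature leaves all
three XFD values at zero, which is the "host value of XFD is zero" case
discussed in the quoted text.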

That also makes resume and migration simple because they are going to
use exactly the same mechanism.

Yes, it _allows_ QEMU user space to use AMX, but that's not the end of
the world, really, and it avoids a ton of special cases to worry about.

Also the extra memory consumption per vCPU thread is probably just noise
compared to the rest of the vCPU state.

With that the only thing you have to take care of is in vmx_vcpu_run():

   local_irq_disable();
   ...
   vmx_vcpu_run()
     wrmsrl(XFD, guest->xfd)
     vmenter()
     guest->xfd = rdmsrl(XFD)
     wrmsrl(XFD, host->xfd)
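
The ordering of the XFD swap can be modelled in user space C. This is a
toy sketch only: a plain variable stands in for the IA32_XFD MSR, the
guest is a callback, and every name (vcpu_model, vcpu_run_model(),
guest_sets_bit18()) is hypothetical. Interrupt disabling and the real
VMX entry are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a single variable stands in for the IA32_XFD MSR. */
static uint64_t msr_xfd;

struct vcpu_model {
    uint64_t guest_xfd;
    uint64_t host_xfd;
};

/* Hypothetical guest body: may read and write the modelled MSR. */
typedef void (*guest_body_fn)(uint64_t *xfd);

void vcpu_run_model(struct vcpu_model *v, guest_body_fn guest)
{
    assert(v && guest);

    msr_xfd = v->guest_xfd;   /* wrmsrl(XFD, guest->xfd) */
    guest(&msr_xfd);          /* vmenter() until the next vmexit */
    v->guest_xfd = msr_xfd;   /* guest->xfd = rdmsrl(XFD) */
    msr_xfd = v->host_xfd;    /* wrmsrl(XFD, host->xfd) */
}

/* What the host sees in the modelled MSR after the run. */
uint64_t read_xfd_model(void)
{
    return msr_xfd;
}

/* Sample guest which arms XFD for component 18 while it runs. */
void guest_sets_bit18(uint64_t *xfd)
{
    *xfd |= UINT64_C(1) << 18;
}
```

The point of the model is that whatever the guest writes is captured in
guest_xfd and the host always gets its own value back.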

It does not matter if some day there is an XFD-controlled bit 19 and
you want to selectively allow guests access to it, because we have two
mechanisms here:

  1) XCR0

    XSETBV in the guest is intercepted and checked against the allowed
    bits. If it tries to set one which is not allowed, then this is
    not any different from what KVM is doing today.

    I.e. Guest1 is allowed to set bit 18, but not 19
         Guest2 is allowed to set bit 19, but not 18
         Guest3 is allowed to set both 18 and 19

  2) XFD

     Intercepting XFD is optional I think. It does not matter what the
     guest writes into it, because if XCR0[i] = 0 then the state of
     XFD[i] is irrelevant according to the ISE:

     "(IA32_XFD[i] does not affect processor operations if XCR0[i] = 0.)"

     The only thing different vs. bare metal is that when the guest
     writes XFD[i]=1 it won't get #GP despite the fact that the
     virtualized CPUID suggests that it should get one:
     
     "Bit i of either MSR can be set to 1 only if CPUID.(EAX=0DH,ECX=i):ECX[2]
      is enumerated as 1.  An execution of WRMSR that attempts to set an
      unsupported bit in either MSR causes a general-protection fault
      (#GP)."

     Does it matter?  Probably not, all it can figure out is that
     component[i] is supported in hardware, but it can't do anything
     with that information because the VMM will not allow it to set the
     corresponding XCR0 bit...

     Sure you can intercept XFD, check the write against the allowed
     guest bits and inject #GP if not.

     But keep in mind that the guest kernel will context switch it and
     that will not be any better than context switching XCR0 in the
     guest kernel...
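
Both checks boil down to the same mask test. The sketch below is a
simplified user space model, assuming a hypothetical per-guest policy
structure. It covers only the allowed-bits check and none of the other
validity rules a real XSETBV intercept has to enforce. Bit 19 is the
hypothetical future component from the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_BIT(n) (UINT64_C(1) << (n))

/* Hypothetical per-guest policy derived from the guest configuration. */
struct guest_policy {
    uint64_t allowed_xcr0;  /* bits the guest may set in XCR0 */
    uint64_t allowed_xfd;   /* bits the guest may set in XFD */
};

/* 1) XSETBV intercept: refuse any bit the guest is not allowed to set. */
bool xsetbv_allowed(const struct guest_policy *p, uint64_t new_xcr0)
{
    assert(p);
    return (new_xcr0 & ~p->allowed_xcr0) == 0;
}

/*
 * 2) Optional XFD intercept: returns false when the write would set a
 * bit outside the allowed guest bits, i.e. when #GP would be injected.
 */
bool xfd_write_allowed(const struct guest_policy *p, uint64_t new_xfd)
{
    assert(p);
    return (new_xfd & ~p->allowed_xfd) == 0;
}
```

For Guest1 in the example, allowed_xcr0 would contain bit 18 but not
bit 19, so xsetbv_allowed() accepts a value with bit 18 set and rejects
any value with bit 19 set.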

The thing we need to think about is the case where the guest has
XCR0[i] = XFD[i] = 1 and the host has XFD[i] = 0, because setting
XFD[i] = 1 does not bring component[i] into init state.

In that case we have the following situation after a vmexit:

     guest->xfd = rdmsrl(XFD)         [i] = 1
     wrmsrl(XFD, host->xfd)           [i] = 0

If the component[i] is _not_ in init state then the next XSAVES on the
host will save it and therefore have xsave.header.XSTATE_BV[i] = 1 in the
buffer. A subsequent XRSTORS of that buffer on the host will restore the
saved data into component[i].

But the subsequent vmenter() will restore the guest XFD, which will just
bring the guest into exactly the same state as before the VMEXIT.

Ergo it does not matter at all.
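
The argument can be replayed with a toy state machine. This is a
deliberately small user space model of a single component i, assuming
only the behaviour described above: with host XFD[i] = 0, XSAVES records
the component when it is not in init state and XRSTORS puts it back.
All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one XFD-controlled state component i. */
struct cpu_model {
    bool xfd_i;      /* XFD[i] */
    bool in_init;    /* component[i] is in init state */
    uint32_t data;   /* component[i] contents, 0 when in init state */
};

struct xsave_buf_model {
    bool xstate_bv_i;   /* header bit for component i */
    uint32_t data;
};

/* Host XSAVES with XFD[i] = 0: saves the component iff it is not init. */
void xsaves_model(const struct cpu_model *c, struct xsave_buf_model *b)
{
    assert(!c->xfd_i);
    b->xstate_bv_i = !c->in_init;
    b->data = c->in_init ? 0 : c->data;
}

/* Host XRSTORS with XFD[i] = 0: restores the data or the init state. */
void xrstors_model(struct cpu_model *c, const struct xsave_buf_model *b)
{
    assert(!c->xfd_i);
    c->in_init = !b->xstate_bv_i;
    c->data = b->xstate_bv_i ? b->data : 0;
}

/* vmexit, host save/restore, vmenter: is the guest view unchanged? */
bool roundtrip_preserves_guest_state(struct cpu_model guest_view)
{
    struct cpu_model cpu = guest_view;
    struct xsave_buf_model buf;
    bool guest_xfd = cpu.xfd_i;   /* guest->xfd = rdmsrl(XFD) */

    cpu.xfd_i = false;            /* wrmsrl(XFD, host->xfd) */
    xsaves_model(&cpu, &buf);
    cpu.in_init = true;           /* registers get reused on the host */
    cpu.data = 0;
    xrstors_model(&cpu, &buf);
    cpu.xfd_i = guest_xfd;        /* wrmsrl(XFD, guest->xfd) on vmenter */

    return cpu.xfd_i == guest_view.xfd_i &&
           cpu.in_init == guest_view.in_init &&
           cpu.data == guest_view.data;
}
```

In this model the round trip returns true both for a component holding
live data and for one in init state, which is the "does not matter"
conclusion in miniature.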

That also makes #NM handling trivial. Any #NM generated in the guest is
completely uninteresting for the host with that scheme and it's the
guest's problem to deal with it.

But that brings me to another issue: XFD_ERR.

Assume the guest takes #NM and a VMEXIT happens before the handler can
run and read/clear XFD_ERR. Then XFD_ERR will have the guest error bit
set and nothing will clear it. So XFD_ERR has to be handled properly,
otherwise a subsequent #NM on the host will see a stale bit from the
guest.

   vmx_vcpu_run()
     wrmsrl(XFD, guest->xfd)
     wrmsrl(XFD_ERR, guest->xfd_err)
     vmenter()
     guest->xfd_err = rdmsrl(XFD_ERR)
     guest->xfd = rdmsrl(XFD)
     wrmsrl(XFD_ERR, 0)
     wrmsrl(XFD, host->xfd)

Of course that wants to be conditional on the guest configuration and
you probably want all of that to be in the auto-load/store area, but
you get the idea.
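
A user space sketch of that sequence, including the conditional part,
could look as follows. It is a toy model under stated assumptions: two
struct members stand in for the XFD and XFD_ERR MSRs, the xfd_enabled
flag is a hypothetical representation of "the guest configuration has
the feature", and the MSR auto-load/store area is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: two fields stand in for the XFD and XFD_ERR MSRs. */
struct msrs_model {
    uint64_t xfd;
    uint64_t xfd_err;
};

struct vcpu_xfd_model {
    bool xfd_enabled;   /* hypothetical: guest config uses the feature */
    uint64_t guest_xfd;
    uint64_t guest_xfd_err;
    uint64_t host_xfd;
};

typedef void (*guest_run_fn)(struct msrs_model *m);

void vcpu_run_xfd_err_model(struct msrs_model *m, struct vcpu_xfd_model *v,
                            guest_run_fn guest)
{
    assert(m && v && guest);

    if (v->xfd_enabled) {
        m->xfd = v->guest_xfd;           /* wrmsrl(XFD, guest->xfd) */
        m->xfd_err = v->guest_xfd_err;   /* wrmsrl(XFD_ERR, guest->xfd_err) */
    }

    guest(m);                            /* vmenter() until vmexit */

    if (v->xfd_enabled) {
        v->guest_xfd_err = m->xfd_err;   /* guest->xfd_err = rdmsrl(XFD_ERR) */
        v->guest_xfd = m->xfd;           /* guest->xfd = rdmsrl(XFD) */
        m->xfd_err = 0;                  /* wrmsrl(XFD_ERR, 0) */
        m->xfd = v->host_xfd;            /* wrmsrl(XFD, host->xfd) */
    }
}

/* Sample guest: takes #NM for component 18, exits before clearing it. */
void guest_takes_nm(struct msrs_model *m)
{
    m->xfd_err |= UINT64_C(1) << 18;
}
```

After the run the guest's pending error bit lives in guest_xfd_err and
the modelled host XFD_ERR is zero, so a later #NM on the host cannot see
a stale guest bit.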

Anything else will just create more problems than it solves. Especially
#NM handling (think nested guest) and the XFD_ERR additive behaviour
will be a nasty playground and easy to get wrong.

Not having that at all makes life way simpler, right?

Thanks,

        tglx
