From: Thomas Gleixner <tglx@linutronix.de>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Vivek Goyal <vgoyal@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	kvm list <kvm@vger.kernel.org>, stable <stable@vger.kernel.org>
Subject: Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS
Date: Wed, 08 Apr 2020 15:01:58 +0200
Message-ID: <87pncib06x.fsf@nanos.tec.linutronix.de>
In-Reply-To: <274f3d14-08ac-e5cc-0b23-e6e0274796c8@redhat.com>

Paolo Bonzini <pbonzini@redhat.com> writes:
> On 08/04/20 01:21, Thomas Gleixner wrote:
>>>> No. Async PF is not a real exception. It has interrupt semantics and it
>>>> can only be injected when the guest has interrupts enabled. It's bad
>>>> design.
>>>
>>> Page-ready async PF has interrupt semantics.
>>>
>>> Page-not-present async PF however does not have interrupt semantics, it
>>> has to be injected immediately or not at all (falling back to host page
>>> fault in the latter case).
>> 
>> If interrupts are disabled in the guest then it is NOT injected and the
>> guest is suspended. So it HAS interrupt semantics. Conditional ones,
>> i.e. if interrupts are disabled, bail, if not then inject it.
>
> Interrupts can be delayed by TPR or STI/MOV SS interrupt window, async
> page faults cannot (again, not the page-ready kind).

Can we pretty please stop using the term async page fault? It's just
wrong and causes more confusion than anything else.

What this does is really what I called Opportunistic Make Guest Do Other
Stuff. And it has neither true exception nor true interrupt semantics.

It's a software event which is injected into the guest to let the guest
do something else than waiting for the actual #PF cause to be
resolved. It's part of a software protocol between host and guest.

And it comes with restrictions:

    The Do Other Stuff event can only be delivered when guest IF=1.

    If guest IF=0 then the host has to suspend the guest until the
    situation is resolved.

    The 'Situation resolved' event must also wait for a guest IF=1 slot.
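
The three restrictions above can be sketched as a host-side decision
function. This is only an illustration of the rules as stated here: the
names dos_event, host_action and host_decide are made up for the example
and are not KVM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; none of this is KVM code. */
enum dos_event { DO_OTHER_STUFF, SITUATION_RESOLVED };
enum host_action { INJECT_NOW, SUSPEND_GUEST, WAIT_FOR_IF };

/* Decide what the host does with an event, given the guest's IF flag. */
static enum host_action host_decide(enum dos_event ev, bool guest_if)
{
	assert(ev == DO_OTHER_STUFF || ev == SITUATION_RESOLVED);

	/* Both events can only be delivered into a guest IF=1 slot. */
	if (guest_if)
		return INJECT_NOW;

	/*
	 * IF=0: the guest cannot be told to do other stuff, so the host
	 * suspends it. A 'Situation resolved' event just stays pending
	 * until the guest enables interrupts again.
	 */
	return ev == DO_OTHER_STUFF ? SUSPEND_GUEST : WAIT_FOR_IF;
}
```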

> Page-not-present async page faults are almost a perfect match for the
> hardware use of #VE (and it might even be possible to let the
> processor deliver the exceptions).  There are other advantages:
>
> - the only real problem with using #PF (with or without
> KVM_ASYNC_PF_SEND_ALWAYS) seems to be the NMI reentrancy issue, which
> would not be there for #VE.
>
> - #VE are combined the right way with other exceptions (the
> benign/contributory/pagefault stuff)
>
> - adjusting KVM and Linux to use #VE instead of #PF would be less than
> 100 lines of code.

If you just want to solve Vivek's problem, then it's good enough. I.e. the
file truncation turns the EPT entries into #VE convertible entries and
the guest #VE handler can figure it out. This one can be injected
directly by the hardware, i.e. you don't need a VMEXIT.

If you want the opportunistic do other stuff mechanism, then #VE has
exactly the same problems as the existing async "PF". It's not magically
making that go away.

One possible solution might be to make all recoverable EPT entries
convertible and let the HW inject #VE for those.
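
As a rough sketch of what "convertible" means at the entry level: on Intel
hardware, bit 63 of an EPT paging-structure entry is the "suppress #VE"
bit, and an EPT violation can only be turned into #VE when that bit is
clear (and the EPT-violation #VE execution control is enabled). The helper
names below are hypothetical and this is not how KVM manipulates its EPT
entries; it only shows the bit in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of an EPT paging-structure entry: "suppress #VE". */
#define EPT_SUPPRESS_VE	(UINT64_C(1) << 63)

static_assert(EPT_SUPPRESS_VE == UINT64_C(0x8000000000000000),
	      "suppress #VE is bit 63");

/* Hypothetical helper: allow the CPU to deliver #VE for this entry. */
static uint64_t ept_make_convertible(uint64_t entry)
{
	return entry & ~EPT_SUPPRESS_VE;
}

/* Hypothetical helper: true if a violation on this entry may become #VE. */
static bool ept_is_convertible(uint64_t entry)
{
	return !(entry & EPT_SUPPRESS_VE);
}
```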

So the #VE handler in the guest would have to do:

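       if (!recoverable()) {
               /* Not resolvable by the host: signal, fix up, or die */
               if (user_mode(regs))
                       send_signal();
               else if (!fixup_exception())
                       die_hard();
               goto done;
       }

       /* Hand the #VE details to the host via the PV page */
       store_ve_info_in_pv_page();

       if (!user_mode(regs) || !preemptible()) {
               /* Cannot schedule here: host suspends the vCPU until done */
               hypercall_resolve_ept(can_continue = false);
       } else {
               /* Can schedule: sleep this task until the wakeup arrives */
               init_completion();
               hypercall_resolve_ept(can_continue = true);
               wait_for_completion();
       }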

or something like that.

The host side of the hypercall which resolves the EPT fault acts on the
can_continue argument.

If false, it suspends the guest vCPU and only returns when done.

If true, it kicks off the resolve process and returns to the guest, which
suspends the task and tries to do something else.

The wakeup side needs to be a regular interrupt and cannot go through
#VE.
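
To make the host side and the wakeup path described above a bit more
concrete, here is a small self-contained model. It is a sketch under
assumptions, not KVM or guest kernel code: struct ept_fault_sim and all
three functions are invented for the example, and the real resolve work
is reduced to flag updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one outstanding EPT fault; not KVM code. */
struct ept_fault_sim {
	bool vcpu_suspended;	/* host has stopped the vCPU */
	bool resolve_pending;	/* host resolves in the background */
	bool irq_pending;	/* regular 'resolved' interrupt raised */
	bool task_woken;	/* guest completion has fired */
};

/* Host: handle the hypercall. Returns true if resolved on return. */
static bool host_resolve_ept(struct ept_fault_sim *f, bool can_continue)
{
	assert(f);

	if (!can_continue) {
		/* Suspend the vCPU and only return when done. */
		f->vcpu_suspended = true;
		/* ... resolve the fault synchronously ... */
		f->vcpu_suspended = false;
		return true;
	}

	/* Kick the resolve process and return to the guest right away. */
	f->resolve_pending = true;
	return false;
}

/* Host: background resolve finished, raise a regular interrupt. */
static void host_resolution_done(struct ept_fault_sim *f)
{
	assert(f);

	if (f->resolve_pending) {
		f->resolve_pending = false;
		f->irq_pending = true;
	}
}

/* Guest: the wakeup is a normal interrupt, so it needs guest IF=1. */
static bool guest_irq_handler(struct ept_fault_sim *f, bool guest_if)
{
	assert(f);

	if (!guest_if || !f->irq_pending)
		return false;

	f->irq_pending = false;
	f->task_woken = true;	/* stands in for completing the waiter */
	return true;
}
```

The point of keeping the wakeup separate is that it follows ordinary
interrupt rules: it stays pending while the guest has interrupts disabled
and is delivered once they are enabled again.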

Thanks,

        tglx
