RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host

* RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
@ 2023-10-05 17:24 Mancini, Riccardo
  2023-10-06  1:39 ` Gavin Shan
  0 siblings, 1 reply; 6+ messages in thread
From: Mancini, Riccardo @ 2023-10-05 17:24 UTC (permalink / raw)
  To: Paolo Bonzini, vkuznets
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene

Thanks, Vitaly, Paolo for your replies!
I'll reply just to this message to avoid branching the conversation too much.

> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: 05 October 2023 17:15
> To: Mancini, Riccardo <mancio@amazon.com>; vkuznets@redhat.com
> Cc: kvm@vger.kernel.org; Graf (AWS), Alexander <graf@amazon.de>; Teragni,
> Matias <mteragni@amazon.com>; Batalov, Eugene <bataloe@amazon.com>
> Subject: RE: [EXTERNAL] Bug? Incompatible APF for 4.14 guest on 5.10 and
> later host
> 
> 
> 
> On 10/5/23 17:08, Mancini, Riccardo wrote:
> > Hi,
> >
> > when a 4.14 guest runs on a 5.10 host (and later), it cannot use APF
> > (despite CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new
> > interrupt-based mechanism 2635b5c4a0 (KVM: x86: interrupt based APF
> 'page ready' event delivery).
> > Kernels after 5.9 won't satisfy the guest request to enable APF
> > through KVM_ASYNC_PF_ENABLED, requiring also
> KVM_ASYNC_PF_DELIVERY_AS_INT to be set.
> > Furthermore, the patch set seems to be dropping parts of the legacy
> > #PF handling as well.
> > I consider this as a bug as it breaks APF compatibility for older
> > guests running on newer kernels, by breaking the underlying ABI.
> > What do you think? Was this a deliberate decision?
> 
> Yes, this is intentional.  It is not a breakage because the APF interface
> only tells how asynchronous page faults are delivered; it doesn't promise
> that they are actually delivered.  However, I admit that the change was
> unfortunate.

:(

Makes sense, thanks for the explanation.

> 
> Apart from the concerns about reentrancy, there were two more issues with
> the old API:
> 
> - the page-ready notification lacked an acknowledge mechanism if many
> pages became ready at the same time (see commit 557a961abbe0, "KVM: x86:
> acknowledgment mechanism for async pf page ready notifications").  This
> delayed the notifications of pages after the first.  The new API uses
> MSR_KVM_ASYNC_PF_ACK to fix the problem.
> 
> - the old API confused synchronous events (exceptions) with asynchronous
> events (interrupts); this created a unique case where a page fault was
> generated on a page that is not accessed by the instruction.  (The new API
> only fixes half of this, because it also has a bogus CR2, but it's a bit
> better).  It also meant that page-ready events were suppressed by disabled
> interrupts---but they were not necessarily injected when IF became 1,
> because KVM did not enable the interrupt window.  This is solved
> automatically by just injecting an interrupt.  On the theoretical side,
> it's also just ugly that page-ready events could only be enabled/disabled
> with CLI/STI and not APIC (TPR).
> 
> > Was this already reported in the past (I couldn't find anything in the
> > mailing list but I might have missed it!)?
> > Would it be much effort to support the legacy #PF based mechanism for
> > older guests that choose to only set KVM_ASYNC_PF_ENABLED?
> 
> It is not hard.  However, I don't think we should accept such a patch
> upstream.

Regarding also Vitaly comment about backporting the changes to 4.14, I think
supporting both modes in 5.10 (at least) might be the least effort path
(fewer changes), at least to my naive untrained eye.
I tried to playing around by partially reverting some of the changes to handle
both cases but only got kernel panics in the guest so far, so I might be
missing something. 
However, I have absolutely no experience with KVM code, so I wasn't expecting
to get far in any case.

> I do have a question for you.  Can you describe the context in which you
> are using APF, and would you be interested in ARM support?  We (Red Hat,
> not me the maintainer :)) have been trying to understand for a long time
> if cloud providers use or need APF.

Keeping it short, we resume "remote" VM snapshots so page faults might
be very expensive on local cache misses. We have a few optimizations to work
around some of the issues, but even on local hits there are still a lot
of expensive page faults compared to a normal VM use-case, I believe.
To be fair, I didn't even realise the benefits we were getting from APF 
until it actually broke :) 
It indeed plays a big role in keeping the resumption quick and efficient
in our use-case.
I didn't know that it wasn't available for ARM, as we don't use it at
the moment, but that would be interesting for the future.

Thanks,
Riccardo

> 
> Paolo
> 
> > The reason this is an issue for us now is that not having APF for
> > older guests introduces a significant performance regression on 4.14
> > guests when paired to uffd handling of "remote" page-faults (similar
> > to a live migration scenario) when we update from a 4.14 host kernel to
> a 5.10 host kernel.

^ permalink raw reply	[flat|nested] 6+ messages in thread