* RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
From: Mancini, Riccardo @ 2023-10-05 17:24 UTC
  To: Paolo Bonzini, vkuznets
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene

Thanks, Vitaly and Paolo, for your replies!
I'll reply just to this message to avoid branching the conversation too much.

> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: 05 October 2023 17:15
> To: Mancini, Riccardo <mancio@amazon.com>; vkuznets@redhat.com
> Cc: kvm@vger.kernel.org; Graf (AWS), Alexander <graf@amazon.de>; Teragni,
> Matias <mteragni@amazon.com>; Batalov, Eugene <bataloe@amazon.com>
> Subject: RE: [EXTERNAL] Bug? Incompatible APF for 4.14 guest on 5.10 and
> later host
> 
> 
> 
> On 10/5/23 17:08, Mancini, Riccardo wrote:
> > Hi,
> >
> > when a 4.14 guest runs on a 5.10 (or later) host, it cannot use APF
> > (despite CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new
> > interrupt-based mechanism of commit 2635b5c4a0 ("KVM: x86: interrupt
> > based APF 'page ready' event delivery").
> > Kernels after 5.9 won't satisfy the guest request to enable APF through
> > KVM_ASYNC_PF_ENABLED alone, requiring KVM_ASYNC_PF_DELIVERY_AS_INT to
> > be set as well.
> > Furthermore, the patch set also seems to drop parts of the legacy #PF
> > handling.
> > I consider this a bug, as it breaks APF compatibility for older guests
> > running on newer kernels by breaking the underlying ABI.
> > What do you think? Was this a deliberate decision?
> 
> Yes, this is intentional.  It is not a breakage, because the APF interface
> only describes how asynchronous page faults are delivered; it doesn't
> promise that they are actually delivered.  However, I admit that the change
> was unfortunate.

:(

Makes sense, thanks for the explanation.

> 
> Apart from the concerns about reentrancy, there were two more issues with
> the old API:
> 
> - the page-ready notification lacked an acknowledgment mechanism if many
> pages became ready at the same time (see commit 557a961abbe0, "KVM: x86:
> acknowledgment mechanism for async pf page ready notifications").  This
> delayed the notifications of pages after the first.  The new API uses
> MSR_KVM_ASYNC_PF_ACK to fix the problem.
> 
> - the old API confused synchronous events (exceptions) with asynchronous
> events (interrupts); this created a unique case where a page fault was
> generated on a page that was not accessed by the instruction.  (The new API
> only fixes half of this, because it also has a bogus CR2, but it's a bit
> better.)  It also meant that page-ready events were suppressed by disabled
> interrupts---but they were not necessarily injected when IF became 1,
> because KVM did not enable the interrupt window.  This is solved
> automatically by just injecting an interrupt.  On the theoretical side,
> it's also just ugly that page-ready events could only be enabled/disabled
> with CLI/STI and not via the APIC (TPR).
> 
> > Was this already reported in the past (I couldn't find anything in the
> > mailing list, but I might have missed it!)?
> > Would it be much effort to support the legacy #PF-based mechanism for
> > older guests that only set KVM_ASYNC_PF_ENABLED?
> 
> It is not hard.  However, I don't think we should accept such a patch
> upstream.

Regarding Vitaly's comment about backporting the changes to 4.14: I think
supporting both modes in 5.10 (at least) might be the least-effort path
(fewer changes), at least to my naive, untrained eye.
I tried playing around with partially reverting some of the changes to handle
both cases, but have only produced kernel panics in the guest so far, so I
might be missing something.
However, I have absolutely no experience with KVM code, so I wasn't expecting
to get far in any case.
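
For anyone following along, these are the two guest enable sequences as I
understand them (simplified from arch/x86/kernel/kvm.c in the respective
kernels, with MSR and flag names from the kvm_para.h UAPI; I may well be
missing details):

	u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

	/* 4.14 guest: legacy protocol, both 'page not present' and
	 * 'page ready' events are delivered via #PF. */
	wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);

	/* 5.10+ guest: 'page ready' events arrive as an interrupt; a 5.10+
	 * host refuses to enable APF unless DELIVERY_AS_INT is also set. */
	wrmsrl(MSR_KVM_ASYNC_PF_INT, HYPERVISOR_CALLBACK_VECTOR);
	wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED |
				    KVM_ASYNC_PF_DELIVERY_AS_INT);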

> I do have a question for you.  Can you describe the context in which you
> are using APF, and would you be interested in ARM support?  We (Red Hat,
> not me the maintainer :)) have been trying to understand for a long time
> if cloud providers use or need APF.

Keeping it short, we resume "remote" VM snapshots so page faults might
be very expensive on local cache misses. We have a few optimizations to work
around some of the issues, but even on local hits there are still a lot
of expensive page faults compared to a normal VM use-case, I believe.
To be fair, I didn't even realise the benefits we were getting from APF 
until it actually broke :) 
It indeed plays a big role in keeping the resumption quick and efficient
in our use-case.
I didn't know that it wasn't available for ARM, as we don't use it at
the moment, but that would be interesting for the future.

Thanks,
Riccardo

> 
> Paolo
> 
> > The reason this is an issue for us now is that not having APF for
> > older guests introduces a significant performance regression on 4.14
> > guests when paired with uffd handling of "remote" page-faults (similar
> > to a live migration scenario) when we update from a 4.14 host kernel
> > to a 5.10 host kernel.



* Re: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
From: Gavin Shan @ 2023-10-06  1:39 UTC
  To: Mancini, Riccardo, Paolo Bonzini, vkuznets
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene,
	Marc Zyngier, Oliver Upton, kvmarm


On 10/6/23 03:24, Mancini, Riccardo wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Sent: 05 October 2023 17:15

[...]

>> I do have a question for you.  Can you describe the context in which you
>> are using APF, and would you be interested in ARM support?  We (Red Hat,
>> not me the maintainer :)) have been trying to understand for a long time
>> if cloud providers use or need APF.
> 
> Keeping it short, we resume "remote" VM snapshots so page faults might
> be very expensive on local cache misses. We have a few optimizations to work
> around some of the issues, but even on local hits there are still a lot
> of expensive page faults compared to a normal VM use-case, I believe.
> To be fair, I didn't even realise the benefits we were getting from APF
> until it actually broke :)
> It indeed plays a big role in keeping the resumption quick and efficient
> in our use-case.
> I didn't know that it wasn't available for ARM, as we don't use it at
> the moment, but that would be interesting for the future.
> 

Adding Marc, Oliver and kvmarm@lists.linux.dev

I tried to make the feature available on ARM64 a long time ago, but the effort
was discontinued as the main concern was that no users were demanding it [1].
It's definitely exciting news that it's an important feature to AWS. I guess
it's probably another chance to re-evaluate the feature for ARM64?

[1] https://lore.kernel.org/kvmarm/87iloq2oke.wl-maz@kernel.org/

Async PF needs two signals sent from host to guest; SDEI (Software Delegated
Exception Interface) is leveraged for that. So there were two series: one to
support SDEI virtualization [2] and one for Async PF on ARM64 [3].

[2] https://lore.kernel.org/kvmarm/20220527080253.1562538-1-gshan@redhat.com/
[3] https://lore.kernel.org/kvmarm/20210815005947.83699-1-gshan@redhat.com/

I have several questions for Riccardo; the answers would help me understand
the situation better.

- The VM snapshot is stored somewhere remote, which means the page fault on
   an instruction fetch becomes expensive. Do we have benchmarks showing how
   much benefit Async PF brings on x86 in the AWS environment?

- I'm wondering whether the data can be fetched from somewhere remote in the
   AWS environment?

- The data can be stored in local DRAM or in swap space, and the page fault
   to fetch the data becomes expensive if it is in swap space. I'm not sure
   whether it's possible for the data to reside in swap space in the AWS
   environment? Note that the swap space, backed by disk, could itself be
   somewhere remote.

Thanks,
Gavin




* RE: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
From: Mancini, Riccardo @ 2023-10-13 15:40 UTC
  To: Gavin Shan
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene,
	Marc Zyngier, Oliver Upton, kvmarm, Paolo Bonzini, vkuznets

> Adding Marc, Oliver and kvmarm@lists.linux.dev
> 
> I tried to make the feature available on ARM64 a long time ago, but the
> effort was discontinued as the main concern was that no users were
> demanding it [1].
> It's definitely exciting news that it's an important feature to AWS. I
> guess it's probably another chance to re-evaluate the feature for ARM64?
> 
> [1] https://lore.kernel.org/kvmarm/87iloq2oke.wl-maz@kernel.org/
> 
> Async PF needs two signals sent from host to guest; SDEI (Software
> Delegated Exception Interface) is leveraged for that. So there were two
> series: one to support SDEI virtualization [2] and one for Async PF on
> ARM64 [3].
> 
> [2] https://lore.kernel.org/kvmarm/20220527080253.1562538-1-gshan@redhat.com/
> [3] https://lore.kernel.org/kvmarm/20210815005947.83699-1-gshan@redhat.com/

Thanks for all the information! This might become useful in the future,
when we enable this feature on ARM, given the improvements we saw on x86.

> 
> I have several questions for Riccardo; the answers would help me
> understand the situation better.
> 
> - The VM snapshot is stored somewhere remote, which means the page fault
>    on an instruction fetch becomes expensive. Do we have benchmarks showing
>    how much benefit Async PF brings on x86 in the AWS environment?

In our small local repro (local disk access only), which runs a Java workload
after resuming the Firecracker VM, we saw a 20% performance regression (from
~80ms to ~100ms), and the time spent outside the VM due to EPT_VIOLATION exits
increased 3x, from 30ms to 90ms. This impact is amplified when access is not
local.

> 
> - I'm wondering whether the data can be fetched from somewhere remote in
>    the AWS environment?

Without getting into details, yes, any memory page could be remotely accessed
in the worst case.

> 
> - The data can be stored in local DRAM or in swap space, and the page
>    fault to fetch the data becomes expensive if it is in swap space.
>    I'm not sure whether it's possible for the data to reside in swap
>    space in the AWS environment? Note that the swap space, backed by
>    disk, could itself be somewhere remote.

In our usage, during resumption almost all pages are missing and are populated
on demand with a userfaultfd, either from a local cache (memory or disk) or
from the network.
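
In case it helps picture it, here is a minimal sketch of the kind of
miss-handling loop involved (error handling elided; assumes 4KiB pages;
fetch_page() is a hypothetical stand-in for our local-cache/network path,
not actual Firecracker code):

#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

extern void fetch_page(unsigned long addr, void *buf); /* hypothetical */

/* uffd is a userfaultfd already registered (UFFDIO_REGISTER) for the
 * guest memory range in MISSING mode. */
static void serve_faults(int uffd)
{
	struct uffd_msg msg;
	static char buf[4096];

	while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		unsigned long addr = msg.arg.pagefault.address & ~0xfffUL;

		fetch_page(addr, buf);	/* local cache hit or remote fetch */

		struct uffdio_copy copy = {
			.dst  = addr,
			.src  = (unsigned long)buf,
			.len  = 4096,
			.mode = 0,
		};
		/* Installs the page and wakes the faulting vCPU thread. */
		ioctl(uffd, UFFDIO_COPY, &copy);
	}
}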

Thanks,
Riccardo

> 
> Thanks,
> Gavin
> 



* Re: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
From: Paolo Bonzini @ 2023-10-05 16:15 UTC
  To: Mancini, Riccardo, vkuznets
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene

On 10/5/23 17:08, Mancini, Riccardo wrote:
> Hi,
> 
> when a 4.14 guest runs on a 5.10 (or later) host, it cannot use APF (despite
> CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new interrupt-based
> mechanism of commit 2635b5c4a0 ("KVM: x86: interrupt based APF 'page ready'
> event delivery").
> Kernels after 5.9 won't satisfy the guest request to enable APF through
> KVM_ASYNC_PF_ENABLED alone, requiring KVM_ASYNC_PF_DELIVERY_AS_INT to be set
> as well.
> Furthermore, the patch set also seems to drop parts of the legacy #PF handling.
> I consider this a bug, as it breaks APF compatibility for older guests running
> on newer kernels by breaking the underlying ABI.
> What do you think? Was this a deliberate decision?

Yes, this is intentional.  It is not a breakage, because the APF
interface only describes how asynchronous page faults are delivered; it
doesn't promise that they are actually delivered.  However, I admit that
the change was unfortunate.

Apart from the concerns about reentrancy, there were two more issues 
with the old API:

- the page-ready notification lacked an acknowledgment mechanism if many
pages became ready at the same time (see commit 557a961abbe0, "KVM: x86:
acknowledgment mechanism for async pf page ready notifications").  This
delayed the notifications of pages after the first.  The new API uses
MSR_KVM_ASYNC_PF_ACK to fix the problem (see the sketch after this list).

- the old API confused synchronous events (exceptions) with asynchronous
events (interrupts); this created a unique case where a page fault was
generated on a page that was not accessed by the instruction.  (The new
API only fixes half of this, because it also has a bogus CR2, but it's a
bit better.)  It also meant that page-ready events were suppressed by
disabled interrupts---but they were not necessarily injected when IF
became 1, because KVM did not enable the interrupt window.  This is
solved automatically by just injecting an interrupt.  On the theoretical
side, it's also just ugly that page-ready events could only be
enabled/disabled with CLI/STI and not via the APIC (TPR).
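
To illustrate the ack flow, this is roughly what the 5.10+ guest does in
its page-ready interrupt handler (simplified from
sysvec_kvm_asyncpf_interrupt() in arch/x86/kernel/kvm.c; quoted from
memory, so details may be off):

DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt)
{
	struct pt_regs *old_regs = set_irq_regs(regs);
	u32 token;

	ack_APIC_irq();
	inc_irq_stat(irq_hv_callback_count);

	if (__this_cpu_read(apf_reason.enabled)) {
		token = __this_cpu_read(apf_reason.token);
		/* Wake the task put to sleep by 'page not present'. */
		kvm_async_pf_task_wake(token);
		__this_cpu_write(apf_reason.token, 0);
		/* Tell the host it may deliver the next 'page ready' event. */
		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
	}

	set_irq_regs(old_regs);
}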

> Was this already reported in the past (I couldn't find anything in the
> mailing list, but I might have missed it!)?
> Would it be much effort to support the legacy #PF-based mechanism for older
> guests that only set KVM_ASYNC_PF_ENABLED?

It is not hard.  However, I don't think we should accept such a patch 
upstream.

I do have a question for you.  Can you describe the context in which you 
are using APF, and would you be interested in ARM support?  We (Red Hat, 
not me the maintainer :)) have been trying to understand for a long time 
if cloud providers use or need APF.

Paolo

> The reason this is an issue for us now is that not having APF for older guests
> introduces a significant performance regression on 4.14 guests when paired with
> uffd handling of "remote" page-faults (similar to a live migration scenario)
> when we update from a 4.14 host kernel to a 5.10 host kernel.



* Re: Bug? Incompatible APF for 4.14 guest on 5.10 and later host
From: Vitaly Kuznetsov @ 2023-10-05 15:38 UTC
  To: Mancini, Riccardo
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene, pbonzini

"Mancini, Riccardo" <mancio@amazon.com> writes:

> Hi,
>
> when a 4.14 guest runs on a 5.10 (or later) host, it cannot use APF (despite
> CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new interrupt-based
> mechanism of commit 2635b5c4a0 ("KVM: x86: interrupt based APF 'page ready'
> event delivery").
> Kernels after 5.9 won't satisfy the guest request to enable APF through
> KVM_ASYNC_PF_ENABLED alone, requiring KVM_ASYNC_PF_DELIVERY_AS_INT to be set
> as well.
> Furthermore, the patch set also seems to drop parts of the legacy #PF handling.
> I consider this a bug, as it breaks APF compatibility for older guests running
> on newer kernels by breaking the underlying ABI.
> What do you think? Was this a deliberate decision?

It was. #PF-based "page ready" injection was found to be fragile, as in
some cases it can collide with an actual #PF, and nothing good is
expected if that ever happens. I don't think we've actually broken the
ABI, as "asynchronous page fault" was always a best-effort service: the
guest indicates its readiness to process 'page missing' events, but the
host is under no obligation to actually send such notifications.
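
For reference, the collision risk comes from the fact that, with the old
protocol, a real #PF and both APF event types were distinguished only
through a single word in the shared per-vCPU structure (layouts quoted
from memory of uapi/asm/kvm_para.h, so details may be off):

/* Old (pre-5.10) layout: the guest's #PF handler reads 'reason' to tell
 * a real page fault (0) from KVM_PV_REASON_PAGE_NOT_PRESENT (1) or
 * KVM_PV_REASON_PAGE_READY (2). */
struct kvm_vcpu_pv_apf_data {
	__u32 reason;
	__u8 pad[60];
	__u32 enabled;
};

/* New (5.10+) layout: #PF carries only 'page not present' via 'flags';
 * 'page ready' arrives as an interrupt carrying 'token'. */
struct kvm_vcpu_pv_apf_data {
	__u32 flags;
	__u32 token;
	__u8 pad[56];
	__u32 enabled;
};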

> Was this already reported in the past (I couldn't find anything in the
> mailing list, but I might have missed it!)?

I think it was Andy Lutomirski who started the discussion, see
e.g. https://lore.kernel.org/lkml/ed71d0967113a35f670a9625a058b8e6e0b2f104.1583547991.git.luto@kernel.org/

The patch is about KVM_ASYNC_PF_SEND_ALWAYS, but if you go down the
discussion you'll find more concerns expressed.

> Would it be much effort to support the legacy #PF-based mechanism for older
> guests that only set KVM_ASYNC_PF_ENABLED?

Personally, I wouldn't go down this road: #PF injection at a random time
(for page-ready events) is still considered fragile.

>
> The reason this is an issue for us now is that not having APF for older guests
> introduces a significant performance regression on 4.14 guests when paired with
> uffd handling of "remote" page-faults (similar to a live migration scenario)
> when we update from a 4.14 host kernel to a 5.10 host kernel.

What about backporting the interrupt-based APF mechanism to older guests?

-- 
Vitaly



* Bug? Incompatible APF for 4.14 guest on 5.10 and later host
From: Mancini, Riccardo @ 2023-10-05 15:08 UTC
  To: pbonzini, vkuznets
  Cc: kvm, Graf (AWS), Alexander, Teragni, Matias, Batalov, Eugene

Hi,

when a 4.14 guest runs on a 5.10 (or later) host, it cannot use APF (despite
CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new interrupt-based
mechanism of commit 2635b5c4a0 ("KVM: x86: interrupt based APF 'page ready'
event delivery").
Kernels after 5.9 won't satisfy the guest request to enable APF through
KVM_ASYNC_PF_ENABLED alone, requiring KVM_ASYNC_PF_DELIVERY_AS_INT to be set
as well.
Furthermore, the patch set also seems to drop parts of the legacy #PF handling.
I consider this a bug, as it breaks APF compatibility for older guests running
on newer kernels by breaking the underlying ABI.
What do you think? Was this a deliberate decision?
Was this already reported in the past (I couldn't find anything in the mailing
list, but I might have missed it!)?
Would it be much effort to support the legacy #PF-based mechanism for older
guests that only set KVM_ASYNC_PF_ENABLED?
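
For reference, the 5.10 host-side gating I'm referring to looks roughly like
this (paraphrased from memory of arch/x86/kvm/x86.c, so details may be off):

/* APF counts as enabled only if the guest set BOTH bits when writing
 * MSR_KVM_ASYNC_PF_EN; a 4.14 guest sets only KVM_ASYNC_PF_ENABLED, so
 * the host quietly leaves APF disabled. */
static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
{
	u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;

	return (vcpu->arch.apf.msr_en_val & mask) == mask;
}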

The reason this is an issue for us now is that not having APF for older guests
introduces a significant performance regression on 4.14 guests when paired with
uffd handling of "remote" page-faults (similar to a live migration scenario)
when we update from a 4.14 host kernel to a 5.10 host kernel.

Thanks,
Riccardo

