From: Oliver Upton <oliver.upton@linux.dev>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory
Date: Wed, 27 Jul 2022 16:06:30 +0000
Message-ID: <YuFihl0AQvb5w/M3@google.com>
In-Reply-To: <YuEVq8Au7YsDLOdI@monolith.localdoman>

On Wed, Jul 27, 2022 at 11:38:53AM +0100, Alexandru Elisei wrote:
> Hi Marc,
> 
> On Wed, Jul 27, 2022 at 10:52:34AM +0100, Marc Zyngier wrote:
> > On Wed, 27 Jul 2022 10:30:59 +0100,
> > Marc Zyngier <maz@kernel.org> wrote:
> > > 
> > > On Tue, 26 Jul 2022 18:51:21 +0100,
> > > Oliver Upton <oliver.upton@linux.dev> wrote:
> > > > 
> > > > Doesn't pinning the buffer also imply pinning the stage 1 tables
> > > > responsible for its translation as well? I agree that pinning the buffer
> > > > is likely the best way forward as pinning the whole of guest memory is
> > > > entirely impractical.
> > 
> > Huh, I just realised that you were talking about S1. I don't think we
> > need to do this. As long as the translation falls into a mapped
> > region (pinned or not), we don't need to worry.

Right, but my issue is what happens when a fragment of the S1 tables
becomes unmapped at S2. We were discussing the idea of faulting once on
the buffer at the beginning of profiling, but it seems to me that such a
fault could just as easily happen at runtime and get tripped up by what
Alex points out below:

> PMBSR_EL1.DL might be set to 1 as a result of stage 2 fault reported by SPE,
> which means the last record written is incomplete. Records have a variable
> size, so it's impossible for KVM to revert to the end of the last known
> good record without parsing the buffer (references here [1]). And even if
> KVM would know the size of a record, there's this bit in the Arm ARM which
> worries me (ARM DDI 0487H.a, page D10-5177):
> 
> "The architecture does not require that a sample record is written
> sequentially by the SPU, only that:
> [..]
> - On a Profiling Buffer management interrupt, PMBSR_EL1.DL indicates
>   whether PMBPTR_EL1 points to the first byte after the last complete
>   sample record."
> 
> So there might be gaps in the buffer, meaning that the entire buffer would
> have to be discarded if DL is set as a result of a stage 2 fault.
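
For illustration only, the check described above could be sketched roughly
as follows. This is not taken from any posted patch: the helper name is
hypothetical, and the bit positions are my reading of the PMBSR_EL1 layout
(DL at bit 19, EC in bits [31:26], EC 0x25 for a stage 2 data abort on a
buffer write), so treat them as assumptions to verify against the Arm ARM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PMBSR_EL1 layout; verify against the Arm ARM before relying on it. */
#define PMBSR_EL1_DL          (UINT64_C(1) << 19)  /* last record is incomplete */
#define PMBSR_EL1_EC_SHIFT    26
#define PMBSR_EL1_EC_MASK     UINT64_C(0x3f)
#define PMBSR_EL1_EC_FAULT_S2 UINT64_C(0x25)       /* stage 2 abort on buffer write */

/*
 * Hypothetical helper, not an existing KVM function: returns true when the
 * syndrome reports a stage 2 fault with DL set, i.e. the case where the
 * whole buffer would have to be treated as untrustworthy.
 */
static bool spe_s2_fault_lost_data(uint64_t pmbsr)
{
	uint64_t ec = (pmbsr >> PMBSR_EL1_EC_SHIFT) & PMBSR_EL1_EC_MASK;

	return ec == PMBSR_EL1_EC_FAULT_S2 && (pmbsr & PMBSR_EL1_DL);
}
```

With DL clear, PMBPTR_EL1 points just past the last complete record and the
buffer contents up to that point remain usable. With DL set, the quoted text
implies KVM cannot find a safe truncation point without parsing the buffer.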

To avoid thrashing across more threads, I'm going to pull back in some
context from your original reply, Marc:

> > > > Live migration also throws a wrench in this. IOW, there are still potential
> > > > sources of blackout unattributable to guest manipulation of the SPU.
> > >
> > > Can you shine some light on this? I appreciate that you can't play the
> > > R/O trick on the SPE buffer as it invalidates the above discussion,
> > > but it should be relatively easy to track these pages and never reset
> > > them as clean until the vcpu is stopped. Unless you foresee other
> > > issues?

Right, we can play tricks on pre-copy to avoid write protecting the SPE
buffer. My concern was more around post-copy, where userspace could've
decided to leave the buffer behind and demand it back on the resulting
S2 fault.

> > > To be clear, I don't worry too much about these blind windows. The
> > > architecture doesn't really give us the right tools to make it work
> > > reliably, making this a best effort only. Unless we pin the whole
> > > guest and forego migration and other fault-driven mechanisms.
> > >
> > > Maybe that is a choice we need to give to the user: cheap, fast,
> > > reliable. Pick two.

As long as we crisply document the errata in KVM's virtualized SPE (and
inform the guest), that sounds reasonable. I'm just uneasy about
proceeding with an implementation with so many gotchas unless all
parties involved are aware of the quirks.

--
Thanks,
Oliver