From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>,
	ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Fri, 13 Oct 2023 14:08:10 +0100
Message-ID: <ZSlBOiebenPKXBY4@arm.com>
In-Reply-To: <20231013092934.GA13524@willie-the-truck>

On Fri, Oct 13, 2023 at 10:29:35AM +0100, Will Deacon wrote:
> On Thu, Oct 12, 2023 at 06:26:01PM +0100, Catalin Marinas wrote:
> > On Thu, Oct 12, 2023 at 03:48:08PM +0100, Will Deacon wrote:
> > > Claiming back the device also seems strange if the guest has been using
> > > non-cacheable accesses since I think you could get write merging and
> > > reordering with subsequent device accesses trying to reset the device.
> >
> > True. Not sure we have a good story here (maybe reinvent the DWB
> > barrier ;)).
>
> We do have a good story for this part: use Device-nGnRE!

Don't we actually need Device-nGnRnE for this, coupled with a DSB for
endpoint completion? Device-nGnRE may be sufficient, as a read from that
device would ensure that the previous write is observable (potentially
with a DMB if accessing separate device regions), but I don't think we
do this now either. Even then, isn't it device-specific? I don't know
enough about PCIe, posted writes and reordering; maybe others can shed
some light. For Normal NC, if the access doesn't have side-effects (or
rather the endpoint is memory-like), I think we are fine.
The Stage 2 unmapping + TLBI + DSB (DVM + DVMSync) should ensure that a
pending write by the CPU was pushed sufficiently far as not to affect
subsequent writes by other CPUs. For I/O accesses that change some state
of the device, I'm not sure the TLBI+DSB is sufficient. But I don't
think Device-nGnRE is either, only nE + DSB, as long as the PCIe device
plays along nicely.

> Could we change these patches so that the memory type of the stage-1 VMA
> in the VMM is reflected in the stage-2? In other words, continue to use
> Device mappings at stage-2 for I/O but relax to Normal-NC if that's
> how the VMM has it mapped?

We've been through this and it's not feasible. The VMM does not have
detailed knowledge of the BARs of the PCIe device it is mapping (and the
prefetchable BAR attribute is useless). It may end up with a Normal
mapping of a BAR with read side-effects. It's only the guest driver that
knows all the details. The safest is for the VMM to keep it as Device (I
think vfio-pci goes for the strongest nGnRnE). Yes, we end up with
mismatched aliases, but they only matter if the VMM also accesses the
I/O range via its own mapping. So far I haven't seen a case that
suggests this.

> > Things can go wrong but that's not because Device does anything better.
> > Given the RAS implementation, external aborts caused on Device memory
> > (e.g. wrong size access) is uncontainable. For Normal NC it can be
> > contained (I can dig out the reasoning behind this if you want, IIUC
> > something to do with not being able to cancel an already issued Device
> > access since such accesses don't allow speculation due to side-effects;
> > for Normal NC, it's just about the software not getting the data).
>
> I really think these details belong in the commit message.

I guess another task for Lorenzo ;).

> > > Obviously, it's up to Marc and Oliver if they want to do this, but I'm
> > > wary without an official statement from Arm to say that Normal-NC is
> > > correct.
> > > There's mention of such a statement in the cover letter:
> > >
> > > > We hope ARM will publish information helping platform designers
> > > > follow these guidelines.
> > >
> > > but imo we shouldn't merge this without either:
> > >
> > > (a) _Architectural_ guidance (as opposed to some random whitepaper or
> > >     half-baked certification scheme).
> >
> > Well, you know the story, the architects will probably make it a SoC or
> > integration issue, PCIe etc., not something that can live in the Arm
> > ARM. The best we could get is more recommendations in the RAS spec
> > around containment but not for things that might happen outside the CPU,
> > e.g. PCIe root complex.
>
> The Arm ARM _does_ mention PCI config space when talking about early write
> acknowledgement, so there's some precedence for providing guidance around
> which memory types to use.

Ah, yes, it looks like it does, though mostly around the config space.
We could ask them to add some notes but I don't think we have the
problem well defined yet.

Trying to restate what we aim for: the guest driver knows what
attributes it needs and would set the appropriate attributes, Device or
Normal. KVM's role is not to fix bugs in the guest driver by
constraining the attributes but rather to avoid potential security
issues with malicious (or buggy) guests:

1) triggering uncontained errors

2) accessing memory that it shouldn't (like the MTE tag access)

3) causing delayed side-effects after the host reclaims the device

... anything else?

For (1), Normal NC vs. Device doesn't make any difference (slightly
better for the former, if anything). (2) so far is solved by not
allowing Cacheable (or disabling MTE, enabling FEAT_MTE_PERM in the
future). I'm now trying to understand (3), I think it needs more
digging.

> > > (b) A concrete justification based on the current architecture as to
> > >     why Normal-NC is the right thing to do for KVM.
> >
> > To put it differently, we don't have any strong arguments why Device is
> > the right thing to do. We chose Device based on some understanding
> > software people had about how the hardware behaves, which apparently
> > wasn't entirely correct (and summarised by Lorenzo).
>
> I think we use Device because that's what the host uses in its stage-1
> and mismatched aliases are bad.

They are "constrained" bad ;).

-- 
Catalin