From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Jason Gunthorpe <jgg@nvidia.com>,
	ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Thu, 12 Oct 2023 18:26:01 +0100
Message-ID: <ZSgsKSCv-zWgtWkm@arm.com>
In-Reply-To: <20231012144807.GA12374@willie-the-truck>

On Thu, Oct 12, 2023 at 03:48:08PM +0100, Will Deacon wrote:
> On Thu, Oct 12, 2023 at 02:53:21PM +0100, Catalin Marinas wrote:
> > On Thu, Oct 12, 2023 at 01:35:41PM +0100, Will Deacon wrote:
> > > On Thu, Oct 05, 2023 at 11:56:55AM +0200, Lorenzo Pieralisi wrote:
> > > > For all these reasons, relax the KVM stage 2 device
> > > > memory attributes from DEVICE_nGnRE to NormalNC.
> > >
> > > The reasoning above suggests to me that this should probably just be
> > > Normal cacheable, as that is what actually allows the guest to control
> > > the attributes. So what is the rationale behind stopping at Normal-NC?
> >
> > It's more like we don't have any clue on what may happen. MTE is
> > obviously a case where it can go wrong (we can blame the architecture
> > design here) but I recall years ago where a malicious guest could bring
> > the platform down by mapping the GIC CPU interface as cacheable.
>
> ... and do we know that isn't the case for non-cacheable? If not, why not?

Trying to get this information from the hw folk and architects is really
hard. So we only relax it one step at a time ;). But given the MTE
problems, I'd not go for cacheable Stage 2 unless we have FEAT_MTE_PERM
implemented (both hw and sw). S2 cacheable allows the guest to map it as
Normal Tagged.

> Also, are you saying we used to map the GIC CPU interface as cacheable
> at stage-2? I remember exclusives causing a problem, but I don't remember
> the guest having a cacheable mapping.

The guest never had a cacheable mapping, IIRC it was more of a
theoretical problem, plugging a hole. Now, maybe I misremember, it's
pretty hard to search the git logs given how the code was moved around
(but I do remember the building we were in when discussing this, it was
on the ground floor ;)).

> > Not sure how error containment works with cacheable memory. A cacheable
> > access to a device may stay in the cache a lot longer after the guest
> > has been scheduled out, only evicted at some random time.
>
> But similarly, non-cacheable stores can be buffered. Why isn't that a
> problem?

RAS might track this for cacheable mappings as well, I just haven't
figured out the details.

> > We may no longer be able to associate it with the guest, especially if the
> > guest exited. Also not sure about claiming back the device after killing
> > the guest, do we need cache maintenance?
>
> Claiming back the device also seems strange if the guest has been using
> non-cacheable accesses since I think you could get write merging and
> reordering with subsequent device accesses trying to reset the device.

True. Not sure we have a good story here (maybe reinvent the DWB barrier
;)).

> > So, for now I'd only relax this if we know there's RAM(-like) on the
> > other side and won't trigger some potentially uncontainable errors as a
> > result.
>
> I guess my wider point is that I'm not convinced that non-cacheable is
> actually much better and I think we're going way off the deep end looking
> at what particular implementations do and trying to justify to ourselves
> that non-cacheable is safe, even though it's still a normal memory type
> at the end of the day.

Is this about Device vs NC or Device/NC vs Normal Cacheable? The
justification for the former has been summarised in Lorenzo's write-up.
How the hardware behaves, it depends a lot on the RAS implementation.
The BSA has some statements but not sure it covers everything.

Things can go wrong but that's not because Device does anything better.
Given the RAS implementation, external aborts caused on Device memory
(e.g. wrong size access) is uncontainable. For Normal NC it can be
contained (I can dig out the reasoning behind this if you want, IIUC
something to do with not being able to cancel an already issued Device
access since such accesses don't allow speculation due to side-effects;
for Normal NC, it's just about the software not getting the data).

> Obviously, it's up to Marc and Oliver if they want to do this, but I'm
> wary without an official statement from Arm to say that Normal-NC is
> correct. There's mention of such a statement in the cover letter:
>
> > We hope ARM will publish information helping platform designers
> > follow these guidelines.
>
> but imo we shouldn't merge this without either:
>
>   (a) _Architectural_ guidance (as opposed to some random whitepaper or
>       half-baked certification scheme).

Well, you know the story, the architects will probably make it a SoC or
integration issue, PCIe etc., not something that can live in the Arm
ARM. The best we could get is more recommendations in the RAS spec
around containment but not for things that might happen outside the CPU,
e.g. PCIe root complex.

> - or -
>
>   (b) A concrete justification based on the current architecture as to
>       why Normal-NC is the right thing to do for KVM.

To put it differently, we don't have any strong arguments why Device is
the right thing to do. We chose Device based on some understanding
software people had about how the hardware behaves, which apparently
wasn't entirely correct (and summarised by Lorenzo).

-- 
Catalin
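
Editorial sketch (not part of the original message): the thread turns on which memory types a guest can reach once the stage 2 attribute is relaxed. The toy model below assumes the non-FWB combining behaviour, where the effective type is the more restrictive of the stage 1 and stage 2 types. It ignores shareability, separate inner/outer cacheability, FWB and MTE tagging. The names `MemType`, `combine` and `guest_reachable` are hypothetical and do not come from the kernel or the patch.

```python
from enum import IntEnum


class MemType(IntEnum):
    """Simplified memory types, ordered from most to least restrictive."""
    DEVICE_nGnRnE = 0
    DEVICE_nGnRE = 1
    DEVICE_nGRE = 2
    DEVICE_GRE = 3
    NORMAL_NC = 4
    NORMAL_WT = 5
    NORMAL_WB = 6


def combine(stage1: MemType, stage2: MemType) -> MemType:
    """Effective type of an access: the more restrictive of the two stages.

    Assumption: FWB is not in use, so stage 2 cannot force a weaker type
    than the one the guest asked for at stage 1.
    """
    return min(stage1, stage2)


def guest_reachable(stage2: MemType) -> set:
    """All effective types a guest can obtain by varying its stage 1 type."""
    return {combine(stage1, stage2) for stage1 in MemType}


if __name__ == "__main__":
    for s2 in (MemType.DEVICE_nGnRE, MemType.NORMAL_NC, MemType.NORMAL_WB):
        names = sorted(t.name for t in guest_reachable(s2))
        print(f"stage 2 {s2.name}: guest can reach {names}")
```

Under these assumptions, a stage 2 Normal-NC mapping leaves the guest free to pick any Device type or Normal-NC, which matches the patch subject, while a cacheable stage 2 mapping would also expose the cacheable types that the MTE concern above is about.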