All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	will@kernel.org, aniketa@nvidia.com, cjia@nvidia.com,
	kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com,
	acurrid@nvidia.com, apopple@nvidia.com, jhubbard@nvidia.com,
	danw@nvidia.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] KVM: arm64: determine memory type from VMA
Date: Tue, 10 Oct 2023 15:25:22 +0100	[thread overview]
Message-ID: <ZSVe0nb02S4kz5D2@arm.com> (raw)
In-Reply-To: <20231005165458.GM682044@nvidia.com>

On Thu, Oct 05, 2023 at 01:54:58PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 05, 2023 at 05:15:37PM +0100, Catalin Marinas wrote:
> > On Thu, Sep 07, 2023 at 11:14:58AM -0700, ankita@nvidia.com wrote:
> > > From: Ankit Agrawal <ankita@nvidia.com>
> > > Currently KVM determines if a VMA is pointing at IO memory by checking
> > > pfn_is_map_memory(). However, the MM already gives us a way to tell what
> > > kind of memory it is by inspecting the VMA.
> > 
> > Well, it doesn't. It tells us what attributes the user mapped that
> > memory with, not whether it's I/O memory or standard RAM.
> 
> There is VM_IO which is intended to be used for address space with
> side effects.
> 
> And there is VM_PFNMAP which is intended to be used for address space
> without struct page (IO or not)
> 
> And finally we have the pgprot bit which define the cachability.
> 
> Do you have a definition of IO memory that those three things don't
> cover?
> 
> I would propose that, for KVM's purpose, IO memory is marked with
> VM_IO or a non-cachable pgprot
> 
> And "standard RAM" is defined by a cachable pgprot. Linux never makes
> something that is VM_IO cachable.

I think we can safely set a stage 2 Normal NC for a vma with pgprot
other than MT_NORMAL or MT_NORMAL_TAGGED. But the other way around is
not that simple. Just because the VMM was allowed to map it as cacheable
does not mean that it supports all the CPU features. One example is MTE
where we can only guarantee that the RAM given to the OS at boot
supports tagged accesses. I've seen something similar in the past with
LSE atomics (or was it exclusives?) not being propagated. These don't
make the memory safe for a guest to use as general purpose RAM.

I don't have a nice solution, it looks more like the host kernel being
able to trust what the VMM maps and gives to a guest (or we map
everything as Device at Stage 2 as we currently do). An alternative
would be for the host to know which physical address ranges support
which attributes and ignore the vma but not sure we have such
information in the ACPI tables (we can make something up for DT).

> > > Unfortunately when FWB is not enabled, the kernel expects to naively do
> > > cache management by flushing the memory using an address in the
> > > kernel's map. This does not work in several of the newly allowed
> > > cases such as dcache_clean_inval_poc(). Check whether the targeted pfn
> > > and its mapping KVA is valid in case the FWB is absent before continuing.
> > 
> > I would only allow cacheable stage 2 mappings if FWB is enabled.
> > Otherwise we end up with a mismatch between the VMM mapping and whatever
> > the guest may do.
> 
> Does it need to be stronger? If FWB is disabled and the cache flush
> works then what is the issue?

I was thinking more about keeping things simpler and avoid any lack of
coherence between the VMM and the VM, in case the latter maps it as
Normal NC. But if the VMM doesn't touch it, the initial cache
maintenance by the KVM would do.

> I think there are two issues here. 
> 
> 1) KVM uses pfn_is_map_memory() which does not cover all our modern
> NUMA and memory hotplug cases for normal struct page backed cachable
> memory.
> 
> 2) KVM doesn't work with normal cachable memory that does not have
> struct pages.
> 
> For 1 the test should be 'does the pfn have a struct page, does the
> struct page refer to cachable memory?'
> 
> For 2 the test should be 'does the VMA have pgprot = cachable,
> VM_PFNMAP and not VM_IO (both implied)'

See above on the characteristics of the memory. If some of them are not
supported, it's probably fine (atomics not working) but others like MTE
accesses could either cause external aborts or access random addresses
from elsewhere. Currently we rely on pfn_is_map_memory() for this but we
need a way to tell that other ranges outside the initial RAM supports
all features. IOW, is any of this memory (mapped as cacheable in the
VMM) special purpose with only a subset of the CPU features supported?

-- 
Catalin

WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: ankita@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
	will@kernel.org, aniketa@nvidia.com, cjia@nvidia.com,
	kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com,
	acurrid@nvidia.com, apopple@nvidia.com, jhubbard@nvidia.com,
	danw@nvidia.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] KVM: arm64: determine memory type from VMA
Date: Tue, 10 Oct 2023 15:25:22 +0100	[thread overview]
Message-ID: <ZSVe0nb02S4kz5D2@arm.com> (raw)
In-Reply-To: <20231005165458.GM682044@nvidia.com>

On Thu, Oct 05, 2023 at 01:54:58PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 05, 2023 at 05:15:37PM +0100, Catalin Marinas wrote:
> > On Thu, Sep 07, 2023 at 11:14:58AM -0700, ankita@nvidia.com wrote:
> > > From: Ankit Agrawal <ankita@nvidia.com>
> > > Currently KVM determines if a VMA is pointing at IO memory by checking
> > > pfn_is_map_memory(). However, the MM already gives us a way to tell what
> > > kind of memory it is by inspecting the VMA.
> > 
> > Well, it doesn't. It tells us what attributes the user mapped that
> > memory with, not whether it's I/O memory or standard RAM.
> 
> There is VM_IO which is intended to be used for address space with
> side effects.
> 
> And there is VM_PFNMAP which is intended to be used for address space
> without struct page (IO or not)
> 
> And finally we have the pgprot bit which define the cachability.
> 
> Do you have a definition of IO memory that those three things don't
> cover?
> 
> I would propose that, for KVM's purpose, IO memory is marked with
> VM_IO or a non-cachable pgprot
> 
> And "standard RAM" is defined by a cachable pgprot. Linux never makes
> something that is VM_IO cachable.

I think we can safely set a stage 2 Normal NC for a vma with pgprot
other than MT_NORMAL or MT_NORMAL_TAGGED. But the other way around is
not that simple. Just because the VMM was allowed to map it as cacheable
does not mean that it supports all the CPU features. One example is MTE
where we can only guarantee that the RAM given to the OS at boot
supports tagged accesses. I've seen something similar in the past with
LSE atomics (or was it exclusives?) not being propagated. These don't
make the memory safe for a guest to use as general purpose RAM.

I don't have a nice solution, it looks more like the host kernel being
able to trust what the VMM maps and gives to a guest (or we map
everything as Device at Stage 2 as we currently do). An alternative
would be for the host to know which physical address ranges support
which attributes and ignore the vma but not sure we have such
information in the ACPI tables (we can make something up for DT).

> > > Unfortunately when FWB is not enabled, the kernel expects to naively do
> > > cache management by flushing the memory using an address in the
> > > kernel's map. This does not work in several of the newly allowed
> > > cases such as dcache_clean_inval_poc(). Check whether the targeted pfn
> > > and its mapping KVA is valid in case the FWB is absent before continuing.
> > 
> > I would only allow cacheable stage 2 mappings if FWB is enabled.
> > Otherwise we end up with a mismatch between the VMM mapping and whatever
> > the guest may do.
> 
> Does it need to be stronger? If FWB is disabled and the cache flush
> works then what is the issue?

I was thinking more about keeping things simpler and avoid any lack of
coherence between the VMM and the VM, in case the latter maps it as
Normal NC. But if the VMM doesn't touch it, the initial cache
maintenance by the KVM would do.

> I think there are two issues here. 
> 
> 1) KVM uses pfn_is_map_memory() which does not cover all our modern
> NUMA and memory hotplug cases for normal struct page backed cachable
> memory.
> 
> 2) KVM doesn't work with normal cachable memory that does not have
> struct pages.
> 
> For 1 the test should be 'does the pfn have a struct page, does the
> struct page refer to cachable memory?'
> 
> For 2 the test should be 'does the VMA have pgprot = cachable,
> VM_PFNMAP and not VM_IO (both implied)'

See above on the characteristics of the memory. If some of them are not
supported, it's probably fine (atomics not working) but others like MTE
accesses could either cause external aborts or access random addresses
from elsewhere. Currently we rely on pfn_is_map_memory() for this but we
need a way to tell that other ranges outside the initial RAM supports
all features. IOW, is any of this memory (mapped as cacheable in the
VMM) special purpose with only a subset of the CPU features supported?

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2023-10-10 14:25 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-07 18:14 [PATCH v1 0/2] KVM: arm64: support write combining and cachable IO memory in VMs ankita
2023-09-07 18:14 ` ankita
2023-09-07 18:14 ` [PATCH v1 1/2] KVM: arm64: determine memory type from VMA ankita
2023-09-07 18:14   ` ankita
2023-09-07 19:12   ` Jason Gunthorpe
2023-09-07 19:12     ` Jason Gunthorpe
2023-10-05 16:15   ` Catalin Marinas
2023-10-05 16:15     ` Catalin Marinas
2023-10-05 16:54     ` Jason Gunthorpe
2023-10-05 16:54       ` Jason Gunthorpe
2023-10-10 14:25       ` Catalin Marinas [this message]
2023-10-10 14:25         ` Catalin Marinas
2023-10-10 15:05         ` Jason Gunthorpe
2023-10-10 15:05           ` Jason Gunthorpe
2023-10-10 17:19           ` Catalin Marinas
2023-10-10 17:19             ` Catalin Marinas
2023-10-10 18:23             ` Jason Gunthorpe
2023-10-10 18:23               ` Jason Gunthorpe
2023-10-11 17:45               ` Catalin Marinas
2023-10-11 17:45                 ` Catalin Marinas
2023-10-11 18:38                 ` Jason Gunthorpe
2023-10-11 18:38                   ` Jason Gunthorpe
2023-10-12 16:16                   ` Catalin Marinas
2023-10-12 16:16                     ` Catalin Marinas
2024-03-10  3:49                     ` Ankit Agrawal
2024-03-10  3:49                       ` Ankit Agrawal
2024-03-19 13:38                       ` Jason Gunthorpe
2024-03-19 13:38                         ` Jason Gunthorpe
2023-10-23 13:20   ` Shameerali Kolothum Thodi
2023-10-23 13:20     ` Shameerali Kolothum Thodi
2023-09-07 18:14 ` [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory ankita
2023-09-07 18:14   ` ankita
2023-09-08 16:40   ` Catalin Marinas
2023-09-08 16:40     ` Catalin Marinas
2023-09-11 14:57   ` Lorenzo Pieralisi
2023-09-11 14:57     ` Lorenzo Pieralisi
2023-09-11 17:20     ` Jason Gunthorpe
2023-09-11 17:20       ` Jason Gunthorpe
2023-09-13 15:26       ` Lorenzo Pieralisi
2023-09-13 15:26         ` Lorenzo Pieralisi
2023-09-13 18:54         ` Jason Gunthorpe
2023-09-13 18:54           ` Jason Gunthorpe
2023-09-26  8:31           ` Lorenzo Pieralisi
2023-09-26  8:31             ` Lorenzo Pieralisi
2023-09-26 12:25             ` Jason Gunthorpe
2023-09-26 12:25               ` Jason Gunthorpe
2023-09-26 13:52             ` Catalin Marinas
2023-09-26 13:52               ` Catalin Marinas
2023-09-26 16:12               ` Lorenzo Pieralisi
2023-09-26 16:12                 ` Lorenzo Pieralisi
2023-10-05  9:56               ` Lorenzo Pieralisi
2023-10-05  9:56                 ` Lorenzo Pieralisi
2023-10-05 11:56                 ` Jason Gunthorpe
2023-10-05 11:56                   ` Jason Gunthorpe
2023-10-05 14:08                   ` Lorenzo Pieralisi
2023-10-05 14:08                     ` Lorenzo Pieralisi
2023-10-12 12:35                 ` Will Deacon
2023-10-12 12:35                   ` Will Deacon
2023-10-12 13:20                   ` Jason Gunthorpe
2023-10-12 13:20                     ` Jason Gunthorpe
2023-10-12 14:29                     ` Lorenzo Pieralisi
2023-10-12 14:29                       ` Lorenzo Pieralisi
2023-10-12 13:53                   ` Catalin Marinas
2023-10-12 13:53                     ` Catalin Marinas
2023-10-12 14:48                     ` Will Deacon
2023-10-12 14:48                       ` Will Deacon
2023-10-12 15:44                       ` Jason Gunthorpe
2023-10-12 15:44                         ` Jason Gunthorpe
2023-10-12 16:39                         ` Will Deacon
2023-10-12 16:39                           ` Will Deacon
2023-10-12 18:36                           ` Jason Gunthorpe
2023-10-12 18:36                             ` Jason Gunthorpe
2023-10-13  9:29                             ` Will Deacon
2023-10-13  9:29                               ` Will Deacon
2023-10-12 17:26                       ` Catalin Marinas
2023-10-12 17:26                         ` Catalin Marinas
2023-10-13  9:29                         ` Will Deacon
2023-10-13  9:29                           ` Will Deacon
2023-10-13 13:08                           ` Catalin Marinas
2023-10-13 13:08                             ` Catalin Marinas
2023-10-13 13:45                             ` Jason Gunthorpe
2023-10-13 13:45                               ` Jason Gunthorpe
2023-10-19 11:07                               ` Catalin Marinas
2023-10-19 11:07                                 ` Catalin Marinas
2023-10-19 11:51                                 ` Jason Gunthorpe
2023-10-19 11:51                                   ` Jason Gunthorpe
2023-10-20 11:21                                   ` Catalin Marinas
2023-10-20 11:21                                     ` Catalin Marinas
2023-10-20 11:47                                     ` Jason Gunthorpe
2023-10-20 11:47                                       ` Jason Gunthorpe
2023-10-20 14:03                                       ` Lorenzo Pieralisi
2023-10-20 14:03                                         ` Lorenzo Pieralisi
2023-10-20 14:28                                         ` Jason Gunthorpe
2023-10-20 14:28                                           ` Jason Gunthorpe
2023-10-19 13:35                                 ` Lorenzo Pieralisi
2023-10-19 13:35                                   ` Lorenzo Pieralisi
2023-10-13 15:28                             ` Lorenzo Pieralisi
2023-10-13 15:28                               ` Lorenzo Pieralisi
2023-10-19 11:12                               ` Catalin Marinas
2023-10-19 11:12                                 ` Catalin Marinas
2023-11-09 15:34                             ` Lorenzo Pieralisi
2023-11-09 15:34                               ` Lorenzo Pieralisi
2023-11-10 14:26                               ` Jason Gunthorpe
2023-11-10 14:26                                 ` Jason Gunthorpe
2023-11-13  0:42                                 ` Lorenzo Pieralisi
2023-11-13  0:42                                   ` Lorenzo Pieralisi
2023-11-13 17:41                               ` Catalin Marinas
2023-11-13 17:41                                 ` Catalin Marinas
2023-10-12 12:27   ` Will Deacon
2023-10-12 12:27     ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZSVe0nb02S4kz5D2@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=acurrid@nvidia.com \
    --cc=aniketa@nvidia.com \
    --cc=ankita@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=cjia@nvidia.com \
    --cc=danw@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=kwankhede@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=targupta@nvidia.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.