From: Alistair Popple <apopple@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	will@kernel.org, catalin.marinas@arm.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, robin.murphy@arm.com,
	nicolinc@nvidia.com, linux-arm-kernel@lists.infradead.org,
	kvm@vger.kernel.org, John Hubbard <jhubbard@nvidia.com>,
	zhi.wang.linux@gmail.com, Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH 2/2] arm64: Notify on pte permission upgrades
Date: Tue, 30 May 2023 18:05:41 +1000
Message-ID: <87pm6ii6qi.fsf@nvidia.com>
In-Reply-To: <ZHKaBQt8623s9+VK@nvidia.com>


Jason Gunthorpe <jgg@nvidia.com> writes:

> On Wed, May 24, 2023 at 11:47:29AM +1000, Alistair Popple wrote:
>> ARM64 requires TLB invalidates when upgrading pte permission from
>> read-only to read-write. However mmu_notifiers assume upgrades do not
>> need notifications and none are sent. This causes problems when a
>> secondary TLB such as implemented by an ARM SMMU doesn't support
>> broadcast TLB maintenance (BTM) and caches a read-only PTE.
>
> I don't really like this design, but I see how you get here..

I'm not going to argue with that. I don't love it either, but it seemed
like the most straightforward approach.

> mmu notifiers behavior should not be tied to the architecture, they
> are supposed to be generic reflections of what the MM is doing so that
> they can be hooked into by general purpose drivers.

Interesting. I've always assumed mmu notifiers were primarily about
keeping cache invalidations in sync with what the MM is doing. This is
based on the fact that you can't use mmu notifiers to establish mappings;
instead we have this rather complex dance with hmm_range_fault() to
establish a mapping.
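For readers less familiar with that dance: a driver samples a notifier sequence count (mmu_interval_read_begin() in the kernel), reads the CPU page table via hmm_range_fault(), then, under its own lock, checks mmu_interval_read_retry() and starts over if an invalidation raced with it. The following is only a simplified, single-threaded user-space model of that retry pattern, not kernel code; every model_* name is hypothetical and the "racing invalidations" are simulated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an interval notifier's sequence count. The kernel
 * equivalents are mmu_interval_read_begin()/mmu_interval_read_retry();
 * this is NOT their implementation. */
struct model_notifier {
	unsigned long invalidate_seq; /* bumped by every invalidation */
};

/* Sample the sequence before reading the CPU page table. */
static unsigned long model_read_begin(const struct model_notifier *mn)
{
	return mn->invalidate_seq;
}

/* True if an invalidation ran since model_read_begin() returned seq. */
static bool model_read_retry(const struct model_notifier *mn, unsigned long seq)
{
	return mn->invalidate_seq != seq;
}

/* Stands in for the driver's mmu notifier invalidate callback. */
static void model_invalidate(struct model_notifier *mn)
{
	mn->invalidate_seq++;
}

/* Hypothetical driver path: snapshot the CPU PTE and install it in the
 * device only if no invalidation raced with the snapshot.
 * 'racing_invalidations' simulates invalidations arriving mid-walk.
 * Returns the number of attempts needed. */
static int model_map_range(struct model_notifier *mn, int *device_pte,
			   const int *cpu_pte, int racing_invalidations)
{
	int attempts = 0;

	for (;;) {
		unsigned long seq = model_read_begin(mn);
		int snapshot = *cpu_pte; /* stands in for hmm_range_fault() */

		attempts++;
		if (racing_invalidations > 0) {
			racing_invalidations--;
			model_invalidate(mn);
		}
		/* In a real driver this check happens under the driver lock. */
		if (model_read_retry(mn, seq))
			continue;
		*device_pte = snapshot;
		return attempts;
	}
}
```

With two simulated racing invalidations, model_map_range() needs three attempts before it installs the snapshot; the point is that notifiers only ever tear mappings down, while establishing them goes through this separate retry loop.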

My initial version [1] effectively did add a generic event. Admittedly it
was somewhat incomplete, because I didn't hook up the new mmu notifier
event type to every user that could possibly ignore it (e.g. KVM). But
that was part of the problem: it was hard to figure out which mmu
notifier users can safely ignore it and which can't, and that may
depend on what architecture they are running on. That is why I hooked it
up to ptep_set_access_flags(), because that gives you arch-specific
filtering as required.
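As a rough sketch of the filtering argument: the secondary TLB only needs to hear about a read-only to read-write upgrade where the architecture itself requires TLB maintenance for that change. The model below is a simplified user-space illustration, not the arm64 ptep_set_access_flags() implementation; the PTE bits, the model_arch_needs_upgrade_flush flag and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PTE bits, for illustration only. */
#define MODEL_PTE_VALID (1u << 0)
#define MODEL_PTE_WRITE (1u << 1)

/* Hypothetical secondary TLB (e.g. an SMMU without BTM) caching one PTE. */
struct model_secondary_tlb {
	bool cached;
	unsigned int pte;
	int invalidations;
};

static void model_secondary_invalidate(struct model_secondary_tlb *tlb)
{
	tlb->cached = false;
	tlb->invalidations++;
}

/* Hypothetical per-arch property: does upgrading a PTE from read-only to
 * read-write require a TLB invalidate on this architecture? */
static const bool model_arch_needs_upgrade_flush = true;

/* Update the PTE and, only when the arch needs TLB maintenance for a
 * permission upgrade, notify the secondary TLB as well.
 * Returns true if the PTE changed. */
static bool model_set_access_flags(unsigned int *ptep, unsigned int entry,
				   struct model_secondary_tlb *tlb)
{
	unsigned int old = *ptep;
	bool upgrade;

	if (old == entry)
		return false;
	*ptep = entry;
	upgrade = !(old & MODEL_PTE_WRITE) && (entry & MODEL_PTE_WRITE);
	if (upgrade && model_arch_needs_upgrade_flush && tlb != NULL)
		model_secondary_invalidate(tlb);
	return true;
}
```

On an architecture where model_arch_needs_upgrade_flush were false, the same call would update the PTE without ever reaching the secondary TLB, which is the filtering a generic event cannot provide by itself.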

Perhaps the better option is to introduce a new mmu notifier op and let
drivers opt in?
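One way an opt-in op could look, sketched in user space: the callback is optional, the core only invokes it where a driver has set it, and existing users that leave it NULL are unaffected. This is a hypothetical shape for discussion, not an existing mmu_notifier_ops member; all model_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table: 'upgrade_range' is the new, optional callback. */
struct model_notifier_ops {
	void (*invalidate_range_start)(void *priv, unsigned long start,
				       unsigned long end);
	void (*upgrade_range)(void *priv, unsigned long start,
			      unsigned long end); /* opt-in, may be NULL */
};

struct model_subscription {
	const struct model_notifier_ops *ops;
	void *priv;
};

/* Core side: call the new op only for subscribers that set it.
 * Returns how many subscribers were notified. */
static int model_notify_upgrade(struct model_subscription *subs, size_t n,
				unsigned long start, unsigned long end)
{
	int notified = 0;

	for (size_t i = 0; i < n; i++) {
		if (subs[i].ops->upgrade_range != NULL) {
			subs[i].ops->upgrade_range(subs[i].priv, start, end);
			notified++;
		}
	}
	return notified;
}

/* Example opt-in callback (e.g. an SVA user): counts upgrade events. */
static void model_sva_upgrade(void *priv, unsigned long start,
			      unsigned long end)
{
	(void)start;
	(void)end;
	(*(int *)priv)++;
}
```

The design choice here is that the burden of deciding whether upgrades matter moves to each driver, instead of the core having to know which notifier users can safely ignore the event.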

> If you want to hardwire invalidate_range to be only for SVA cases that
> actually share the page table itself and rely on some arch-defined
> invalidation, then we should give the op a much better name and
> discourage anyone else from abusing the new ops variable behavior.

Well, that's the only use case I currently care about, because we have
hit this issue, so for now at least I'd much rather have a
straightforward fix that we can backport.

The problem is that an invalidation isn't well defined. If we are to make
this architecture-independent, then we need to send an invalidation for
any PTE state change (i.e. clean/dirty/writeable/read-only/present/
not-present/etc.), which we don't currently do.
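To make the distinction concrete, the sketch below compares a PTE before and after a change. A policy that notifies only when something is removed stays silent on a read-only to writeable upgrade, whereas a truly architecture-independent policy would fire on any difference at all. The bit layout and model_* helpers are hypothetical and only illustrate the two policies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PTE bits, for illustration only. */
#define M_PRESENT (1u << 0)
#define M_WRITE   (1u << 1)
#define M_DIRTY   (1u << 2)

/* Bits set in the old PTE but clear in the new one
 * (e.g. write permission removed, or the entry made not-present). */
static unsigned int model_downgraded_bits(unsigned int old, unsigned int new_pte)
{
	return old & ~new_pte;
}

/* Bits newly set, e.g. read-only -> writeable. */
static unsigned int model_upgraded_bits(unsigned int old, unsigned int new_pte)
{
	return new_pte & ~old;
}

/* Policy where upgrades are assumed not to need a notification. */
static bool model_notify_downgrades_only(unsigned int old, unsigned int new_pte)
{
	return model_downgraded_bits(old, new_pte) != 0;
}

/* Architecture-independent policy: notify on any PTE state change. */
static bool model_notify_any_change(unsigned int old, unsigned int new_pte)
{
	return (old ^ new_pte) != 0;
}
```

Under the second policy even setting the dirty bit would generate a notification, which illustrates how much more traffic a fully generic definition would imply.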

>> As no notification is sent and the SMMU does not snoop TLB invalidates
>> it will continue to return read-only entries to a device even though
>> the CPU page table contains a writable entry. This leads to a
>> continually faulting device and no way of handling the fault.
>
> Doesn't the fault generate a PRI/etc? If we get a PRI maybe we should
> just have the iommu driver push an iotlb invalidation command before
> it acks it? PRI is already really slow so I'm not sure a pipelined
> invalidation is going to be a problem? Does the SMMU architecture
> permit negative caching which would suggest we need it anyhow?

Yes, the SMMU architecture (which matches the ARM architecture with
regard to TLB maintenance requirements) permits negative caching of some
mapping attributes, including the read-only attribute. Hence, without the
flushing, we fault continuously.
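The failure mode can be modelled in a few lines: a device TLB that does not snoop CPU TLB invalidates keeps serving a cached read-only entry after the CPU PTE has become writable, so every retried write faults again until something invalidates that entry. This is a hypothetical user-space model (model_* names are made up), not SMMU driver code; the invalidation could come from a notifier or from the fault path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device TLB entry that does not snoop CPU TLB invalidates. */
struct model_iotlb {
	bool valid;
	bool writable; /* cached, possibly stale, permission */
};

/* The CPU page table entry is the source of truth. */
struct model_cpu_pte {
	bool writable;
};

/* Device write: use the cached entry if present, otherwise walk the CPU
 * page table and cache the result. Returns true if the write faults. */
static bool model_device_write(struct model_iotlb *tlb,
			       const struct model_cpu_pte *pte)
{
	if (!tlb->valid) {
		tlb->valid = true;
		tlb->writable = pte->writable;
	}
	return !tlb->writable;
}

/* Count faults over 'tries' write attempts when the CPU PTE has already
 * been upgraded but the device TLB still caches it as read-only. If
 * 'invalidate_before_retry' is set, the stale entry is dropped after each
 * fault; otherwise it is reused on every retry. */
static int model_faults(bool invalidate_before_retry, int tries)
{
	struct model_iotlb tlb = { .valid = true, .writable = false };
	struct model_cpu_pte pte = { .writable = true };
	int faults = 0;

	for (int i = 0; i < tries; i++) {
		if (!model_device_write(&tlb, &pte))
			break;
		faults++;
		if (invalidate_before_retry)
			tlb.valid = false;
	}
	return faults;
}
```

Without any invalidation every attempt faults (model_faults(false, 10) returns 10); with one invalidation before the retry the device refills a writable entry and faults only once.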

> Jason

[1] - https://lore.kernel.org/linux-mm/ZGxg+I8FWz3YqBMk@infradead.org/T/
