Re: [PATCH 2/2] arm64: Notify on pte permission upgrades

From: Robin Murphy <robin.murphy@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	will@kernel.org, catalin.marinas@arm.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, nicolinc@nvidia.com,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	John Hubbard <jhubbard@nvidia.com>,
	zhi.wang.linux@gmail.com, Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH 2/2] arm64: Notify on pte permission upgrades
Date: Tue, 30 May 2023 14:44:11 +0100
Message-ID: <89dba89c-cb49-f917-31e4-3eafd484f4b2@arm.com>
In-Reply-To: <ZHXxkUe4IZXUc1PV@nvidia.com>

On 30/05/2023 1:52 pm, Jason Gunthorpe wrote:
> On Tue, May 30, 2023 at 01:14:41PM +0100, Robin Murphy wrote:
>> On 2023-05-30 12:54, Jason Gunthorpe wrote:
>>> On Tue, May 30, 2023 at 06:05:41PM +1000, Alistair Popple wrote:
>>>>
>>>>>> As no notification is sent and the SMMU does not snoop TLB invalidates
>>>>>> it will continue to return read-only entries to a device even though
>>>>>> the CPU page table contains a writable entry. This leads to a
>>>>>> continually faulting device and no way of handling the fault.
>>>>>
>>>>> Doesn't the fault generate a PRI/etc? If we get a PRI maybe we should
>>>>> just have the iommu driver push an iotlb invalidation command before
>>>>> it acks it? PRI is already really slow so I'm not sure a pipelined
>>>>> invalidation is going to be a problem? Does the SMMU architecture
>>>>> permit negative caching which would suggest we need it anyhow?
>>>>
>>>> Yes, SMMU architecture (which matches the ARM architecture in regards to
>>>> TLB maintenance requirements) permits negative caching of some mapping
>>>> attributes including the read-only attribute. Hence without the flushing
>>>> we fault continuously.
>>>
>>> Sounds like a straight up SMMU bug, invalidate the cache after
>>> resolving the PRI event.
>>
>> No, if the IOPF handler calls back into the mm layer to resolve the fault,
>> and the mm layer issues an invalidation in the process of that which isn't
>> propagated back to the SMMU (as it would be if BTM were in use), logically
>> that's the mm layer's failing. The SMMU driver shouldn't have to issue extra
>> mostly-redundant invalidations just because different CPU architectures have
>> different idiosyncrasies around caching of permissions.
> 
> The mm has a definition for invalidate_range that does not include all
> the invalidation points SMMU needs. This is difficult to sort out
> because this is general purpose cross arch stuff.
> 
> You are right that this is worth optimizing, but right now we have a
> -rc bug that needs fixing and adding an extra SMMU invalidation is a
> straightforward -rc friendly way to address it.

Sure; to clarify, I'm not against the overall idea of putting a hack in 
the SMMU driver with a big comment that it is a hack to work around 
missing notifications under SVA, but it would not constitute an "SMMU 
bug" to not do that. SMMU is just another VMSAv8-compatible MMU - if, 
say, KVM or some other arm64 hypervisor driver wanted to do something 
funky with notifiers to shadow stage 1 permissions for some reason, it 
would presumably be equally affected.
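
To make the failure mode concrete, here is a minimal userspace sketch of it.
It is a toy model only, not SMMU driver or mm code: every name in it
(toy_mmu, toy_mm_upgrade, the notify and driver_hack flags) is hypothetical
and invented for illustration. It assumes a single page, a single cached
translation, and a cache that is allowed to keep a stale read-only
permission until it is explicitly invalidated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, all names hypothetical: one page, one PTE, one cached entry. */
struct toy_mmu {
	bool pte_writable;	/* permission in the CPU page table */
	bool tlb_valid;		/* a translation is cached on the device side */
	bool tlb_writable;	/* cached permission, possibly stale */
};

/* Drop the cached translation so the next access re-walks the table. */
void toy_tlb_invalidate(struct toy_mmu *m)
{
	m->tlb_valid = false;
}

/* Device write: true on success, false on a permission fault. */
bool toy_device_write(struct toy_mmu *m)
{
	if (!m->tlb_valid) {
		/* Miss: walk the page table and cache what was found. */
		m->tlb_writable = m->pte_writable;
		m->tlb_valid = true;
	}
	return m->tlb_writable;
}

/* mm-side fault resolution: upgrade RO->RW, notifying only if asked. */
void toy_mm_upgrade(struct toy_mmu *m, bool notify)
{
	m->pte_writable = true;
	if (notify)
		toy_tlb_invalidate(m);
}

/*
 * Retry a device write, resolving each fault through the mm.
 * Returns the number of faults taken before the write succeeded,
 * or -1 if it still faults after max_tries attempts.
 */
int toy_write_with_faults(struct toy_mmu *m, bool notify, bool driver_hack,
			  int max_tries)
{
	assert(max_tries > 0);

	for (int faults = 0; faults < max_tries; faults++) {
		if (toy_device_write(m))
			return faults;
		toy_mm_upgrade(m, notify);
		/* The workaround: invalidate before acking the fault. */
		if (driver_hack)
			toy_tlb_invalidate(m);
	}
	return -1;
}
```

With notify and driver_hack both false, the write never succeeds, because
the stale read-only entry is never dropped. Setting either flag lets it
succeed after a single fault, which is the sense in which the driver-side
invalidation papers over the missing notification rather than fixing it.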

FWIW, the VT-d spec seems to suggest that invalidation on RO->RW is only 
optional if the requester supports recoverable page faults, so although 
there's no use-case for non-PRI-based SVA at the moment, there is some 
potential argument that the notifier issue generalises even to x86.

Thanks,
Robin.