iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
To: Peter Xu <peterx@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Jason Wang <jasowang@redhat.com>
Subject: Re: [PATCH 2/2] iommu/amd: Remove clear_flush_young notifier
Date: Thu, 31 Jan 2019 12:25:40 +0000	[thread overview]
Message-ID: <926c28c2-66a0-8068-f7bc-2d8f9ce9f0ad@arm.com> (raw)
In-Reply-To: <20190131075941.GB25119@xz-x1>

On 31/01/2019 07:59, Peter Xu wrote:
> On Wed, Jan 30, 2019 at 12:27:40PM +0000, Jean-Philippe Brucker wrote:
>> Hi Peter,
> 
> Hi, Jean,
> 
>>
>> On 30/01/2019 05:57, Peter Xu wrote:
>>> AMD IOMMU driver is using the clear_flush_young() to do cache flushing
>>> but that's actually already covered by invalidate_range().  Remove the
>>> extra notifier and the chunks.
>>
>> Aren't uses of clear_flush_young() and invalidate_range() orthogonal? If
>> I understood correctly, when doing reclaim the kernel clears the young
>> bit from the PTE. This requires flushing secondary TLBs (ATCs), so that
>> new accesses from devices will go through the IOMMU, set the young bit
>> again and the kernel can accurately track page use. I didn't see
>> invalidate_range() being called by rmap or vmscan in this case, but
>> might just be missing it.
>>
>> Two MMU notifiers are used for the young bit, clear_flush_young() and
>> clear_young(). I believe the former should invalidate ATCs so that DMA
>> accesses participate in PTE aging. Otherwise the kernel can't know that
>> the device is still using entries marked 'old'. The latter,
>> clear_young() is exempted from invalidating the secondary TLBs by
>> mmu_notifier.h and the IOMMU driver doesn't need to implement it.
> 
> Yes I agree most of your analysis, but IMHO the problem is that the
> AMD IOMMU is not really participating in the PTE aging after all.
> Please have a look at mn_clear_flush_young() below at [1] - it's
> always returning zero, no matter whether the page has been accessed by
> device or not.  A real user of it could be someone like KVM (please
> see kvm_mmu_notifier_clear_flush_young) where we'll try to lookup the
> shadow PTEs and do test-and-clear on that, then return the result to
> the core mm.  That's why I think currently the AMD driver was only
> trying to use that as a way to do an extra flush however IMHO it's
> redundant.

Yes, in IOMMU drivers clear_flush_young() doesn't do the clear, only the
flush, since the level-1 page table is shared with the CPU. But removing
the flush gets you in the following situation:

  (1) Devices wants to access $addr, sends ATS query, the IOMMU sets PTE
young and the entry is cached in the ATC.

  (2) The CPU does ptep_clear_flush_young_notify(), clears young,
notices that the page is being used.

  (3) Device accesses $addr again. Given that we didn't invalidate the
ATC in (2) it accesses the page directly, without going through the IOMMU.

  (4) CPU does ptep_clear_flush_young_notify() again, the PTE doesn't
have the young bit, which means the page isn't being used and can be
reclaimed.

That's not accurate since the page is being used by the device. At step
(2) we should invalidate the ATC, so that (3) fetches the PTE again and
marks it young.

I can agree that the clear_flush_young() notifier is too brutal for this
purpose, since we send spurious ATC invalidation even when the PTE
wasn't young (and ATC inv is expensive). There should be another MMU
notifier "flush_young()" that is called by
ptep_clear_flush_young_notify() only when the page was actually young.
But for the moment it's the best we have to avoid the situation above.

I don't know enough about mm to understand exactly how the
page_referenced() information is used, but I believe the problem is only
about accuracy and not correctness. So applying this patch might not
break anything (after all, intel-svm.c never implemented the notifier)
but I think I'll keep the notifier in my SVA rework [1].

Thanks,
Jean

[1] https://www.spinics.net/lists/iommu/msg30081.html

  reply	other threads:[~2019-01-31 12:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-30  5:57 [PATCH 0/2] Some MMU notifier cleanups for Intel/AMD IOMMU Peter Xu
2019-01-30  5:57 ` [PATCH 1/2] iommu/vt-d: Remove change_pte notifier Peter Xu
2019-01-30 16:32   ` Joerg Roedel
2019-01-30  5:57 ` [PATCH 2/2] iommu/amd: Remove clear_flush_young notifier Peter Xu
2019-01-30 12:27   ` Jean-Philippe Brucker
2019-01-31  7:59     ` Peter Xu
2019-01-31 12:25       ` Jean-Philippe Brucker [this message]
2019-02-01  3:51         ` Peter Xu
2019-02-01 11:46           ` Jean-Philippe Brucker
2019-02-02  7:08             ` Peter Xu
2019-01-30 16:31   ` Joerg Roedel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=926c28c2-66a0-8068-f7bc-2d8f9ce9f0ad@arm.com \
    --to=jean-philippe.brucker@arm.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jasowang@redhat.com \
    --cc=jglisse@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).