linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"jrdr.linux@gmail.com" <jrdr.linux@gmail.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"minchan@kernel.org" <minchan@kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"will.deacon@arm.com" <will.deacon@arm.com>,
	"mhocko@suse.com" <mhocko@suse.com>,
	Linux-graphics-maintainer <Linux-graphics-maintainer@vmware.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"riel@surriel.com" <riel@surriel.com>
Subject: Re: [PATCH 2/9] mm: Add an apply_to_pfn_range interface
Date: Wed, 17 Apr 2019 10:28:35 -0400	[thread overview]
Message-ID: <20190417142835.GB3229@redhat.com> (raw)
In-Reply-To: <2dd9b36444dc92f409b44c74667b6d63dc1713a8.camel@vmware.com>

On Wed, Apr 17, 2019 at 09:15:52AM +0000, Thomas Hellstrom wrote:
> On Tue, 2019-04-16 at 10:46 -0400, Jerome Glisse wrote:
> > On Sat, Apr 13, 2019 at 08:34:02AM +0000, Thomas Hellstrom wrote:
> > > Hi, Jérôme
> > > 
> > > On Fri, 2019-04-12 at 17:07 -0400, Jerome Glisse wrote:
> > > > On Fri, Apr 12, 2019 at 04:04:18PM +0000, Thomas Hellstrom wrote:

[...]

> > > > > -/*
> > > > > - * Scan a region of virtual memory, filling in page tables as
> > > > > necessary
> > > > > - * and calling a provided function on each leaf page table.
> > > > > +/**
> > > > > + * apply_to_pfn_range - Scan a region of virtual memory,
> > > > > calling a
> > > > > provided
> > > > > + * function on each leaf page table entry
> > > > > + * @closure: Details about how to scan and what function to
> > > > > apply
> > > > > + * @addr: Start virtual address
> > > > > + * @size: Size of the region
> > > > > + *
> > > > > + * If @closure->alloc is set to 1, the function will fill in
> > > > > the
> > > > > page table
> > > > > + * as necessary. Otherwise it will skip non-present parts.
> > > > > + * Note: The caller must ensure that the range does not
> > > > > contain
> > > > > huge pages.
> > > > > + * The caller must also assure that the proper mmu_notifier
> > > > > functions are
> > > > > + * called. Either in the pte leaf function or before and after
> > > > > the
> > > > > call to
> > > > > + * apply_to_pfn_range.
> > > > 
> > > > This is wrong there should be a big FAT warning that this can
> > > > only be
> > > > use
> > > > against mmap of device file. The page table walking above is
> > > > broken
> > > > for
> > > > various thing you might find in any other vma like THP, device
> > > > pte,
> > > > hugetlbfs,
> > > 
> > > I was figuring since we didn't export the function anymore, the
> > > warning
> > > and checks could be left to its users, assuming that any other
> > > future
> > > usage of this function would require mm people audit anyway. But I
> > > can
> > > of course add that warning also to this function if you still want
> > > that?
> > 
> > Yeah more warning are better, people might start using this, i know
> > some poeple use unexported symbol and then report bugs while they
> > just were doing something illegal.
> > 
> > > > ...
> > > > 
> > > > Also the mmu notifier can not be call from the pfn callback as
> > > > that
> > > > callback
> > > > happens under page table lock (the change_pte notifier callback
> > > > is
> > > > useless
> > > > and not enough). So it _must_ happen around the call to
> > > > apply_to_pfn_range
> > > 
> > > In the comments I was having in mind usage of, for example
> > > ptep_clear_flush_notify(). But you're the mmu_notifier expert here.
> > > Are
> > > you saying that function by itself would not be sufficient?
> > > In that case, should I just scratch the text mentioning the pte
> > > leaf
> > > function?
> > 
> > ptep_clear_flush_notify() is useless ... i have posted patches to
> > either
> > restore it or remove it. In any case you must call mmu notifier range
> > and
> > they can not happen under lock. You usage looked fine (in the next
> > patch)
> > but i would rather have a bit of comment here to make sure people are
> > also
> > aware of that.
> > 
> > While we can hope that people would cc mm when using mm function, it
> > is
> > not always the case. So i rather be cautious and warn in comment as
> > much
> > as possible.
> > 
> 
> OK. Understood. All this actually makes me tend to want to try a bit
> harder using a slight modification to the pagewalk code instead. Don't
> really want to encourage two parallel code paths doing essentially the
> same thing; one good and one bad.
> 
> One thing that confuses me a bit with the pagewalk code is that callers
> (for example softdirty) typically call
> mmu_notifier_invalidate_range_start() around the pagewalk, but then if
> it ends up splitting a pmd, mmu_notifier_invalidate_range is called
> again, within the first range. Docs aren't really clear whether that's
> permitted or not. Is it?

It is mandatory ie you have to call mmu_notifier_invalidate_range()
in some cases. This is all documented in mmu_notifier.h see struct
mmu_notifier_ops comments and also Documentation/vm/mmu_notifier.rst

Roughly anytime you go from one valid pte (pmd/pud/p4d) to another
valid pte (pmd/pud/p4d) with a different page then you have to call
after clearing pte (pmd/pud/p4d) and before replacing it with its
new value. Changing permission on same page ie going from read and
write to read only, or read only to read and write, does not require
any extra call to mmu_notifier_invalidate_range()

The mmu_notifier_invalidate_range() is important for IOMMU with ATS/
PASID as it is when the flush the TLB and remote device TLB. So you
must flush those secondary TLB after clearing entry so that it can
not race to repopulate the TLB and before setting the new entry so
that at no point in time any hardware can wrongly access old page
while a new page is just now active.

Hopes that clarify it, between if you see any improvement to mmu-
notifier doc it would be more than welcome. I try to put comments
in enough places that people should see at least one of them but
maybe i miss a place where i should have put a comments to point
to the doc :)

Cheers,
Jérôme

  reply	other threads:[~2019-04-17 14:28 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 16:04 [PATCH 0/9] Emulated coherent graphics memory Thomas Hellstrom
2019-04-12 16:04 ` [PATCH 1/9] mm: Allow the [page|pfn]_mkwrite callbacks to drop the mmap_sem Thomas Hellstrom
2019-04-12 18:52   ` Ralph Campbell
2019-04-13 15:11   ` Souptick Joarder
2019-04-17 10:58     ` Thomas Hellstrom
2019-04-17 13:00       ` Souptick Joarder
2019-04-12 16:04 ` [PATCH 2/9] mm: Add an apply_to_pfn_range interface Thomas Hellstrom
2019-04-12 18:52   ` Ralph Campbell
2019-04-12 21:07   ` Jerome Glisse
2019-04-13  8:34     ` Thomas Hellstrom
2019-04-16 14:46       ` Jerome Glisse
2019-04-17  9:15         ` Thomas Hellstrom
2019-04-17 14:28           ` Jerome Glisse [this message]
2019-04-12 16:04 ` [PATCH 3/9] mm: Add write-protect and clean utilities for address space ranges Thomas Hellstrom
2019-04-12 18:52   ` Ralph Campbell
2019-04-13  8:40     ` Thomas Hellstrom
2019-04-12 16:04 ` [PATCH 4/9] drm/ttm: Allow the driver to provide the ttm struct vm_operations_struct Thomas Hellstrom
2019-04-13 18:39   ` Koenig, Christian
2019-04-12 16:04 ` [PATCH 5/9] drm/ttm: TTM fault handler helpers Thomas Hellstrom
2019-04-15  6:34   ` Christian König
2019-04-17 10:53     ` Thomas Hellstrom
2019-04-12 16:04 ` [PATCH 6/9] drm/vmwgfx: Implement an infrastructure for write-coherent resources Thomas Hellstrom
2019-04-22 18:54   ` Deepak Singh Rawat
2019-04-24  9:10     ` Thomas Hellstrom
2019-04-12 16:04 ` [PATCH 7/9] drm/vmwgfx: Use an RBtree instead of linked list for MOB resources Thomas Hellstrom
2019-04-22 19:15   ` Deepak Singh Rawat
2019-04-12 16:04 ` [PATCH 8/9] drm/vmwgfx: Implement an infrastructure for read-coherent resources Thomas Hellstrom
2019-04-22 20:12   ` Deepak Singh Rawat
2019-04-24  9:26     ` Thomas Hellstrom
2019-04-12 16:04 ` [PATCH 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks Thomas Hellstrom
2019-04-23  2:51   ` Deepak Singh Rawat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190417142835.GB3229@redhat.com \
    --to=jglisse@redhat.com \
    --cc=Linux-graphics-maintainer@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jrdr.linux@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=thellstrom@vmware.com \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).