linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	David Hildenbrand <david@redhat.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Mike Kravetz" <mike.kravetz@oracle.com>
Subject: RE: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Date: Sat, 29 Jul 2023 00:46:59 +0000	[thread overview]
Message-ID: <CH3PR11MB71779C83F8A0EC6C2F3F4B0CF807A@CH3PR11MB7177.namprd11.prod.outlook.com> (raw)
In-Reply-To: <ZMJbywGYN0QLh3vF@nvidia.com>

Hi Jason,

> > > > > If you still need the memory mapped then you re-call
> hmm_range_fault
> > > > > and re-obtain it. hmm_range_fault will resolve all the races and you
> > > > > get new pages.
> > >
> > > > IIUC, for my udmabuf use-case, it looks like calling hmm_range_fault
> > > > immediately after an invalidate (range notification) would preemptively
> > > fault in
> > > > new pages before a write. The problem with that is if a read occurs on
> > > those
> > > > new pages, then the data is incorrect as a write may not have
> > > > happened yet.
> > >
> > > It cannot be, if you use hmm_range_fault correctly you cannot get
> > > corruption no matter what is done to the mmap'd memfd. If there is
> > > otherwise it is a hmm_range_fault bug plain and simple.
> > >
> > > > Ideally, what I am looking for is for getting new pages at the time of or
> after
> > > > a write; until then, it is ok to use the old pages given my use-case.
> > >
> > > It is wrong, if you are synchronizing the vma then you must use the
> > > latest copy. If your use case can tolerate it then keep a 'not
> > > present' indication for the missing pages until you actually need
> > > them, but dmabuf doesn't really provide an API for that.
> > >
> > > > I think the difference comes down to whether we (udmabuf driver)
> want to
> > > > grab the new pages after getting notified about a PTE update because
> > > > of a fault
> > >
> > > Why? You still haven't explained why you want this.
> > Ok, let me explain using one of the udmabuf selftests (added in patch #3)
> > to describe the problem (sorry, I'd have to use the terms memfd, hole, etc)
> > I am trying to solve:
> >         size = MEMFD_SIZE * page_size;
> >         memfd = create_memfd_with_seals(size, false);
> >         addr1 = mmap_fd(memfd, size);
> >         write_to_memfd(addr1, size, 'a');
> >         buf = create_udmabuf_list(devfd, memfd, size);
> >         addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
> >         punch_hole(memfd, MEMFD_SIZE / 2);
> >    -> At this point, if I were to read addr1, it'd still have "a" in relevant areas
> >         because a new write hasn't happened yet. And, since this results in an
> >         invalidation (notification) of the associated VMA range, I could register
> >         a callback in udmabuf driver and get notified but I am not sure how or
> >         why that would be useful.
> 
> When you get an invalidation you trigger dmabuf move, which revokes
> the importes use of the dmabuf because the underlying memory has
> changed. This is exactly the same as a GPU driver migrating memory
> to/fro CPU memory.
> 
> >
> >         write_to_memfd(addr1, size, 'b');
> >    -> Here, the hole gets refilled as a result of the above writes which trigger
> >         faults and the PTEs are updated to point to new pages. When this
> happens,
> >         the udmabuf driver needs to be made aware of the new pages that
> were
> >         faulted in because of the new writes.
> 
> You only need this because you are not processing the invalidate.
> 
> >         a way to get notified when the hole is written to, the solution I came
> up
> >         with is to either add a new notifier or add calls to change_pte() when
> the
> >         PTEs do get updated. However, considering your suggestion to use
> >         hmm_range_fault(), it is not clear to me how it would help while the
> hole
> >         is being written to as the writes occur outside of the
> >         udmabuf driver.
> 
> You have the design backwards.
> 
> When a dmabuf importer asks for the dmabuf to be present you call
> hmm_range_fault() and you get back whatever memory is appropriate. The
> importer can then use it.
> 
> If the underlying memory changes then you get the invalidation and you
> trigger move. The importer stops using the memory and the underlying
> pages change.
> 
> Later the importer decides it needs the memory again so it again asks
> for the dmabuf to be present, which does hmm_range_fault and gets
> whatever is appropriate at the time.
Unless I am missing something, I think just doing the above still won't solve
the problem. Consider this sequence:
     write_to_memfd(addr1, size, 'a');
     buf = create_udmabuf_list(devfd, memfd, size);
     addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
     read(addr2);
     write_to_memfd(addr1, size, 'b');
     punch_hole(memfd, MEMFD_SIZE / 2);
-> Since we can process the invalidate at this point, as per your suggestion,
     we can trigger dmabuf move to let the importers know that the dmabuf's
     backing memory has changed (or moved).

     read(addr2);
-> Because there is a hole, we can handle the read by either providing the
     old pages or zero pages (if using hmm_range_fault()) to the importers.
     Maybe it is against convention, but I think it makes sense to provide old
     pages (that were mapped before the hole punch) because the importers
     have not read the data in these pages ('b' above) yet. And, another reason
     to provide old pages is because the data in these pages is shown in a window
     on the Host's screen so it doesn't make sense to show zero page data.

-> write_to_memfd(addr1, size, 'c');
     As the hole gets refilled (with new pages) after the above write, AFAIU, we
     have to tell the importers again that since the backing memory has changed,
     (new pages) they need to recreate their mappings. But herein lies the problem:
     from inside the udmabuf driver, we cannot know when this write occurs, so we
     would not be able to notify the importers of the dmabuf move. Since Qemu knows
     about this write, I was initially thinking of adding a new udmabuf IOCTL (something
     like UDMABUF_PREPARE) to have Qemu let udmabuf know after writes occur.
     This would provide an opportunity after an invalidate to notify the importers of
     the (dmabuf) move. I think this would solve the problem with changes only to the
     udmabuf driver (including adding the new IOCTL + handling the invalidate) but I
     was hoping to solve it in a generic way by adding a new notifier or using change_pte()
     to get notified about PTE updates (when faults occur).

Thanks,
Vivek

> Jason



  reply	other threads:[~2023-07-29  0:47 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-18  8:28 [RFC v1 0/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd Vivek Kasireddy
2023-07-18  8:28 ` [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages) Vivek Kasireddy
2023-07-18 15:36   ` Jason Gunthorpe
2023-07-19  0:05     ` Kasireddy, Vivek
2023-07-19  0:24       ` Jason Gunthorpe
2023-07-19  6:19         ` Kasireddy, Vivek
2023-07-19  2:08   ` Alistair Popple
2023-07-20  7:43     ` Kasireddy, Vivek
2023-07-20  9:00       ` Alistair Popple
2023-07-24  7:54         ` Kasireddy, Vivek
2023-07-24 13:35           ` Jason Gunthorpe
2023-07-24 20:32             ` Kasireddy, Vivek
2023-07-25  4:30               ` Hugh Dickins
2023-07-25 22:24                 ` Kasireddy, Vivek
2023-07-27 21:43                   ` Peter Xu
2023-07-29  0:08                     ` Kasireddy, Vivek
2023-07-31 17:05                       ` Peter Xu
2023-08-01  7:11                         ` Kasireddy, Vivek
2023-08-01 21:57                           ` Peter Xu
2023-08-03  8:08                             ` Kasireddy, Vivek
2023-08-03 13:02                               ` Peter Xu
2023-07-25 12:36               ` Jason Gunthorpe
2023-07-25 22:44                 ` Kasireddy, Vivek
2023-07-25 22:53                   ` Jason Gunthorpe
2023-07-27  7:34                     ` Kasireddy, Vivek
2023-07-27 11:58                       ` Jason Gunthorpe
2023-07-29  0:46                         ` Kasireddy, Vivek [this message]
2023-07-30 23:09                           ` Jason Gunthorpe
2023-08-01  5:32                             ` Kasireddy, Vivek
2023-08-01 12:19                               ` Jason Gunthorpe
2023-08-01 12:22                                 ` David Hildenbrand
2023-08-01 12:23                                   ` Jason Gunthorpe
2023-08-01 12:26                                     ` David Hildenbrand
2023-08-01 12:26                                       ` Jason Gunthorpe
2023-08-01 12:28                                         ` David Hildenbrand
2023-08-01 17:53                                           ` Kasireddy, Vivek
2023-08-01 18:19                                             ` Jason Gunthorpe
2023-08-03  7:35                                               ` Kasireddy, Vivek
2023-08-03 12:14                                                 ` Jason Gunthorpe
2023-08-03 12:32                                                   ` David Hildenbrand
2023-08-04  0:14                                                     ` Alistair Popple
2023-08-04  6:39                                                       ` Kasireddy, Vivek
2023-08-04  7:23                                                         ` David Hildenbrand
2023-08-04 21:53                                                           ` Kasireddy, Vivek
2023-08-04 12:49                                                         ` Jason Gunthorpe
2023-08-08  7:37                                                           ` Kasireddy, Vivek
2023-08-08 12:42                                                             ` Jason Gunthorpe
2023-08-16  6:43                                                               ` Kasireddy, Vivek
2023-08-21  9:02                                                                 ` Alistair Popple
2023-08-22  6:14                                                                   ` Kasireddy, Vivek
2023-08-22  8:15                                                                     ` Alistair Popple
2023-08-24  6:48                                                                       ` Kasireddy, Vivek
2023-08-28  4:38                                                                         ` Kasireddy, Vivek
2023-08-30 16:02                                                                           ` Jason Gunthorpe
2023-07-25  3:38             ` Alistair Popple
2023-07-24 13:36           ` Alistair Popple
2023-07-24 13:37             ` Jason Gunthorpe
2023-07-24 20:42             ` Kasireddy, Vivek
2023-07-25  3:14               ` Alistair Popple
2023-07-18  8:28 ` [RFC v1 2/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd Vivek Kasireddy
2023-08-02 12:40   ` Daniel Vetter
2023-08-03  8:24     ` Kasireddy, Vivek
2023-08-03  8:32       ` Daniel Vetter
2023-07-18  8:28 ` [RFC v1 3/3] selftests/dma-buf/udmabuf: Add tests for huge pages and FALLOC_FL_PUNCH_HOLE Vivek Kasireddy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CH3PR11MB71779C83F8A0EC6C2F3F4B0CF807A@CH3PR11MB7177.namprd11.prod.outlook.com \
    --to=vivek.kasireddy@intel.com \
    --cc=apopple@nvidia.com \
    --cc=david@redhat.com \
    --cc=dongwon.kim@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hughd@google.com \
    --cc=jgg@nvidia.com \
    --cc=junxiao.chang@intel.com \
    --cc=kraxel@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).