linux-mm.kvack.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
Cc: Hugh Dickins <hughd@google.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	David Hildenbrand <david@redhat.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Date: Thu, 27 Jul 2023 17:43:13 -0400	[thread overview]
Message-ID: <ZMLk8aMmpkK7ZCsW@x1n> (raw)
In-Reply-To: <CH3PR11MB71777432A63D3FAAE7E70F22F803A@CH3PR11MB7177.namprd11.prod.outlook.com>

Hi, Vivek,

On Tue, Jul 25, 2023 at 10:24:21PM +0000, Kasireddy, Vivek wrote:
> Hi Hugh,
> 
> > 
> > On Mon, 24 Jul 2023, Kasireddy, Vivek wrote:
> > > Hi Jason,
> > > > On Mon, Jul 24, 2023 at 07:54:38AM +0000, Kasireddy, Vivek wrote:
> > > >
> > > > > > I'm not at all familiar with the udmabuf use case but that sounds
> > > > > > brittle and effectively makes this notifier udmabuf specific right?
> > > > > Oh, Qemu uses the udmabuf driver to provide Host Graphics
> > components
> > > > > (such as Spice, Gstreamer, UI, etc) zero-copy access to Guest created
> > > > > buffers. In other words, from a core mm standpoint, udmabuf just
> > > > > collects a bunch of pages (associated with buffers) scattered inside
> > > > > the memfd (Guest ram backed by shmem or hugetlbfs) and wraps
> > > > > them in a dmabuf fd. And, since we provide zero-copy access, we
> > > > > use DMA fences to ensure that the components on the Host and
> > > > > Guest do not access the buffer simultaneously.
> > > >
> > > > So why do you need to track updates proactively like this?
> > > As David noted in the earlier series, if Qemu punches a hole in its memfd
> > > that goes through pages that are registered against a udmabuf fd, then
> > > udmabuf needs to update its list with new pages when the hole gets
> > > filled after (guest) writes. Otherwise, we'd run into the coherency
> > > problem (between udmabuf and memfd) as demonstrated in the selftest
> > > (patch #3 in this series).
> > 
> > Wouldn't this all be very much better if Qemu stopped punching holes there?
> I think holes can be punched anywhere in the memfd for various reasons. Some

I've just started reading this thread and haven't finished all of it yet, but
so far I'm not sure whether this is the right approach at all..

udmabuf is a file, which means it should follow file semantics.  An mmu
notifier, on the other hand, is per-mm.

Imagine that, for some reason, QEMU mapped the guest pages twice.  If
udmabuf is created from vma1, then udmabuf registers for mm changes over
vma1 only.

However, the shmem/hugetlb page cache can be populated through either vma1
or vma2.  That means that when pages are populated through vma2, udmabuf
won't get any update notification at all, so udmabuf's pages can still be
stale.  The same applies when multi-process QEMU is used, where vma1 can be
in QEMU while vma2 is in another process such as vhost-user.
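To illustrate the point (a hypothetical userspace sketch, not udmabuf code):
two MAP_SHARED mappings of the same memfd are backed by the same shmem page
cache, so a populate through either mapping affects what the other sees,
while a per-VMA/per-mm notifier registered on only one of them would miss
events on the other:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: both mappings of one memfd resolve to the same shmem page
 * cache pages, so populating through vma2 is immediately visible
 * through vma1 even though nothing touched vma1 itself. */
int two_vma_demo(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd = memfd_create("guest-ram", 0);

	if (fd < 0 || ftruncate(fd, psz) < 0)
		return -1;

	char *vma1 = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	char *vma2 = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (vma1 == MAP_FAILED || vma2 == MAP_FAILED)
		return -1;

	vma2[0] = 'X';			/* populate via vma2 only       */
	int ok = (vma1[0] == 'X');	/* same page cache page in vma1 */

	munmap(vma1, psz);
	munmap(vma2, psz);
	close(fd);
	return ok ? 0 : -2;
}
```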

I think the trick here is that we tried to "hide" the fact that these are
actually normal file pages while doing PFNMAP on them... and then we want
the file features back, like hole punching..

If we used normal file operations, everything would just work: truncation
would unmap the host-mapped frame buffers when needed, and on the next
access they would fault in on demand from the page cache.  We seem to be
reinventing "truncation" for pfnmap, and an mmu notifier doesn't sound
like the right tool for that, at least..
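The coherency hazard itself is easy to reproduce from userspace (hedged
sketch; the function name is illustrative, not a driver API): punching a
hole drops the old page cache page and unmaps it everywhere, and a later
access faults in a *new* page at the same offset.  Anything that pinned the
old page up front (as udmabuf does with FOLL_GET) is now pointing at a page
that's no longer part of the file:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch of the punch-hole coherency hazard discussed above. */
int punch_hole_demo(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd = memfd_create("guest-ram", 0);

	if (fd < 0 || ftruncate(fd, psz) < 0)
		return -1;

	char *map = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	map[0] = 'A';		/* populate the page cache page */

	/* Punch a hole through the page: it is truncated from the file
	 * and unmapped from every VMA mapping this range.  A pinned
	 * reference to the old page would now be stale. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, psz))
		return -1;

	if (map[0] != 0)	/* read faults in zeros from the hole   */
		return -2;

	map[0] = 'B';		/* write allocates a brand-new page     */

	char c = 0;
	if (pread(fd, &c, 1, 0) != 1 || c != 'B')
		return -3;

	munmap(map, psz);
	close(fd);
	return 0;
}
```

With plain file semantics this is a non-issue: the mapping above stays
coherent precisely because every access goes back through the page cache.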

> of the use-cases where this would be done were identified by David. Here is what
> he said in an earlier discussion:
> "There are *probably* more issues on the QEMU side when udmabuf is paired 
> with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for 
> virtio-balloon, virtio-mem, postcopy live migration, ... for example, in"

Now, after seeing this, I'm truly wondering whether we can still simply use
the file semantics we already have (for either shmem/hugetlb/...), or
whether we really must use a single fd to represent it all.

Say, can we just use a tuple (fd, page_array) rather than the udmabuf
itself to do host zero-copy mapping?  The page_array can be, e.g., a list
of file offsets that point to the pages (rather than pinning the pages
using FOLL_GET).  The good thing is that the fd can then be the guest
memory file itself.  With that, we can mmap() over the shmem/hugetlb in
whatever vma and whatever process.  Truncation (and actually everything
else... e.g. page migration, swapping, ... which would be disabled if we
used PFNMAP pins) will just start to work, afaiu.
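A crude userspace analogue of that (fd, page_array) idea (all names here
are hypothetical, this is not a proposed kernel API): keep only the fd and
a list of file offsets, and resolve an offset to whatever page currently
backs it at access time, instead of holding pinned pages:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical analogue of the (fd, page_array) tuple: no pages are
 * pinned; each entry is just a file offset. */
struct offset_buf {
	int fd;
	off_t *offsets;		/* one entry per page of the buffer */
	size_t n_pages;
};

/* Read page i from whatever page currently backs that file offset;
 * after a punch + refill, this automatically sees the new contents
 * because the lookup goes through the file every time. */
static ssize_t offset_buf_read_page(const struct offset_buf *b,
				    size_t i, void *dst, size_t len)
{
	if (i >= b->n_pages)
		return -1;
	return pread(b->fd, dst, len, b->offsets[i]);
}
```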

Thanks,

-- 
Peter Xu



Thread overview: 64+ messages
2023-07-18  8:28 [RFC v1 0/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd Vivek Kasireddy
2023-07-18  8:28 ` [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages) Vivek Kasireddy
2023-07-18 15:36   ` Jason Gunthorpe
2023-07-19  0:05     ` Kasireddy, Vivek
2023-07-19  0:24       ` Jason Gunthorpe
2023-07-19  6:19         ` Kasireddy, Vivek
2023-07-19  2:08   ` Alistair Popple
2023-07-20  7:43     ` Kasireddy, Vivek
2023-07-20  9:00       ` Alistair Popple
2023-07-24  7:54         ` Kasireddy, Vivek
2023-07-24 13:35           ` Jason Gunthorpe
2023-07-24 20:32             ` Kasireddy, Vivek
2023-07-25  4:30               ` Hugh Dickins
2023-07-25 22:24                 ` Kasireddy, Vivek
2023-07-27 21:43                   ` Peter Xu [this message]
2023-07-29  0:08                     ` Kasireddy, Vivek
2023-07-31 17:05                       ` Peter Xu
2023-08-01  7:11                         ` Kasireddy, Vivek
2023-08-01 21:57                           ` Peter Xu
2023-08-03  8:08                             ` Kasireddy, Vivek
2023-08-03 13:02                               ` Peter Xu
2023-07-25 12:36               ` Jason Gunthorpe
2023-07-25 22:44                 ` Kasireddy, Vivek
2023-07-25 22:53                   ` Jason Gunthorpe
2023-07-27  7:34                     ` Kasireddy, Vivek
2023-07-27 11:58                       ` Jason Gunthorpe
2023-07-29  0:46                         ` Kasireddy, Vivek
2023-07-30 23:09                           ` Jason Gunthorpe
2023-08-01  5:32                             ` Kasireddy, Vivek
2023-08-01 12:19                               ` Jason Gunthorpe
2023-08-01 12:22                                 ` David Hildenbrand
2023-08-01 12:23                                   ` Jason Gunthorpe
2023-08-01 12:26                                     ` David Hildenbrand
2023-08-01 12:26                                       ` Jason Gunthorpe
2023-08-01 12:28                                         ` David Hildenbrand
2023-08-01 17:53                                           ` Kasireddy, Vivek
2023-08-01 18:19                                             ` Jason Gunthorpe
2023-08-03  7:35                                               ` Kasireddy, Vivek
2023-08-03 12:14                                                 ` Jason Gunthorpe
2023-08-03 12:32                                                   ` David Hildenbrand
2023-08-04  0:14                                                     ` Alistair Popple
2023-08-04  6:39                                                       ` Kasireddy, Vivek
2023-08-04  7:23                                                         ` David Hildenbrand
2023-08-04 21:53                                                           ` Kasireddy, Vivek
2023-08-04 12:49                                                         ` Jason Gunthorpe
2023-08-08  7:37                                                           ` Kasireddy, Vivek
2023-08-08 12:42                                                             ` Jason Gunthorpe
2023-08-16  6:43                                                               ` Kasireddy, Vivek
2023-08-21  9:02                                                                 ` Alistair Popple
2023-08-22  6:14                                                                   ` Kasireddy, Vivek
2023-08-22  8:15                                                                     ` Alistair Popple
2023-08-24  6:48                                                                       ` Kasireddy, Vivek
2023-08-28  4:38                                                                         ` Kasireddy, Vivek
2023-08-30 16:02                                                                           ` Jason Gunthorpe
2023-07-25  3:38             ` Alistair Popple
2023-07-24 13:36           ` Alistair Popple
2023-07-24 13:37             ` Jason Gunthorpe
2023-07-24 20:42             ` Kasireddy, Vivek
2023-07-25  3:14               ` Alistair Popple
2023-07-18  8:28 ` [RFC v1 2/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd Vivek Kasireddy
2023-08-02 12:40   ` Daniel Vetter
2023-08-03  8:24     ` Kasireddy, Vivek
2023-08-03  8:32       ` Daniel Vetter
2023-07-18  8:28 ` [RFC v1 3/3] selftests/dma-buf/udmabuf: Add tests for huge pages and FALLOC_FL_PUNCH_HOLE Vivek Kasireddy
