linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Vivek Kasireddy <vivek.kasireddy@intel.com>,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Dongwon Kim <dongwon.kim@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	James Houghton <jthoughton@google.com>,
	Jerome Marchand <jmarchan@redhat.com>,
	Junxiao Chang <junxiao.chang@intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Muchun Song <muchun.song@linux.dev>,
	Jason Gunthorpe <jgg@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>
Subject: Re: [PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages
Date: Thu, 22 Jun 2023 10:25:17 +0200	[thread overview]
Message-ID: <6e429fbc-e0e6-53c0-c545-2e2cbbe757de@redhat.com> (raw)
In-Reply-To: <20230622072710.3707315-1-vivek.kasireddy@intel.com>

On 22.06.23 09:27, Vivek Kasireddy wrote:
> The first patch ensures that the mappings needed for handling mmap
> operation would be managed by using the pfn instead of struct page.
> The second patch restores support for mapping hugetlb pages where
> subpages of a hugepage are not directly used anymore (main reason
> for revert) and instead the hugetlb pages and the relevant offsets
> are used to populate the scatterlist for dma-buf export and for
> mmap operation.
> 
> Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
> were passed to the Host kernel and Qemu was launched with these
> relevant options: qemu-system-x86_64 -m 4096m....
> -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
> -display gtk,gl=on
> -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
> -machine memory-backend=mem1
> 
> Replacing -display gtk,gl=on with -display gtk,gl=off above would
> exercise the mmap handler.
> 

While I think the VM_PFNMAP approach is much better and should fix that 
issue at hand, I thought more about missing memlock support and realized 
that we might have to fix something else. SO I'm going to raise the 
issue here.

I think udmabuf chose the wrong interface to do what it's doing, that 
makes it harder to fix it eventually.

Instead of accepting a range in a memfd, it should just have accepted a 
user space address range and then used 
pin_user_pages(FOLL_WRITE|FOLL_LONGTERM) to longterm-pin the pages 
"officially".

So what's the issue? Udma effectively pins pages longterm ("possibly 
forever") simply by grabbing a reference on them. These pages might 
easily reside in ZONE_MOVABLE or in MIGRATE_CMA pageblocks.

So what udmabuf does is break memory hotunplug and CMA, because it turns 
pages that have to remain movable unmovable.

In the pin_user_pages(FOLL_LONGTERM) case we make sure to migrate these 
pages. See mm/gup.c:check_and_migrate_movable_pages() and especially 
folio_is_longterm_pinnable(). We'd probably have to implement something 
similar for udmabuf, where we detect such unpinnable pages and migrate them.


For example, pairing udmabuf with vfio (which pins pages using 
pin_user_pages(FOLL_LONGTERM)) in QEMU will most probably not work in 
all cases: if udmabuf longterm pinned the pages "the wrong way", vfio 
will fail to migrate them during FOLL_LONGTERM and consequently fail 
pin_user_pages(). As long as udmabuf holds a reference on these pages, 
that will never succeed.


There are *probably* more issues on the QEMU side when udmabuf is paired 
with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for 
virtio-balloon, virtio-mem, postcopy live migration, ... for example, in 
the vfio/vdpa case we make sure that we disallow most of these, because 
otherwise there can be an accidental "disconnect" between the pages 
mapped into the VM (guest view) and the pages mapped into the IOMMU 
(device view), for example, after a reboot.

-- 
Cheers,

David / dhildenb



  parent reply	other threads:[~2023-06-22  8:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-22  7:27 [PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages Vivek Kasireddy
2023-06-22  7:27 ` [PATCH v1 1/2] udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap Vivek Kasireddy
2023-06-22  7:27 ` [PATCH v1 2/2] udmabuf: Add back support for mapping hugetlb pages Vivek Kasireddy
2023-06-22 22:10   ` kernel test robot
2023-06-22  8:25 ` David Hildenbrand [this message]
2023-06-22 21:33   ` [PATCH v1 0/2] " Mike Kravetz
2023-06-23  6:13   ` Kasireddy, Vivek
2023-06-23 16:35     ` Peter Xu
2023-06-23 16:37       ` Jason Gunthorpe
2023-06-23 17:28         ` Peter Xu
2023-06-26 12:57           ` Jason Gunthorpe
2023-06-26  7:45       ` Kasireddy, Vivek
2023-06-26 17:52         ` Peter Xu
2023-06-26 18:14           ` David Hildenbrand
2023-06-26 18:18             ` Jason Gunthorpe
2023-06-26 19:04               ` Peter Xu
2023-06-27 15:52                 ` Jason Gunthorpe
2023-06-27 16:00                   ` Peter Xu
2023-06-27 16:04                     ` Jason Gunthorpe
2023-06-27  6:37             ` Kasireddy, Vivek
2023-06-27  7:10               ` David Hildenbrand
2023-06-28  8:04                 ` Kasireddy, Vivek
2023-08-08 16:17   ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e429fbc-e0e6-53c0-c545-2e2cbbe757de@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dongwon.kim@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jmarchan@redhat.com \
    --cc=jthoughton@google.com \
    --cc=junxiao.chang@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kraxel@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=vivek.kasireddy@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).