linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
	"Kim, Dongwon" <dongwon.kim@intel.com>,
	"Chang, Junxiao" <junxiao.chang@intel.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"Hocko, Michal" <mhocko@suse.com>,
	"jmarchan@redhat.com" <jmarchan@redhat.com>,
	"muchun.song@linux.dev" <muchun.song@linux.dev>,
	James Houghton <jthoughton@google.com>,
	"David Hildenbrand" <david@redhat.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: RE: [PATCH] udmabuf: revert 'Add support for mapping hugepages (v4)'
Date: Mon, 12 Jun 2023 07:10:31 +0000	[thread overview]
Message-ID: <IA0PR11MB71851B64A5E7062E3BDD8D2FF854A@IA0PR11MB7185.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20230608204927.88711-1-mike.kravetz@oracle.com>

Hi Mike,

Sorry for the late reply; I just got back from vacation.
If it is unsafe to directly use the subpages of a hugetlb page, then reverting
this patch seems like the only option for addressing this issue immediately.
So, this patch is
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

As far as the use-case is concerned, there are two main users of the udmabuf
driver: Qemu and CrosVM VMMs. However, it appears Qemu is the only one
that uses hugetlb pages (when hugetlb=on is set) as the backing store for
Guest (Linux, Android and Windows) system memory. The main goal is to
share the pages associated with the Guest allocated framebuffer (FB) with
the Host GPU driver and other components in a zero-copy way. To that end,
the guest GPU driver (virtio-gpu) allocates 4k size pages (associated with
the FB) and pins them before sharing the (guest) physical (or dma) addresses
(and lengths) with Qemu. Qemu then translates the addresses into file
offsets and shares these offsets with udmabuf. 

The udmabuf driver obtains the pages associated with the file offsets and
uses these pages to eventually populate a scatterlist. It also creates a 
dmabuf fd and acts as the exporter. AFAIK, it should be possible to populate
the scatterlist with physical/dma addresses (of huge pages) instead of using
subpages but this might limit the capabilities of some (dmabuf) importers.
I'll try to figure out a solution using physical/dma addresses and see if it
works as expected, and will share the patch on linux-mm to request
feedback once it is ready.

Thanks,
Vivek

> 
> This effectively reverts commit 16c243e99d33 ("udmabuf: Add support
> for mapping hugepages (v4)").  Recently, Junxiao Chang found a BUG
> with page map counting as described here [1].  This issue pointed out
> that the udmabuf driver was making direct use of subpages of hugetlb
> pages.  This is not a good idea, and no other mm code attempts such use.
> In addition to the mapcount issue, this also causes issues with hugetlb
> vmemmap optimization and page poisoning.
> 
> For now, remove hugetlb support.
> 
> If udmabuf wants to be used on hugetlb mappings, it should be changed to
> only use complete hugetlb pages.  This will require different alignment
> and size requirements on the UDMABUF_CREATE API.
> 
> [1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1-
> junxiao.chang@intel.com/
> 
> Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  drivers/dma-buf/udmabuf.c | 47 +++++----------------------------------
>  1 file changed, 6 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 01f2e86f3f7c..12cf6bb2e3ce 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -12,7 +12,6 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/slab.h>
>  #include <linux/udmabuf.h>
> -#include <linux/hugetlb.h>
>  #include <linux/vmalloc.h>
>  #include <linux/iosys-map.h>
> 
> @@ -207,9 +206,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	struct udmabuf *ubuf;
>  	struct dma_buf *buf;
>  	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
> -	struct page *page, *hpage = NULL;
> -	pgoff_t subpgoff, maxsubpgs;
> -	struct hstate *hpstate;
> +	struct page *page;
>  	int seals, ret = -EINVAL;
>  	u32 i, flags;
> 
> @@ -245,7 +242,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  		if (!memfd)
>  			goto err;
>  		mapping = memfd->f_mapping;
> -		if (!shmem_mapping(mapping) &&
> !is_file_hugepages(memfd))
> +		if (!shmem_mapping(mapping))
>  			goto err;
>  		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
>  		if (seals == -EINVAL)
> @@ -256,48 +253,16 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			goto err;
>  		pgoff = list[i].offset >> PAGE_SHIFT;
>  		pgcnt = list[i].size   >> PAGE_SHIFT;
> -		if (is_file_hugepages(memfd)) {
> -			hpstate = hstate_file(memfd);
> -			pgoff = list[i].offset >> huge_page_shift(hpstate);
> -			subpgoff = (list[i].offset &
> -				    ~huge_page_mask(hpstate)) >>
> PAGE_SHIFT;
> -			maxsubpgs = huge_page_size(hpstate) >>
> PAGE_SHIFT;
> -		}
>  		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
> -			if (is_file_hugepages(memfd)) {
> -				if (!hpage) {
> -					hpage =
> find_get_page_flags(mapping, pgoff,
> -
> FGP_ACCESSED);
> -					if (!hpage) {
> -						ret = -EINVAL;
> -						goto err;
> -					}
> -				}
> -				page = hpage + subpgoff;
> -				get_page(page);
> -				subpgoff++;
> -				if (subpgoff == maxsubpgs) {
> -					put_page(hpage);
> -					hpage = NULL;
> -					subpgoff = 0;
> -					pgoff++;
> -				}
> -			} else {
> -				page =
> shmem_read_mapping_page(mapping,
> -							       pgoff + pgidx);
> -				if (IS_ERR(page)) {
> -					ret = PTR_ERR(page);
> -					goto err;
> -				}
> +			page = shmem_read_mapping_page(mapping, pgoff
> + pgidx);
> +			if (IS_ERR(page)) {
> +				ret = PTR_ERR(page);
> +				goto err;
>  			}
>  			ubuf->pages[pgbuf++] = page;
>  		}
>  		fput(memfd);
>  		memfd = NULL;
> -		if (hpage) {
> -			put_page(hpage);
> -			hpage = NULL;
> -		}
>  	}
> 
>  	exp_info.ops  = &udmabuf_ops;
> --
> 2.40.1



  parent reply	other threads:[~2023-06-12  7:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-08 20:49 [PATCH] udmabuf: revert 'Add support for mapping hugepages (v4)' Mike Kravetz
2023-06-09  6:09 ` Greg Kroah-Hartman
2023-06-12  7:10 ` Kasireddy, Vivek [this message]
2023-06-12  7:46   ` David Hildenbrand
2023-06-13  8:26     ` Kasireddy, Vivek
2023-06-13 12:25       ` David Laight
2023-06-13 17:51       ` David Hildenbrand
2023-06-14  3:40         ` Hugh Dickins
2023-06-14  7:51         ` Kasireddy, Vivek
2023-06-15  9:48           ` David Hildenbrand
2023-06-19 12:35 ` Gerd Hoffmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=IA0PR11MB71851B64A5E7062E3BDD8D2FF854A@IA0PR11MB7185.namprd11.prod.outlook.com \
    --to=vivek.kasireddy@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=dongwon.kim@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jmarchan@redhat.com \
    --cc=jthoughton@google.com \
    --cc=junxiao.chang@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kraxel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=qemu-devel@nongnu.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).