* [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices
@ 2022-10-21 17:41 Logan Gunthorpe
  2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
                   ` (9 more replies)
  0 siblings, 10 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

Hi,

This is the latest P2PDMA userspace patch set. This version includes
some cleanup from feedback from the last posting[1].

This patch set enables userspace P2PDMA by allowing userspace to mmap()
allocated chunks of the CMB. The resulting VMA can be passed only
to O_DIRECT IO on NVMe-backed files or block devices. Patch 1 is a
cleanup that expands try_grab_page()'s error return, Patch 2 adds a
GUP flag (FOLL_PCI_P2PDMA), and Patches 3 through 7 wire this flag up
based on whether the block queue indicates P2PDMA support. Patch 8
creates the sysfs resource that hands out the VMAs and Patch 9
adds brief documentation for the new interface.
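
For reference, a minimal userspace sketch of that flow (the PCI address,
file path and length are made up for illustration, and error handling is
omitted; this is not part of the series itself):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 2 * 1024 * 1024;

        /* mmap() a chunk of the device's CMB via the p2pmem sysfs file */
        int afd = open("/sys/bus/pci/devices/0000:03:00.0/p2pmem/allocate",
                       O_RDWR);
        void *p2pmem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                            afd, 0);

        /* Pass the resulting VMA to O_DIRECT IO on an NVMe-backed file */
        int fd = open("/mnt/nvme/data", O_RDONLY | O_DIRECT);
        pread(fd, p2pmem, len, 0);      /* device DMAs straight into the CMB */

        munmap(p2pmem, len);
        close(fd);
        close(afd);
        return 0;
}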

Feedback welcome.

This series is based on v6.1-rc1. A git branch is available here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_user_cmb_v11

Thanks,

Logan

[1] https://lkml.kernel.org/r/20220922163926.7077-1-logang@deltatee.com

--

Changes in v11:
  - Rebased onto v6.1-rc1, fixed minor conflict in bio_map_user_iov
  - The GUP test was moved to try_grab_page() and try_grab_folio().
    This ought to be a bit more future-proof. It required adding a new
    cleanup patch to return a proper error code from try_grab_page().
    (Per Jason)

Changes in v10:
  - Rebased onto v6.0-rc6
  - Reworked the iov_iter changes to reuse the code better and
    name the helpers without the _flags() suffix (per Christoph)
  - Renamed a number of flags variables to gup_flags (per John)
  - Minor fixups to the last documentation patch (from Greg and John)

Changes in v9:
  - Rebased onto v6.0-rc2, included reworking the iov_iter patch
    due to changes there
  - Drop the char device mmap implementation in favour of a sysfs
    based interface. (per Christoph)

 (v8 only included the first half of the series and was merged for v6.0)

Changes in v8:
  - Rebase onto v5.19-rc1
  - Rework how the pages are stored in the VMA per Jason's suggestion

Changes in v7:
  - Rebased onto v5.18-rc1, which includes Christoph's cleanup to
    free_zone_device_page() (similar to Ralph's patch).
  - Fix bug with concurrent first calls to pci_p2pdma_vma_fault()
    that caused a double allocation and lost p2p memory. Noticed
    by Andrew Maier.
  - Collected a Reviewed-by tag from Chaitanya.
  - Numerous minor fixes to commit messages

--

Logan Gunthorpe (9):
  mm: allow multiple error returns in try_grab_page()
  mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
  iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  block: add check when merging zone device pages
  lib/scatterlist: add check when merging zone device pages
  block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
  PCI/P2PDMA: Allow userspace VMA allocations through sysfs
  ABI: sysfs-bus-pci: add documentation for p2pmem allocate

 Documentation/ABI/testing/sysfs-bus-pci |  10 ++
 block/bio.c                             |  11 ++-
 block/blk-map.c                         |  12 ++-
 drivers/pci/p2pdma.c                    | 124 ++++++++++++++++++++++++
 include/linux/mm.h                      |   3 +-
 include/linux/mmzone.h                  |  24 +++++
 include/linux/uio.h                     |   6 ++
 lib/iov_iter.c                          |  32 ++++--
 lib/scatterlist.c                       |  25 +++--
 mm/gup.c                                |  45 ++++++---
 mm/huge_memory.c                        |  19 ++--
 mm/hugetlb.c                            |  23 +++--
 12 files changed, 280 insertions(+), 54 deletions(-)


base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
--
2.30.2


* [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page()
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-24 15:00   ` Christoph Hellwig
                     ` (2 more replies)
  2022-10-21 17:41 ` [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages Logan Gunthorpe
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

In order to add checks for P2PDMA memory into try_grab_page(), expand
the error return from a bool to an int/error code. Update all the
call sites to handle the change in usage.

Also remove the WARN_ON_ONCE() calls at the call sites, since there
already is a WARN_ON_ONCE() inside the function if it fails.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 include/linux/mm.h |  2 +-
 mm/gup.c           | 26 ++++++++++++++------------
 mm/huge_memory.c   | 19 +++++++++++++------
 mm/hugetlb.c       | 17 +++++++++--------
 4 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc5565..62a91dc1272b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1129,7 +1129,7 @@ static inline void get_page(struct page *page)
 	folio_get(page_folio(page));
 }
 
-bool __must_check try_grab_page(struct page *page, unsigned int flags);
+int __must_check try_grab_page(struct page *page, unsigned int flags);
 
 static inline __must_check bool try_get_page(struct page *page)
 {
diff --git a/mm/gup.c b/mm/gup.c
index fe195d47de74..e2f447446384 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -202,17 +202,19 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
  * time. Cases: please see the try_grab_folio() documentation, with
  * "refs=1".
  *
- * Return: true for success, or if no action was required (if neither FOLL_PIN
- * nor FOLL_GET was set, nothing is done). False for failure: FOLL_GET or
- * FOLL_PIN was set, but the page could not be grabbed.
+ * Return: 0 for success, or if no action was required (if neither FOLL_PIN
+ * nor FOLL_GET was set, nothing is done). A negative error code for failure:
+ *
+ *   -ENOMEM		FOLL_GET or FOLL_PIN was set, but the page could not
+ *			be grabbed.
  */
-bool __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_page(struct page *page, unsigned int flags)
 {
 	struct folio *folio = page_folio(page);
 
 	WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == (FOLL_GET | FOLL_PIN));
 	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
-		return false;
+		return -ENOMEM;
 
 	if (flags & FOLL_GET)
 		folio_ref_inc(folio);
@@ -232,7 +234,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
 		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
 	}
 
-	return true;
+	return 0;
 }
 
 /**
@@ -624,8 +626,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		       !PageAnonExclusive(page), page);
 
 	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
-	if (unlikely(!try_grab_page(page, flags))) {
-		page = ERR_PTR(-ENOMEM);
+	ret = try_grab_page(page, flags);
+	if (unlikely(ret)) {
+		page = ERR_PTR(ret);
 		goto out;
 	}
 	/*
@@ -960,10 +963,9 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 			goto unmap;
 		*page = pte_page(*pte);
 	}
-	if (unlikely(!try_grab_page(*page, gup_flags))) {
-		ret = -ENOMEM;
+	ret = try_grab_page(*page, gup_flags);
+	if (unlikely(ret))
 		goto unmap;
-	}
 out:
 	ret = 0;
 unmap:
@@ -2536,7 +2538,7 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
-		if (unlikely(!try_grab_page(page, flags))) {
+		if (unlikely(try_grab_page(page, flags))) {
 			undo_dev_pagemap(nr, nr_start, flags, pages);
 			break;
 		}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cc4a5f4791e..52f2b2a2ffae 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1035,6 +1035,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	unsigned long pfn = pmd_pfn(*pmd);
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
+	int ret;
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
@@ -1066,8 +1067,9 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
-	if (!try_grab_page(page, flags))
-		page = ERR_PTR(-ENOMEM);
+	ret = try_grab_page(page, flags);
+	if (ret)
+		page = ERR_PTR(ret);
 
 	return page;
 }
@@ -1193,6 +1195,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 	unsigned long pfn = pud_pfn(*pud);
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
+	int ret;
 
 	assert_spin_locked(pud_lockptr(mm, pud));
 
@@ -1226,8 +1229,10 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
-	if (!try_grab_page(page, flags))
-		page = ERR_PTR(-ENOMEM);
+
+	ret = try_grab_page(page, flags);
+	if (ret)
+		page = ERR_PTR(ret);
 
 	return page;
 }
@@ -1435,6 +1440,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
+	int ret;
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
@@ -1459,8 +1465,9 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 			!PageAnonExclusive(page), page);
 
-	if (!try_grab_page(page, flags))
-		return ERR_PTR(-ENOMEM);
+	ret = try_grab_page(page, flags);
+	if (ret)
+		return ERR_PTR(ret);
 
 	if (flags & FOLL_TOUCH)
 		touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b586cdd75930..e8d01a19ce46 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7224,14 +7224,15 @@ follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags
 		page = pte_page(pte) +
 			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
 		/*
-		 * try_grab_page() should always succeed here, because: a) we
-		 * hold the pmd (ptl) lock, and b) we've just checked that the
-		 * huge pmd (head) page is present in the page tables. The ptl
-		 * prevents the head page and tail pages from being rearranged
-		 * in any way. So this page must be available at this point,
-		 * unless the page refcount overflowed:
+		 * try_grab_page() should always be able to get the page here,
+		 * because: a) we hold the pmd (ptl) lock, and b) we've just
+		 * checked that the huge pmd (head) page is present in the
+		 * page tables. The ptl prevents the head page and tail pages
+		 * from being rearranged in any way. So this page must be
+		 * available at this point, unless the page refcount
+		 * overflowed:
 		 */
-		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
+		if (try_grab_page(page, flags)) {
 			page = NULL;
 			goto out;
 		}
@@ -7269,7 +7270,7 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
 	pte = huge_ptep_get((pte_t *)pud);
 	if (pte_present(pte)) {
 		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
-		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
+		if (try_grab_page(page, flags)) {
 			page = NULL;
 			goto out;
 		}
-- 
2.30.2



* [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
  2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-24 15:00   ` Christoph Hellwig
  2022-10-25  1:09   ` Chaitanya Kulkarni
  2022-10-21 17:41 ` [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags() Logan Gunthorpe
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

GUP Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
allow obtaining P2PDMA pages. If GUP is called without the flag and a
P2PDMA page is found, it will return an error in try_grab_page() or
try_grab_folio().

The check is safe to do before taking the reference to the page in both
cases, since the page should be protected by either the appropriate
ptl or mmap_lock, or by the gup-fast guarantees that prevent TLB flushes.

try_grab_folio() has one call site that WARNs on failure and cannot
actually deal with the failure of this function (it seems it will
get into an infinite loop). Expand the comment there to document a
couple more conditions on why it will not fail.

FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set. This mirrors
the fsdax restriction until pgmap refcounts are fixed (see the link
below for more information).
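
As a rough sketch of how a GUP caller is expected to opt in
(pin_user_buffer() and its allow_p2pdma condition are illustrative only;
the real block-layer wiring follows in the later patches of this series):

#include <linux/mm.h>

/*
 * Illustrative only: a caller that can DMA to PCI P2PDMA memory passes
 * FOLL_PCI_P2PDMA; without the flag, GUP fails when it encounters a
 * P2PDMA page.
 */
static int pin_user_buffer(unsigned long addr, int nr_pages,
                           struct page **pages, bool allow_p2pdma)
{
        unsigned int gup_flags = FOLL_WRITE;

        if (allow_p2pdma)
                gup_flags |= FOLL_PCI_P2PDMA;

        /* combining FOLL_PCI_P2PDMA with FOLL_LONGTERM is rejected */
        return pin_user_pages_fast(addr, nr_pages, gup_flags, pages);
}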

Link: https://lkml.kernel.org/r/Yy4Ot5MoOhsgYLTQ@ziepe.ca
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 19 ++++++++++++++++++-
 mm/hugetlb.c       |  6 ++++--
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 62a91dc1272b..6b081a8dcf88 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2958,6 +2958,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index e2f447446384..29e28f020f0b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -123,6 +123,9 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
  */
 struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 {
+	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+		return NULL;
+
 	if (flags & FOLL_GET)
 		return try_get_folio(page, refs);
 	else if (flags & FOLL_PIN) {
@@ -216,6 +219,9 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
 		return -ENOMEM;
 
+	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+		return -EREMOTEIO;
+
 	if (flags & FOLL_GET)
 		folio_ref_inc(folio);
 	else if (flags & FOLL_PIN) {
@@ -631,6 +637,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		page = ERR_PTR(ret);
 		goto out;
 	}
+
 	/*
 	 * We need to make the page accessible if and only if we are going
 	 * to access its content (the FOLL_PIN case).  Please see
@@ -1060,6 +1067,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
+	if ((gup_flags & FOLL_LONGTERM) && (gup_flags & FOLL_PCI_P2PDMA))
+		return -EOPNOTSUPP;
+
 	if (vma_is_secretmem(vma))
 		return -EFAULT;
 
@@ -2536,6 +2546,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 			undo_dev_pagemap(nr, nr_start, flags, pages);
 			break;
 		}
+
+		if (!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
+			undo_dev_pagemap(nr, nr_start, flags, pages);
+			break;
+		}
+
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		if (unlikely(try_grab_page(page, flags))) {
@@ -3020,7 +3036,8 @@ static int internal_get_user_pages_fast(unsigned long start,
 
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
-				       FOLL_FAST_ONLY | FOLL_NOFAULT)))
+				       FOLL_FAST_ONLY | FOLL_NOFAULT |
+				       FOLL_PCI_P2PDMA)))
 		return -EINVAL;
 
 	if (gup_flags & FOLL_PIN)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e8d01a19ce46..a55adfbacedb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6342,8 +6342,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 * tables. If the huge page is present, then the tail
 			 * pages must also be present. The ptl prevents the
 			 * head page and tail pages from being rearranged in
-			 * any way. So this page must be available at this
-			 * point, unless the page refcount overflowed:
+			 * any way. As this is hugetlb, the pages will never
+			 * be p2pdma or not longterm pinable. So this page
+			 * must be available at this point, unless the page
+			 * refcount overflowed:
 			 */
 			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
 							 flags))) {
-- 
2.30.2



* [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
  2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
  2022-10-21 17:41 ` [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:14   ` Chaitanya Kulkarni
  2022-10-27  7:11   ` Jay Fang
  2022-10-21 17:41 ` [PATCH v11 4/9] block: add check when merging zone device pages Logan Gunthorpe
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
which take a flags argument that is passed to get_user_pages_fast().

This is so that FOLL_PCI_P2PDMA can be passed when appropriate.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uio.h |  6 ++++++
 lib/iov_iter.c      | 32 ++++++++++++++++++++++++--------
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 2e3134b14ffd..9ede533ce64c 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -247,8 +247,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
 		     loff_t start, size_t count);
+ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
+		size_t maxsize, unsigned maxpages, size_t *start,
+		unsigned gup_flags);
 ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
+ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+		struct page ***pages, size_t maxsize, size_t *start,
+		unsigned gup_flags);
 ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index c3ca28ca68a6..53efad017f3c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1430,7 +1430,8 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
-		   unsigned int maxpages, size_t *start)
+		   unsigned int maxpages, size_t *start,
+		   unsigned int gup_flags)
 {
 	unsigned int n;
 
@@ -1442,7 +1443,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		maxsize = MAX_RW_COUNT;
 
 	if (likely(user_backed_iter(i))) {
-		unsigned int gup_flags = 0;
 		unsigned long addr;
 		int res;
 
@@ -1492,33 +1492,49 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 	return -EFAULT;
 }
 
-ssize_t iov_iter_get_pages2(struct iov_iter *i,
+ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
-		   size_t *start)
+		   size_t *start, unsigned gup_flags)
 {
 	if (!maxpages)
 		return 0;
 	BUG_ON(!pages);
 
-	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
+	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
+					  start, gup_flags);
+}
+EXPORT_SYMBOL_GPL(iov_iter_get_pages);
+
+ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
+		size_t maxsize, unsigned maxpages, size_t *start)
+{
+	return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0);
 }
 EXPORT_SYMBOL(iov_iter_get_pages2);
 
-ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
+ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
-		   size_t *start)
+		   size_t *start, unsigned gup_flags)
 {
 	ssize_t len;
 
 	*pages = NULL;
 
-	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
+	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
+					 gup_flags);
 	if (len <= 0) {
 		kvfree(*pages);
 		*pages = NULL;
 	}
 	return len;
 }
+EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc);
+
+ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
+		struct page ***pages, size_t maxsize, size_t *start)
+{
+	return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0);
+}
 EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
-- 
2.30.2



* [PATCH v11 4/9] block: add check when merging zone device pages
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (2 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags() Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:16   ` Chaitanya Kulkarni
  2022-10-21 17:41 ` [PATCH v11 5/9] lib/scatterlist: " Logan Gunthorpe
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment. The new helper returns true if
both pages are not zone device pages, or if both pages are zone device
pages with the same pgmap.

Add a helper to determine if zone device pages are mergeable and use
this helper in page_is_mergeable().
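
The practical consequence is that a consumer can classify a whole segment
by looking at a single page; a hedged sketch (bvec_pgmap() is illustrative
and not part of this patch):

#include <linux/bvec.h>
#include <linux/memremap.h>
#include <linux/mm.h>

/*
 * Illustrative only: because pages with different pgmaps are never merged
 * into one bvec, inspecting the first page is enough to classify the
 * whole segment.
 */
static struct dev_pagemap *bvec_pgmap(const struct bio_vec *bv)
{
        struct page *page = bv->bv_page;

        if (!is_zone_device_page(page))
                return NULL;            /* ordinary system memory */

        return page->pgmap;             /* valid for every page in the bvec */
}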

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 block/bio.c            |  2 ++
 include/linux/mmzone.h | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 633a902468ec..439469370b7c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -863,6 +863,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 		return false;
 	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
 		return false;
+	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		return false;
 
 	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
 	if (*same_page)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5f74891556f3..9c49ec5d0e25 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -986,6 +986,25 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return page_zonenum(page) == ZONE_DEVICE;
 }
+
+/*
+ * Consecutive zone device pages should not be merged into the same sgl
+ * or bvec segment with other types of pages or if they belong to different
+ * pgmaps. Otherwise getting the pgmap of a given segment is not possible
+ * without scanning the entire segment. This helper returns true either if
+ * both pages are not zone device pages or both pages are zone device pages
+ * with the same pgmap.
+ */
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+	if (is_zone_device_page(a) != is_zone_device_page(b))
+		return false;
+	if (!is_zone_device_page(a))
+		return true;
+	return a->pgmap == b->pgmap;
+}
+
 extern void memmap_init_zone_device(struct zone *, unsigned long,
 				    unsigned long, struct dev_pagemap *);
 #else
@@ -993,6 +1012,11 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return false;
 }
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+	return true;
+}
 #endif
 
 static inline bool folio_is_zone_device(const struct folio *folio)
-- 
2.30.2



* [PATCH v11 5/9] lib/scatterlist: add check when merging zone device pages
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (3 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 4/9] block: add check when merging zone device pages Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:19   ` Chaitanya Kulkarni
  2022-10-21 17:41 ` [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages() Logan Gunthorpe
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment. The new helper returns true if
both pages are not zone device pages, or if both pages are zone device
pages with the same pgmap.

Factor out the check for page mergeability into a pages_are_mergeable()
helper and add a check with zone_device_pages_have_same_pgmap().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 lib/scatterlist.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c8c3d675845c..a0ad2a7959b5 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -410,6 +410,15 @@ static struct scatterlist *get_next_sg(struct sg_append_table *table,
 	return new_sg;
 }
 
+static bool pages_are_mergeable(struct page *a, struct page *b)
+{
+	if (page_to_pfn(a) != page_to_pfn(b) + 1)
+		return false;
+	if (!zone_device_pages_have_same_pgmap(a, b))
+		return false;
+	return true;
+}
+
 /**
  * sg_alloc_append_table_from_pages - Allocate and initialize an append sg
  *                                    table from an array of pages
@@ -447,6 +456,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
 	unsigned int chunks, cur_page, seg_len, i, prv_len = 0;
 	unsigned int added_nents = 0;
 	struct scatterlist *s = sgt_append->prv;
+	struct page *last_pg;
 
 	/*
 	 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
@@ -460,21 +470,17 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
 		return -EOPNOTSUPP;
 
 	if (sgt_append->prv) {
-		unsigned long paddr =
-			(page_to_pfn(sg_page(sgt_append->prv)) * PAGE_SIZE +
-			 sgt_append->prv->offset + sgt_append->prv->length) /
-			PAGE_SIZE;
-
 		if (WARN_ON(offset))
 			return -EINVAL;
 
 		/* Merge contiguous pages into the last SG */
 		prv_len = sgt_append->prv->length;
-		while (n_pages && page_to_pfn(pages[0]) == paddr) {
+		last_pg = sg_page(sgt_append->prv);
+		while (n_pages && pages_are_mergeable(last_pg, pages[0])) {
 			if (sgt_append->prv->length + PAGE_SIZE > max_segment)
 				break;
 			sgt_append->prv->length += PAGE_SIZE;
-			paddr++;
+			last_pg = pages[0];
 			pages++;
 			n_pages--;
 		}
@@ -488,7 +494,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
 	for (i = 1; i < n_pages; i++) {
 		seg_len += PAGE_SIZE;
 		if (seg_len >= max_segment ||
-		    page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) {
+		    !pages_are_mergeable(pages[i], pages[i - 1])) {
 			chunks++;
 			seg_len = 0;
 		}
@@ -504,8 +510,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
 		for (j = cur_page + 1; j < n_pages; j++) {
 			seg_len += PAGE_SIZE;
 			if (seg_len >= max_segment ||
-			    page_to_pfn(pages[j]) !=
-			    page_to_pfn(pages[j - 1]) + 1)
+			    !pages_are_mergeable(pages[j], pages[j - 1]))
 				break;
 		}
 
-- 
2.30.2



* [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (4 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 5/9] lib/scatterlist: " Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:23   ` Chaitanya Kulkarni
  2022-10-25  1:25   ` Chaitanya Kulkarni
  2022-10-21 17:41 ` [PATCH v11 7/9] block: set FOLL_PCI_P2PDMA in bio_map_user_iov() Logan Gunthorpe
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages(). This allows PCI P2PDMA pages to be passed
from userspace and enables the O_DIRECT path in iomap-based filesystems
and in direct I/O to block devices.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 block/bio.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 439469370b7c..a7abf9b1b66a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1197,6 +1197,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
+	unsigned int gup_flags = 0;
 	ssize_t size, left;
 	unsigned len, i = 0;
 	size_t offset, trim;
@@ -1210,6 +1211,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
+		gup_flags |= FOLL_PCI_P2PDMA;
+
 	/*
 	 * Each segment in the iov is required to be a block size multiple.
 	 * However, we may not be able to get the entire segment if it spans
@@ -1217,8 +1221,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_get_pages2(iter, pages, UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset);
+	size = iov_iter_get_pages(iter, pages,
+				  UINT_MAX - bio->bi_iter.bi_size,
+				  nr_pages, &offset, gup_flags);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
-- 
2.30.2



* [PATCH v11 7/9] block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (5 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages() Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:26   ` Chaitanya Kulkarni
  2022-10-21 17:41 ` [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs Logan Gunthorpe
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

When a request's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages() and iov_iter_get_pages_alloc(). This allows PCI
P2PDMA pages to be passed from userspace and enables NVMe passthru
requests to use P2PDMA pages.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 block/blk-map.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 34735626b00f..8750f82d7da4 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -267,6 +267,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 {
 	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
 	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
+	unsigned int gup_flags = 0;
 	struct bio *bio;
 	int ret;
 	int j;
@@ -278,6 +279,9 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (bio == NULL)
 		return -ENOMEM;
 
+	if (blk_queue_pci_p2pdma(rq->q))
+		gup_flags |= FOLL_PCI_P2PDMA;
+
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
 		ssize_t bytes;
@@ -286,11 +290,11 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 
 		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
 			pages = stack_pages;
-			bytes = iov_iter_get_pages2(iter, pages, LONG_MAX,
-							nr_vecs, &offs);
+			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
+						   nr_vecs, &offs, gup_flags);
 		} else {
-			bytes = iov_iter_get_pages_alloc2(iter, &pages,
-							LONG_MAX, &offs);
+			bytes = iov_iter_get_pages_alloc(iter, &pages,
+						LONG_MAX, &offs, gup_flags);
 		}
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
-- 
2.30.2



* [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (6 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 7/9] block: set FOLL_PCI_P2PDMA in bio_map_user_iov() Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:29   ` Chaitanya Kulkarni
  2022-10-25  1:34   ` Chaitanya Kulkarni
  2022-10-21 17:41 ` [PATCH v11 9/9] ABI: sysfs-bus-pci: add documentation for p2pmem allocate Logan Gunthorpe
  2022-10-24 15:03 ` [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Christoph Hellwig
  9 siblings, 2 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe, Bjorn Helgaas

Create a sysfs bin attribute called "allocate" under the existing
"p2pmem" group. The only allowable operation on this file is the mmap()
call.

When mmap() is called on this attribute, the kernel allocates a chunk of
memory from the genalloc and inserts the pages into the VMA. The
dev_pagemap .page_free callback will indicate when these pages are no
longer used and they will be put back into the genalloc.

On device unbind, remove the sysfs file before the memremap_pages are
cleaned up. This ensures unmap_mapping_range() is called on the file's
inode and no new mappings can be created.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/pci/p2pdma.c | 124 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 88dc66ee1c46..27539770a613 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -89,6 +89,90 @@ static ssize_t published_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(published);
 
+static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
+		struct bin_attribute *attr, struct vm_area_struct *vma)
+{
+	struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
+	size_t len = vma->vm_end - vma->vm_start;
+	struct pci_p2pdma *p2pdma;
+	struct percpu_ref *ref;
+	unsigned long vaddr;
+	void *kaddr;
+	int ret;
+
+	/* prevent private mappings from being established */
+	if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
+		pci_info_ratelimited(pdev,
+				     "%s: fail, attempted private mapping\n",
+				     current->comm);
+		return -EINVAL;
+	}
+
+	if (vma->vm_pgoff) {
+		pci_info_ratelimited(pdev,
+				     "%s: fail, attempted mapping with non-zero offset\n",
+				     current->comm);
+		return -EINVAL;
+	}
+
+	rcu_read_lock();
+	p2pdma = rcu_dereference(pdev->p2pdma);
+	if (!p2pdma) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	kaddr = (void *)gen_pool_alloc_owner(p2pdma->pool, len, (void **)&ref);
+	if (!kaddr) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * vm_insert_page() can sleep, so a reference is taken to mapping
+	 * such that rcu_read_unlock() can be done before inserting the
+	 * pages
+	 */
+	if (unlikely(!percpu_ref_tryget_live_rcu(ref))) {
+		ret = -ENODEV;
+		goto out_free_mem;
+	}
+	rcu_read_unlock();
+
+	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
+		ret = vm_insert_page(vma, vaddr, virt_to_page(kaddr));
+		if (ret) {
+			gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
+			return ret;
+		}
+		percpu_ref_get(ref);
+		put_page(virt_to_page(kaddr));
+		kaddr += PAGE_SIZE;
+		len -= PAGE_SIZE;
+	}
+
+	percpu_ref_put(ref);
+
+	return 0;
+out_free_mem:
+	gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
+static struct bin_attribute p2pmem_alloc_attr = {
+	.attr = { .name = "allocate", .mode = 0660 },
+	.mmap = p2pmem_alloc_mmap,
+	/*
+	 * Some places where we want to call mmap (ie. python) will check
+	 * that the file size is greater than the mmap size before allowing
+	 * the mmap to continue. To work around this, just set the size
+	 * to be very large.
+	 */
+	.size = SZ_1T,
+};
+
 static struct attribute *p2pmem_attrs[] = {
 	&dev_attr_size.attr,
 	&dev_attr_available.attr,
@@ -96,11 +180,32 @@ static struct attribute *p2pmem_attrs[] = {
 	NULL,
 };
 
+static struct bin_attribute *p2pmem_bin_attrs[] = {
+	&p2pmem_alloc_attr,
+	NULL,
+};
+
 static const struct attribute_group p2pmem_group = {
 	.attrs = p2pmem_attrs,
+	.bin_attrs = p2pmem_bin_attrs,
 	.name = "p2pmem",
 };
 
+static void p2pdma_page_free(struct page *page)
+{
+	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
+	struct percpu_ref *ref;
+
+	gen_pool_free_owner(pgmap->provider->p2pdma->pool,
+			    (uintptr_t)page_to_virt(page), PAGE_SIZE,
+			    (void **)&ref);
+	percpu_ref_put(ref);
+}
+
+static const struct dev_pagemap_ops p2pdma_pgmap_ops = {
+	.page_free = p2pdma_page_free,
+};
+
 static void pci_p2pdma_release(void *data)
 {
 	struct pci_dev *pdev = data;
@@ -152,6 +257,19 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 	return error;
 }
 
+static void pci_p2pdma_unmap_mappings(void *data)
+{
+	struct pci_dev *pdev = data;
+
+	/*
+	 * Removing the alloc attribute from sysfs will call
+	 * unmap_mapping_range() on the inode, teardown any existing userspace
+	 * mappings and prevent new ones from being created.
+	 */
+	sysfs_remove_file_from_group(&pdev->dev.kobj, &p2pmem_alloc_attr.attr,
+				     p2pmem_group.name);
+}
+
 /**
  * pci_p2pdma_add_resource - add memory for use as p2p memory
  * @pdev: the device to add the memory to
@@ -198,6 +316,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap->range.end = pgmap->range.start + size - 1;
 	pgmap->nr_range = 1;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+	pgmap->ops = &p2pdma_pgmap_ops;
 
 	p2p_pgmap->provider = pdev;
 	p2p_pgmap->bus_offset = pci_bus_address(pdev, bar) -
@@ -209,6 +328,11 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 		goto pgmap_free;
 	}
 
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
+					 pdev);
+	if (error)
+		goto pages_free;
+
 	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
 	error = gen_pool_add_owner(p2pdma->pool, (unsigned long)addr,
 			pci_bus_address(pdev, bar) + offset,
-- 
2.30.2



* [PATCH v11 9/9] ABI: sysfs-bus-pci: add documentation for p2pmem allocate
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (7 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs Logan Gunthorpe
@ 2022-10-21 17:41 ` Logan Gunthorpe
  2022-10-25  1:29   ` Chaitanya Kulkarni
  2022-10-24 15:03 ` [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Christoph Hellwig
  9 siblings, 1 reply; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-21 17:41 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

Add documentation for the p2pmem/allocate binary file which allows
for allocating p2pmem buffers in userspace for passing to drivers
that support them. (Currently only O_DIRECT to NVMe devices.)

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-bus-pci | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 840727fc75dc..ecf47559f495 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -407,6 +407,16 @@ Description:
 	        file contains a '1' if the memory has been published for
 		use outside the driver that owns the device.
 
+What:		/sys/bus/pci/devices/.../p2pmem/allocate
+Date:		August 2022
+Contact:	Logan Gunthorpe <logang@deltatee.com>
+Description:
+		This file allows mapping p2pmem into userspace. For each
+		mmap() call on this file, the kernel will allocate a chunk
+		of Peer-to-Peer memory for use in Peer-to-Peer transactions.
+		This memory can be used in O_DIRECT calls to NVMe backed
+		files for Peer-to-Peer copies.
+
 What:		/sys/bus/pci/devices/.../link/clkpm
 		/sys/bus/pci/devices/.../link/l0s_aspm
 		/sys/bus/pci/devices/.../link/l1_aspm
-- 
2.30.2



* Re: [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page()
  2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
@ 2022-10-24 15:00   ` Christoph Hellwig
  2022-10-24 16:37   ` Dan Williams
  2022-10-25  1:06   ` Chaitanya Kulkarni
  2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2022-10-24 15:00 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On Fri, Oct 21, 2022 at 11:41:08AM -0600, Logan Gunthorpe wrote:
> In order to add checks for P2PDMA memory into try_grab_page(), expand
> the error return from a bool to an int/error code. Update all the
> call sites to handle the change in usage.
> 
> Also remove the WARN_ON_ONCE() calls at the call sites, since there
> already is a WARN_ON_ONCE() inside the function if it fails.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
  2022-10-21 17:41 ` [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages Logan Gunthorpe
@ 2022-10-24 15:00   ` Christoph Hellwig
  2022-10-25  1:09   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2022-10-24 15:00 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices
  2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
                   ` (8 preceding siblings ...)
  2022-10-21 17:41 ` [PATCH v11 9/9] ABI: sysfs-bus-pci: add documentation for p2pmem allocate Logan Gunthorpe
@ 2022-10-24 15:03 ` Christoph Hellwig
  2022-10-24 19:15   ` John Hubbard
  9 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-10-24 15:03 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, viro

The series looks good to me now. How do we want to handle it?  I think
we need a special branch somewhere (maybe in the block or mm trees?)
so that we can base the other iov_iter work from John on it.  Also
Al has a whole bunch of iov_iter changes that we probably want on
the same branch as well, although some of those (READ vs WRITE fixups)
look like 6.1 material to me.



* RE: [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page()
  2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
  2022-10-24 15:00   ` Christoph Hellwig
@ 2022-10-24 16:37   ` Dan Williams
  2022-10-25  1:06   ` Chaitanya Kulkarni
  2 siblings, 0 replies; 34+ messages in thread
From: Dan Williams @ 2022-10-24 16:37 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Logan Gunthorpe

Logan Gunthorpe wrote:
> In order to add checks for P2PDMA memory into try_grab_page(), expand
> the error return from a bool to an int/error code. Update all the
> call sites to handle the change in usage.
> 
> Also remove the WARN_ON_ONCE() calls at the call sites, since there
> already is a WARN_ON_ONCE() inside the function if it fails.

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


* Re: [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices
  2022-10-24 15:03 ` [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Christoph Hellwig
@ 2022-10-24 19:15   ` John Hubbard
  2022-11-08  6:56     ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: John Hubbard @ 2022-10-24 19:15 UTC (permalink / raw)
  To: Christoph Hellwig, Logan Gunthorpe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	Greg Kroah-Hartman, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Minturn Dave B, Jason Ekstrand, Dave Hansen, Xiong Jianxin,
	Bjorn Helgaas, Ira Weiny, Robin Murphy, Martin Oliveira,
	Chaitanya Kulkarni, Ralph Campbell, Stephen Bates, viro

On 10/24/22 08:03, Christoph Hellwig wrote:
> The series looks good to me now. How do we want to handle it?  I think
> we need a special branch somewhere (maybe in the block or mm trees?)
> so that we can base the other iov_iter work from John on it.  Also
> Al has a whole bunch of iov_iter changes that we probably want on
> the same branch as well, although some of those (READ vs WRITE fixups)
> look like 6.1 material to me.
> 

A little earlier, Jens graciously offered [1] to provide a topic branch,
such as:

     for-6.2/block-gup [2]

(I've moved the name forward from 6.1 to 6.2, because that discussion
was 7 weeks ago.)


[1] https://lore.kernel.org/ae675a01-90e6-4af1-6c43-660b3a6c7b72@kernel.dk
[2] https://lore.kernel.org/55a2d67f-9a12-9fe6-d73b-8c3f5eb36f31@kernel.dk

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page()
  2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
  2022-10-24 15:00   ` Christoph Hellwig
  2022-10-24 16:37   ` Dan Williams
@ 2022-10-25  1:06   ` Chaitanya Kulkarni
  2 siblings, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:06 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> In order to add checks for P2PDMA memory into try_grab_page(), expand
> the error return from a bool to an int/error code. Update all the
> call sites to handle the change in usage.
> 
> Also remove the WARN_ON_ONCE() calls at the call sites, since there
> already is a WARN_ON_ONCE() inside the function if it fails.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



* Re: [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
  2022-10-21 17:41 ` [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages Logan Gunthorpe
  2022-10-24 15:00   ` Christoph Hellwig
@ 2022-10-25  1:09   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:09 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> GUP Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
> allow obtaining P2PDMA pages. If GUP is called without the flag and a
> P2PDMA page is found, it will return an error in try_grab_page() or
> try_grab_folio().
> 
> The check is safe to do before taking the reference to the page in both
> cases, since the page should be protected by either the appropriate
> ptl or mmap_lock, or by the gup-fast guarantees that prevent TLB flushes.
> 
> try_grab_folio() has one call site that WARNs on failure and cannot
> actually deal with the failure of this function (it seems it will
> get into an infinite loop). Expand the comment there to document a
> couple more conditions on why it will not fail.
> 
> FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set. This mirrors
> the fsdax restriction until pgmap refcounts are fixed (see the link
> below for more information).
> 
> Link: https://lkml.kernel.org/r/Yy4Ot5MoOhsgYLTQ@ziepe.ca
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


* Re: [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  2022-10-21 17:41 ` [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags() Logan Gunthorpe
@ 2022-10-25  1:14   ` Chaitanya Kulkarni
  2022-10-25 15:35     ` Logan Gunthorpe
  2022-10-27  7:11   ` Jay Fang
  1 sibling, 1 reply; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:14 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
> which take a flags argument that is passed to get_user_pages_fast().
> 
> This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/uio.h |  6 ++++++
>   lib/iov_iter.c      | 32 ++++++++++++++++++++++++--------
>   2 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 2e3134b14ffd..9ede533ce64c 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -247,8 +247,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
>   void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
>   void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
>   		     loff_t start, size_t count);
> +ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
> +		size_t maxsize, unsigned maxpages, size_t *start,
> +		unsigned gup_flags);
>   ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
>   			size_t maxsize, unsigned maxpages, size_t *start);
> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
> +		struct page ***pages, size_t maxsize, size_t *start,
> +		unsigned gup_flags);
>   ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
>   			size_t maxsize, size_t *start);
>   int iov_iter_npages(const struct iov_iter *i, int maxpages);
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index c3ca28ca68a6..53efad017f3c 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1430,7 +1430,8 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
>   
>   static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>   		   struct page ***pages, size_t maxsize,
> -		   unsigned int maxpages, size_t *start)
> +		   unsigned int maxpages, size_t *start,
> +		   unsigned int gup_flags)
>   {
>   	unsigned int n;
>   
> @@ -1442,7 +1443,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>   		maxsize = MAX_RW_COUNT;
>   
>   	if (likely(user_backed_iter(i))) {
> -		unsigned int gup_flags = 0;
>   		unsigned long addr;
>   		int res;
>   
> @@ -1492,33 +1492,49 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>   	return -EFAULT;
>   }
>   
> -ssize_t iov_iter_get_pages2(struct iov_iter *i,
> +ssize_t iov_iter_get_pages(struct iov_iter *i,
>   		   struct page **pages, size_t maxsize, unsigned maxpages,
> -		   size_t *start)
> +		   size_t *start, unsigned gup_flags)
>   {
>   	if (!maxpages)
>   		return 0;
>   	BUG_ON(!pages);
>   
> -	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
> +	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
> +					  start, gup_flags);
> +}
> +EXPORT_SYMBOL_GPL(iov_iter_get_pages);
> +
> +ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
> +		size_t maxsize, unsigned maxpages, size_t *start)
> +{
> +	return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0);
>   }
>   EXPORT_SYMBOL(iov_iter_get_pages2);
>   
> -ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>   		   struct page ***pages, size_t maxsize,
> -		   size_t *start)
> +		   size_t *start, unsigned gup_flags)
>   {
>   	ssize_t len;
>   
>   	*pages = NULL;
>   
> -	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
> +	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
> +					 gup_flags);
>   	if (len <= 0) {
>   		kvfree(*pages);
>   		*pages = NULL;
>   	}
>   	return len;
>   }
> +EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc);
> +
> +ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
> +		struct page ***pages, size_t maxsize, size_t *start)
> +{
> +	return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0);
> +}
>   EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
Just one minor question: why not make the following functions
EXPORT_SYMBOL_GPL()?

1. iov_iter_get_pages2()
2. iov_iter_get_pages_alloc2()

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
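
As a usage sketch, a caller that may legitimately see P2PDMA pages passes
the flag through the new variant, while existing callers keep using the
2-suffixed wrappers unchanged (illustrative only; "wants_p2pdma" stands
in for however the caller decides this):

	unsigned int gup_flags = 0;

	if (wants_p2pdma)
		gup_flags |= FOLL_PCI_P2PDMA;

	size = iov_iter_get_pages(iter, pages, maxsize, maxpages,
				  &offset, gup_flags);
	/* iov_iter_get_pages2() stays equivalent to gup_flags == 0 */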


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 4/9] block: add check when merging zone device pages
  2022-10-21 17:41 ` [PATCH v11 4/9] block: add check when merging zone device pages Logan Gunthorpe
@ 2022-10-25  1:16   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:16 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> Consecutive zone device pages should not be merged into the same sgl
> or bvec segment with other types of pages or if they belong to different
> pgmaps. Otherwise getting the pgmap of a given segment is not possible
> without scanning the entire segment. This helper returns true if either
> both pages are not zone device pages or both pages are zone device
> pages with the same pgmap.
> 
> Add a helper to determine if zone device pages are mergeable and use
> this helper in page_is_mergeable().
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> ---
>   block/bio.c            |  2 ++
>   include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>   2 files changed, 26 insertions(+)
> 

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
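
A rough sketch of the helper's intent, for readers who do not have the
patch open (names per the series; the real implementation is the one
added to include/linux/mmzone.h):

	static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
							     const struct page *b)
	{
		/* a zone device page never merges with a non zone device page */
		if (is_zone_device_page(a) != is_zone_device_page(b))
			return false;

		/* two ordinary pages can always be merged */
		if (!is_zone_device_page(a))
			return true;

		/* zone device pages only merge within a single pgmap */
		return a->pgmap == b->pgmap;
	}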


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 5/9] lib/scatterlist: add check when merging zone device pages
  2022-10-21 17:41 ` [PATCH v11 5/9] lib/scatterlist: " Logan Gunthorpe
@ 2022-10-25  1:19   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:19 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> Consecutive zone device pages should not be merged into the same sgl
> or bvec segment with other types of pages or if they belong to different
> pgmaps. Otherwise getting the pgmap of a given segment is not possible
> without scanning the entire segment. This helper returns true if either
> both pages are not zone device pages or both pages are zone device
> pages with the same pgmap.
> 
> Factor out the check for page mergeability into a pages_are_mergeable()
> helper and add a check with zone_device_pages_have_same_pgmap().
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   lib/scatterlist.c | 25 +++++++++++++++----------
>   1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index c8c3d675845c..a0ad2a7959b5 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -410,6 +410,15 @@ static struct scatterlist *get_next_sg(struct sg_append_table *table,
>   	return new_sg;
>   }
>   
> +static bool pages_are_mergeable(struct page *a, struct page *b)
> +{
> +	if (page_to_pfn(a) != page_to_pfn(b) + 1)
> +		return false;
> +	if (!zone_device_pages_have_same_pgmap(a, b))
> +		return false;
> +	return true;
> +}
> +


Not sure if it makes sense to make it inline? Either way,

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  2022-10-21 17:41 ` [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages() Logan Gunthorpe
@ 2022-10-25  1:23   ` Chaitanya Kulkarni
  2022-10-25 15:37     ` Logan Gunthorpe
  2022-10-25  1:25   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:23 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

>   	/*
>   	 * Each segment in the iov is required to be a block size multiple.
>   	 * However, we may not be able to get the entire segment if it spans
> @@ -1217,8 +1221,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>   	 * result to ensure the bio's total size is correct. The remainder of
>   	 * the iov data will be picked up in the next bio iteration.
>   	 */
> -	size = iov_iter_get_pages2(iter, pages, UINT_MAX - bio->bi_iter.bi_size,
> -				  nr_pages, &offset);
> +	size = iov_iter_get_pages(iter, pages,
> +				  UINT_MAX - bio->bi_iter.bi_size,
> +				  nr_pages, &offset, gup_flags);

nit: the 3rd param in the above call fits on the first line? Please check:

iov_iter_get_pages(iter, pages, UINT_MAX - bio->bi_iter.bi_size,
                    nr_pages, &offset, gup_flags);

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  2022-10-21 17:41 ` [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages() Logan Gunthorpe
  2022-10-25  1:23   ` Chaitanya Kulkarni
@ 2022-10-25  1:25   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:25 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

>   	 * Each segment in the iov is required to be a block size multiple.
>   	 * However, we may not be able to get the entire segment if it spans
> @@ -1217,8 +1221,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>   	 * result to ensure the bio's total size is correct. The remainder of
>   	 * the iov data will be picked up in the next bio iteration.
>   	 */
> -	size = iov_iter_get_pages2(iter, pages, UINT_MAX - bio->bi_iter.bi_size,
> -				  nr_pages, &offset);
> +	size = iov_iter_get_pages(iter, pages,
> +				  UINT_MAX - bio->bi_iter.bi_size,
> +				  nr_pages, &offset, gup_flags);

nit: the 3rd parameter in the above call fits on the 1st line? Please check.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 7/9] block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
  2022-10-21 17:41 ` [PATCH v11 7/9] block: set FOLL_PCI_P2PDMA in bio_map_user_iov() Logan Gunthorpe
@ 2022-10-25  1:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:26 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
> iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be
> passed from userspace and enables the NVMe passthru requests to
> use P2PDMA pages.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> ---
>   block/blk-map.c | 12 ++++++++----

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
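
In terms of shape, the block-layer gating is roughly the following (a
sketch; blk_queue_pci_p2pdma() is the existing queue-flag test, and the
actual call sites are the ones touched in blk-map.c):

	unsigned int gup_flags = 0;

	if (blk_queue_pci_p2pdma(rq->q))
		gup_flags |= FOLL_PCI_P2PDMA;

	/* ... later, when pinning the user pages for the bio ... */
	bytes = iov_iter_get_pages(iter, pages, LONG_MAX, nr_vecs,
				   &offs, gup_flags);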



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs
  2022-10-21 17:41 ` [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs Logan Gunthorpe
@ 2022-10-25  1:29   ` Chaitanya Kulkarni
  2022-10-25  1:34   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:29 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Bjorn Helgaas

On 10/21/22 10:41, Logan Gunthorpe wrote:
> Create a sysfs bin attribute called "allocate" under the existing
> "p2pmem" group. The only allowable operation on this file is the mmap()
> call.
> 
> When mmap() is called on this attribute, the kernel allocates a chunk of
> memory from the genalloc and inserts the pages into the VMA. The
> dev_pagemap .page_free callback will indicate when these pages are no
> longer used and they will be put back into the genalloc.
> 
> On device unbind, remove the sysfs file before the memremap_pages are
> cleaned up. This ensures unmap_mapping_range() is called on the file's
> inode and no new mappings can be created.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 9/9] ABI: sysfs-bus-pci: add documentation for p2pmem allocate
  2022-10-21 17:41 ` [PATCH v11 9/9] ABI: sysfs-bus-pci: add documentation for p2pmem allocate Logan Gunthorpe
@ 2022-10-25  1:29   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:29 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 10/21/22 10:41, Logan Gunthorpe wrote:
> Add documentation for the p2pmem/allocate binary file which allows
> for allocating p2pmem buffers in userspace for passing to drivers
> that support them. (Currently only O_DIRECT to NVMe devices.)
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
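
For readers looking for the userspace side this documents, a minimal
sketch of the flow (paths, sizes and file names are illustrative; error
handling trimmed): mmap() the p2pmem "allocate" attribute of the device,
then hand the buffer to O_DIRECT I/O on an NVMe-backed file.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		/* hypothetical PCI function; substitute the real device */
		int afd = open("/sys/bus/pci/devices/0000:03:00.0/p2pmem/allocate",
			       O_RDWR);
		size_t len = 2 * 1024 * 1024;

		/* must be a shared mapping with a zero offset */
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
				 afd, 0);

		/* O_DIRECT file or block device backed by an NVMe drive */
		int fd = open("/mnt/nvme/data", O_WRONLY | O_DIRECT);

		if (buf != MAP_FAILED && fd >= 0)
			write(fd, buf, len);	/* P2PDMA via O_DIRECT */

		munmap(buf, len);
		close(fd);
		close(afd);
		return 0;
	}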


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs
  2022-10-21 17:41 ` [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs Logan Gunthorpe
  2022-10-25  1:29   ` Chaitanya Kulkarni
@ 2022-10-25  1:34   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 34+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  1:34 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, Bjorn Helgaas

On 10/21/22 10:41, Logan Gunthorpe wrote:
> Create a sysfs bin attribute called "allocate" under the existing
> "p2pmem" group. The only allowable operation on this file is the mmap()
> call.
> 
> When mmap() is called on this attribute, the kernel allocates a chunk of
> memory from the genalloc and inserts the pages into the VMA. The
> dev_pagemap .page_free callback will indicate when these pages are no
> longer used and they will be put back into the genalloc.
> 
> On device unbind, remove the sysfs file before the memremap_pages are
> cleaned up. This ensures unmap_mapping_range() is called on the file's
> inode and no new mappings can be created.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>   drivers/pci/p2pdma.c | 124 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 124 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 88dc66ee1c46..27539770a613 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -89,6 +89,90 @@ static ssize_t published_show(struct device *dev, struct device_attribute *attr,
>   }
>   static DEVICE_ATTR_RO(published);
>   
> +static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> +		struct bin_attribute *attr, struct vm_area_struct *vma)
> +{
> +	struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> +	size_t len = vma->vm_end - vma->vm_start;
> +	struct pci_p2pdma *p2pdma;
> +	struct percpu_ref *ref;
> +	unsigned long vaddr;
> +	void *kaddr;
> +	int ret;
> +
> +	/* prevent private mappings from being established */
> +	if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
> +		pci_info_ratelimited(pdev,
> +				     "%s: fail, attempted private mapping\n",
> +				     current->comm);
> +		return -EINVAL;
> +	}
> +
> +	if (vma->vm_pgoff) {
> +		pci_info_ratelimited(pdev,
> +				     "%s: fail, attempted mapping with non-zero offset\n",
> +				     current->comm);
> +		return -EINVAL;
> +	}
> +
> +	rcu_read_lock();
> +	p2pdma = rcu_dereference(pdev->p2pdma);
> +	if (!p2pdma) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	kaddr = (void *)gen_pool_alloc_owner(p2pdma->pool, len, (void **)&ref);
> +	if (!kaddr) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/*
> +	 * vm_insert_page() can sleep, so a reference is taken to mapping
> +	 * such that rcu_read_unlock() can be done before inserting the
> +	 * pages
> +	 */
> +	if (unlikely(!percpu_ref_tryget_live_rcu(ref))) {
> +		ret = -ENODEV;
> +		goto out_free_mem;
> +	}
> +	rcu_read_unlock();
> +
> +	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
> +		ret = vm_insert_page(vma, vaddr, virt_to_page(kaddr));
> +		if (ret) {
> +			gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> +			return ret;
> +		}
> +		percpu_ref_get(ref);
> +		put_page(virt_to_page(kaddr));
> +		kaddr += PAGE_SIZE;
> +		len -= PAGE_SIZE;
> +	}
> +
> +	percpu_ref_put(ref);
> +
> +	return 0;
> +out_free_mem:
> +	gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> +out:
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +static struct bin_attribute p2pmem_alloc_attr = {
> +	.attr = { .name = "allocate", .mode = 0660 },
> +	.mmap = p2pmem_alloc_mmap,
> +	/*
> +	 * Some places where we want to call mmap (ie. python) will check
> +	 * that the file size is greater than the mmap size before allowing
> +	 * the mmap to continue. To work around this, just set the size
> +	 * to be very large.
> +	 */
> +	.size = SZ_1T,
> +};
> +
>   static struct attribute *p2pmem_attrs[] = {
>   	&dev_attr_size.attr,
>   	&dev_attr_available.attr,
> @@ -96,11 +180,32 @@ static struct attribute *p2pmem_attrs[] = {
>   	NULL,
>   };
>   
> +static struct bin_attribute *p2pmem_bin_attrs[] = {
> +	&p2pmem_alloc_attr,
> +	NULL,
> +};
> +
>   static const struct attribute_group p2pmem_group = {
>   	.attrs = p2pmem_attrs,
> +	.bin_attrs = p2pmem_bin_attrs,
>   	.name = "p2pmem",
>   };
>   
> +static void p2pdma_page_free(struct page *page)
> +{
> +	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
> +	struct percpu_ref *ref;
> +
> +	gen_pool_free_owner(pgmap->provider->p2pdma->pool,
> +			    (uintptr_t)page_to_virt(page), PAGE_SIZE,
> +			    (void **)&ref);
> +	percpu_ref_put(ref);
> +}
> +
> +static const struct dev_pagemap_ops p2pdma_pgmap_ops = {
> +	.page_free = p2pdma_page_free,
> +};
> +
>   static void pci_p2pdma_release(void *data)
>   {
>   	struct pci_dev *pdev = data;
> @@ -152,6 +257,19 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
>   	return error;
>   }
>   
> +static void pci_p2pdma_unmap_mappings(void *data)
> +{
> +	struct pci_dev *pdev = data;
> +
> +	/*
> +	 * Removing the alloc attribute from sysfs will call
> +	 * unmap_mapping_range() on the inode, teardown any existing userspace
> +	 * mappings and prevent new ones from being created.
> +	 */
> +	sysfs_remove_file_from_group(&pdev->dev.kobj, &p2pmem_alloc_attr.attr,
> +				     p2pmem_group.name);
> +}
> +
>   /**
>    * pci_p2pdma_add_resource - add memory for use as p2p memory
>    * @pdev: the device to add the memory to
> @@ -198,6 +316,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   	pgmap->range.end = pgmap->range.start + size - 1;
>   	pgmap->nr_range = 1;
>   	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
> +	pgmap->ops = &p2pdma_pgmap_ops;
>   
>   	p2p_pgmap->provider = pdev;
>   	p2p_pgmap->bus_offset = pci_bus_address(pdev, bar) -
> @@ -209,6 +328,11 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   		goto pgmap_free;
>   	}
>   
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
> +					 pdev);
> +	if (error)
> +		goto pages_free;
> +
>   	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
>   	error = gen_pool_add_owner(p2pdma->pool, (unsigned long)addr,
>   			pci_bus_address(pdev, bar) + offset,

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
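
For completeness, the provider-facing API that feeds this pool is the
existing P2PDMA one; a sketch of how a device driver exposing, say, a CMB
BAR would wire it up (bar/size/offset are assumed to be already known to
the driver):

	/* add part of a BAR as p2p memory; this also creates the p2pmem
	 * sysfs group, which now includes the "allocate" attribute */
	rc = pci_p2pdma_add_resource(pdev, bar, size, offset);
	if (rc)
		return rc;

	/* advertise the memory so in-kernel clients (and now userspace
	 * via the sysfs file) may allocate from it */
	pci_p2pmem_publish(pdev, true);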



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  2022-10-25  1:14   ` Chaitanya Kulkarni
@ 2022-10-25 15:35     ` Logan Gunthorpe
  2022-10-25 15:41       ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-25 15:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates



On 2022-10-24 19:14, Chaitanya Kulkarni wrote:
> On 10/21/22 10:41, Logan Gunthorpe wrote:
>> Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
>> which take a flags argument that is passed to get_user_pages_fast().
>>
>> This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   include/linux/uio.h |  6 ++++++
>>   lib/iov_iter.c      | 32 ++++++++++++++++++++++++--------
>>   2 files changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/uio.h b/include/linux/uio.h
>> index 2e3134b14ffd..9ede533ce64c 100644
>> --- a/include/linux/uio.h
>> +++ b/include/linux/uio.h
>> @@ -247,8 +247,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
>>   void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
>>   void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
>>   		     loff_t start, size_t count);
>> +ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
>> +		size_t maxsize, unsigned maxpages, size_t *start,
>> +		unsigned gup_flags);
>>   ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
>>   			size_t maxsize, unsigned maxpages, size_t *start);
>> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>> +		struct page ***pages, size_t maxsize, size_t *start,
>> +		unsigned gup_flags);
>>   ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
>>   			size_t maxsize, size_t *start);
>>   int iov_iter_npages(const struct iov_iter *i, int maxpages);
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index c3ca28ca68a6..53efad017f3c 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -1430,7 +1430,8 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
>>   
>>   static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>>   		   struct page ***pages, size_t maxsize,
>> -		   unsigned int maxpages, size_t *start)
>> +		   unsigned int maxpages, size_t *start,
>> +		   unsigned int gup_flags)
>>   {
>>   	unsigned int n;
>>   
>> @@ -1442,7 +1443,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>>   		maxsize = MAX_RW_COUNT;
>>   
>>   	if (likely(user_backed_iter(i))) {
>> -		unsigned int gup_flags = 0;
>>   		unsigned long addr;
>>   		int res;
>>   
>> @@ -1492,33 +1492,49 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>>   	return -EFAULT;
>>   }
>>   
>> -ssize_t iov_iter_get_pages2(struct iov_iter *i,
>> +ssize_t iov_iter_get_pages(struct iov_iter *i,
>>   		   struct page **pages, size_t maxsize, unsigned maxpages,
>> -		   size_t *start)
>> +		   size_t *start, unsigned gup_flags)
>>   {
>>   	if (!maxpages)
>>   		return 0;
>>   	BUG_ON(!pages);
>>   
>> -	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
>> +	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
>> +					  start, gup_flags);
>> +}
>> +EXPORT_SYMBOL_GPL(iov_iter_get_pages);
>> +
>> +ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
>> +		size_t maxsize, unsigned maxpages, size_t *start)
>> +{
>> +	return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0);
>>   }
>>   EXPORT_SYMBOL(iov_iter_get_pages2);
>>   
>> -ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
>> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>>   		   struct page ***pages, size_t maxsize,
>> -		   size_t *start)
>> +		   size_t *start, unsigned gup_flags)
>>   {
>>   	ssize_t len;
>>   
>>   	*pages = NULL;
>>   
>> -	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
>> +	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
>> +					 gup_flags);
>>   	if (len <= 0) {
>>   		kvfree(*pages);
>>   		*pages = NULL;
>>   	}
>>   	return len;
>>   }
>> +EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc);
>> +
>> +ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
>> +		struct page ***pages, size_t maxsize, size_t *start)
>> +{
>> +	return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0);
>> +}
>>   EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
> Just one minor question: why not make the following functions
> EXPORT_SYMBOL_GPL()?
> 
> 1. iov_iter_get_pages2()
> 2. iov_iter_get_pages_alloc2()

They previously were not GPL, so I didn't think that should be changed
in this patch.

Thanks for the review!

Logan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  2022-10-25  1:23   ` Chaitanya Kulkarni
@ 2022-10-25 15:37     ` Logan Gunthorpe
  0 siblings, 0 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-25 15:37 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates



On 2022-10-24 19:23, Chaitanya Kulkarni wrote:
> /*
>>   	 * Each segment in the iov is required to be a block size multiple.
>>   	 * However, we may not be able to get the entire segment if it spans
>> @@ -1217,8 +1221,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>>   	 * result to ensure the bio's total size is correct. The remainder of
>>   	 * the iov data will be picked up in the next bio iteration.
>>   	 */
>> -	size = iov_iter_get_pages2(iter, pages, UINT_MAX - bio->bi_iter.bi_size,
>> -				  nr_pages, &offset);
>> +	size = iov_iter_get_pages(iter, pages,
>> +				  UINT_MAX - bio->bi_iter.bi_size,
>> +				  nr_pages, &offset, gup_flags);
> 
> nit: the 3rd param in the above call fits on the first line? Please check:
> 
> iov_iter_get_pages(iter, pages, UINT_MAX - bio->bi_iter.bi_size,
>                     nr_pages, &offset, gup_flags);

Oh, yup, this just fits. I'll queue up the fix for if I send v12.

Logan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  2022-10-25 15:35     ` Logan Gunthorpe
@ 2022-10-25 15:41       ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2022-10-25 15:41 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Chaitanya Kulkarni, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm, Christoph Hellwig, Greg Kroah-Hartman,
	Dan Williams, Jason Gunthorpe, Christian König,
	John Hubbard, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Minturn Dave B, Jason Ekstrand, Dave Hansen, Xiong Jianxin,
	Bjorn Helgaas, Ira Weiny, Robin Murphy, Martin Oliveira,
	Chaitanya Kulkarni, Ralph Campbell, Stephen Bates

> > Just one minor question: why not make the following functions
> > EXPORT_SYMBOL_GPL()?
> > 
> > 1. iov_iter_get_pages2()
> > 2. iov_iter_get_pages_alloc2()
> 
> They previously were not GPL, so I didn't think that should be changed
> in this patch.

Yes.  While they should have been _GPL from the start, rocking that
boat is a bit pointless now.  We just need to make sure to do the
right thing for the pinning variants that are going to replace them soon.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  2022-10-21 17:41 ` [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags() Logan Gunthorpe
  2022-10-25  1:14   ` Chaitanya Kulkarni
@ 2022-10-27  7:11   ` Jay Fang
  2022-10-27 14:22     ` Logan Gunthorpe
  1 sibling, 1 reply; 34+ messages in thread
From: Jay Fang @ 2022-10-27  7:11 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, linux-block,
	linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates

On 2022/10/22 1:41, Logan Gunthorpe wrote:
> Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
> which take a flags argument that is passed to get_user_pages_fast().
> 
> This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/uio.h |  6 ++++++
>  lib/iov_iter.c      | 32 ++++++++++++++++++++++++--------
>  2 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 2e3134b14ffd..9ede533ce64c 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -247,8 +247,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
>  void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
>  void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
>  		     loff_t start, size_t count);
> +ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
> +		size_t maxsize, unsigned maxpages, size_t *start,
> +		unsigned gup_flags);
>  ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
>  			size_t maxsize, unsigned maxpages, size_t *start);
> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
> +		struct page ***pages, size_t maxsize, size_t *start,
> +		unsigned gup_flags);
>  ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
>  			size_t maxsize, size_t *start);
>  int iov_iter_npages(const struct iov_iter *i, int maxpages);
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index c3ca28ca68a6..53efad017f3c 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1430,7 +1430,8 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
>  
>  static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>  		   struct page ***pages, size_t maxsize,
> -		   unsigned int maxpages, size_t *start)
> +		   unsigned int maxpages, size_t *start,
> +		   unsigned int gup_flags)

Hi,
found some checkpatch warnings, like this:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#50: FILE: lib/iov_iter.c:1497:
+		   size_t *start, unsigned gup_flags)

>  {
>  	unsigned int n;
>  
> @@ -1442,7 +1443,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>  		maxsize = MAX_RW_COUNT;
>  
>  	if (likely(user_backed_iter(i))) {
> -		unsigned int gup_flags = 0;
>  		unsigned long addr;
>  		int res;
>  
> @@ -1492,33 +1492,49 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>  	return -EFAULT;
>  }
>  
> -ssize_t iov_iter_get_pages2(struct iov_iter *i,
> +ssize_t iov_iter_get_pages(struct iov_iter *i,
>  		   struct page **pages, size_t maxsize, unsigned maxpages,
> -		   size_t *start)
> +		   size_t *start, unsigned gup_flags)
>  {
>  	if (!maxpages)
>  		return 0;
>  	BUG_ON(!pages);
>  
> -	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
> +	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
> +					  start, gup_flags);
> +}
> +EXPORT_SYMBOL_GPL(iov_iter_get_pages);
> +
> +ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
> +		size_t maxsize, unsigned maxpages, size_t *start)
> +{
> +	return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0);
>  }
>  EXPORT_SYMBOL(iov_iter_get_pages2);
>  
> -ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>  		   struct page ***pages, size_t maxsize,
> -		   size_t *start)
> +		   size_t *start, unsigned gup_flags)
>  {
>  	ssize_t len;
>  
>  	*pages = NULL;
>  
> -	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
> +	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
> +					 gup_flags);
>  	if (len <= 0) {
>  		kvfree(*pages);
>  		*pages = NULL;
>  	}
>  	return len;
>  }
> +EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc);
> +
> +ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
> +		struct page ***pages, size_t maxsize, size_t *start)
> +{
> +	return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0);
> +}
>  EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
>  
>  size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  2022-10-27  7:11   ` Jay Fang
@ 2022-10-27 14:22     ` Logan Gunthorpe
  0 siblings, 0 replies; 34+ messages in thread
From: Logan Gunthorpe @ 2022-10-27 14:22 UTC (permalink / raw)
  To: Jay Fang, linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm
  Cc: Christoph Hellwig, Greg Kroah-Hartman, Dan Williams,
	Jason Gunthorpe, Christian König, John Hubbard, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates



On 2022-10-27 01:11, Jay Fang wrote:
> On 2022/10/22 1:41, Logan Gunthorpe wrote:
>> Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
>> which take a flags argument that is passed to get_user_pages_fast().
>>
>> This is so that FOLL_PCI_P2PDMA can be passed when appropriate.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  include/linux/uio.h |  6 ++++++
>>  lib/iov_iter.c      | 32 ++++++++++++++++++++++++--------
>>  2 files changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/uio.h b/include/linux/uio.h
>> index 2e3134b14ffd..9ede533ce64c 100644
>> --- a/include/linux/uio.h
>> +++ b/include/linux/uio.h
>> @@ -247,8 +247,14 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
>>  void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
>>  void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
>>  		     loff_t start, size_t count);
>> +ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
>> +		size_t maxsize, unsigned maxpages, size_t *start,
>> +		unsigned gup_flags);
>>  ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
>>  			size_t maxsize, unsigned maxpages, size_t *start);
>> +ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>> +		struct page ***pages, size_t maxsize, size_t *start,
>> +		unsigned gup_flags);
>>  ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
>>  			size_t maxsize, size_t *start);
>>  int iov_iter_npages(const struct iov_iter *i, int maxpages);
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index c3ca28ca68a6..53efad017f3c 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -1430,7 +1430,8 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
>>  
>>  static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
>>  		   struct page ***pages, size_t maxsize,
>> -		   unsigned int maxpages, size_t *start)
>> +		   unsigned int maxpages, size_t *start,
>> +		   unsigned int gup_flags)
> 
> Hi,
> found some checkpatch warnings, like this:
> WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
> #50: FILE: lib/iov_iter.c:1497:
> +		   size_t *start, unsigned gup_flags)

We usually stick with the choices of the nearby code instead of
the warnings of checkpatch.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices
  2022-10-24 19:15   ` John Hubbard
@ 2022-11-08  6:56     ` Christoph Hellwig
  2022-11-09 17:28       ` Logan Gunthorpe
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-11-08  6:56 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christoph Hellwig, Logan Gunthorpe, linux-kernel, linux-nvme,
	linux-block, linux-pci, linux-mm, Greg Kroah-Hartman,
	Dan Williams, Jason Gunthorpe, Christian König, Don Dutile,
	Matthew Wilcox, Daniel Vetter, Minturn Dave B, Jason Ekstrand,
	Dave Hansen, Xiong Jianxin, Bjorn Helgaas, Ira Weiny,
	Robin Murphy, Martin Oliveira, Chaitanya Kulkarni,
	Ralph Campbell, Stephen Bates, viro

On Mon, Oct 24, 2022 at 12:15:56PM -0700, John Hubbard wrote:
> A little earlier, Jens graciously offered [1] to provide a topic branch,
> such as:
>
>     for-6.2/block-gup [2]
>
> (I've moved the name forward from 6.1 to 6.2, because that discussion
> was 7 weeks ago.)

So what are we going to do with this series?  It would be sad to miss
the merge window again.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices
  2022-11-08  6:56     ` Christoph Hellwig
@ 2022-11-09 17:28       ` Logan Gunthorpe
  2022-11-09 18:33         ` Jens Axboe
  0 siblings, 1 reply; 34+ messages in thread
From: Logan Gunthorpe @ 2022-11-09 17:28 UTC (permalink / raw)
  To: Christoph Hellwig, John Hubbard, Jens Axboe
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	Greg Kroah-Hartman, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Minturn Dave B, Jason Ekstrand, Dave Hansen, Xiong Jianxin,
	Bjorn Helgaas, Ira Weiny, Robin Murphy, Martin Oliveira,
	Chaitanya Kulkarni, Ralph Campbell, Stephen Bates, viro

@add Jens

On 2022-11-07 23:56, Christoph Hellwig wrote:
> On Mon, Oct 24, 2022 at 12:15:56PM -0700, John Hubbard wrote:
>> A little earlier, Jens graciously offered [1] to provide a topic branch,
>> such as:
>>
>>     for-6.2/block-gup [2]
>>
>> (I've moved the name forward from 6.1 to 6.2, because that discussion
>> was 7 weeks ago.)
> 
> So what are we going to do with this series?  It would be sad to miss
> the merge window again.

I noticed Jens wasn't copied on this series. I've added him. It would be
nice to get this in someone's tree soon.

Thanks!

Logan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices
  2022-11-09 17:28       ` Logan Gunthorpe
@ 2022-11-09 18:33         ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2022-11-09 18:33 UTC (permalink / raw)
  To: Logan Gunthorpe, Christoph Hellwig, John Hubbard
  Cc: linux-kernel, linux-nvme, linux-block, linux-pci, linux-mm,
	Greg Kroah-Hartman, Dan Williams, Jason Gunthorpe,
	Christian König, Don Dutile, Matthew Wilcox, Daniel Vetter,
	Minturn Dave B, Jason Ekstrand, Dave Hansen, Xiong Jianxin,
	Bjorn Helgaas, Ira Weiny, Robin Murphy, Martin Oliveira,
	Chaitanya Kulkarni, Ralph Campbell, Stephen Bates, viro

On 11/9/22 10:28 AM, Logan Gunthorpe wrote:
> @add Jens
> 
> On 2022-11-07 23:56, Christoph Hellwig wrote:
>> On Mon, Oct 24, 2022 at 12:15:56PM -0700, John Hubbard wrote:
>>> A little earlier, Jens graciously offered [1] to provide a topic branch,
>>> such as:
>>>
>>>     for-6.2/block-gup [2]
>>>
>>> (I've moved the name forward from 6.1 to 6.2, because that discussion
>>> was 7 weeks ago.)
>>
>> So what are we going to do with this series?  It would be sad to miss
>> the merge window again.
> 
> I noticed Jens wasn't copied on this series. I've added him. It would be
> nice to get this in someone's tree soon.

I took a look and the series looks fine to me.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-11-09 18:33 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21 17:41 [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Logan Gunthorpe
2022-10-21 17:41 ` [PATCH v11 1/9] mm: allow multiple error returns in try_grab_page() Logan Gunthorpe
2022-10-24 15:00   ` Christoph Hellwig
2022-10-24 16:37   ` Dan Williams
2022-10-25  1:06   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages Logan Gunthorpe
2022-10-24 15:00   ` Christoph Hellwig
2022-10-25  1:09   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 3/9] iov_iter: introduce iov_iter_get_pages_[alloc_]flags() Logan Gunthorpe
2022-10-25  1:14   ` Chaitanya Kulkarni
2022-10-25 15:35     ` Logan Gunthorpe
2022-10-25 15:41       ` Christoph Hellwig
2022-10-27  7:11   ` Jay Fang
2022-10-27 14:22     ` Logan Gunthorpe
2022-10-21 17:41 ` [PATCH v11 4/9] block: add check when merging zone device pages Logan Gunthorpe
2022-10-25  1:16   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 5/9] lib/scatterlist: " Logan Gunthorpe
2022-10-25  1:19   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 6/9] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages() Logan Gunthorpe
2022-10-25  1:23   ` Chaitanya Kulkarni
2022-10-25 15:37     ` Logan Gunthorpe
2022-10-25  1:25   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 7/9] block: set FOLL_PCI_P2PDMA in bio_map_user_iov() Logan Gunthorpe
2022-10-25  1:26   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 8/9] PCI/P2PDMA: Allow userspace VMA allocations through sysfs Logan Gunthorpe
2022-10-25  1:29   ` Chaitanya Kulkarni
2022-10-25  1:34   ` Chaitanya Kulkarni
2022-10-21 17:41 ` [PATCH v11 9/9] ABI: sysfs-bus-pci: add documentation for p2pmem allocate Logan Gunthorpe
2022-10-25  1:29   ` Chaitanya Kulkarni
2022-10-24 15:03 ` [PATCH v11 0/9] Userspace P2PDMA with O_DIRECT NVMe devices Christoph Hellwig
2022-10-24 19:15   ` John Hubbard
2022-11-08  6:56     ` Christoph Hellwig
2022-11-09 17:28       ` Logan Gunthorpe
2022-11-09 18:33         ` Jens Axboe
