From: Zhu Yanjun <zyjzyj2000@gmail.com>
To: Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	maorg@nvidia.com
Subject: Fwd: [PATCH 1/1] RDMA/umem: add back hugepage sg list
Date: Sun, 7 Mar 2021 22:28:33 +0800	[thread overview]
Message-ID: <CAD=hENeqTTmpS5V+G646V0QvJFLVSd3Sq11ffQFcDXU-OSsQEg@mail.gmail.com> (raw)
In-Reply-To: <20210307221034.568606-1-yanjun.zhu@intel.com>

From: Zhu Yanjun <zyjzyj2000@gmail.com>

After commit 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from
pages"), the sg list returned by ib_umem_get looks like the following:
"
sg_dma_address(sg):0x4b3c1ce000
sg_dma_address(sg):0x4c3c1cd000
sg_dma_address(sg):0x4d3c1cc000
sg_dma_address(sg):0x4e3c1cb000
"

But sometimes we need an sg list whose entries are coalesced to
hugepage (2M) granularity, like the following:
"
sg_dma_address(sg):0x203b400000
sg_dma_address(sg):0x213b200000
sg_dma_address(sg):0x223b000000
sg_dma_address(sg):0x233ae00000
sg_dma_address(sg):0x243ac00000
"
The function ib_umem_add_sg_table can build the second kind of sg
list, but it was removed in commit 0c16d9635e3a ("RDMA/umem: Move to
allocate SG table from pages"). This patch adds it back.
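
For reference, both dumps above can be produced by a loop like the
following sketch (illustrative only, not part of this patch; it
assumes the umem has already been pinned and DMA mapped):
"
	struct scatterlist *sg;
	int i;

	/* walk the DMA-mapped entries and print each DMA address */
	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
		pr_info("sg_dma_address(sg):0x%llx\n",
			(unsigned long long)sg_dma_address(sg));
"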

A new function, ib_umem_hugepage_get, is added. It is based on
ib_umem_get but calls ib_umem_add_sg_table.

With ib_umem_hugepage_get, the sg list DMA addresses can be built at
both 4K and 2M granularity.
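
A caller would look like the following sketch (hypothetical driver
code, shown only to illustrate the intended use; device, start,
length and the access flag are placeholders):
"
	struct ib_umem *umem;

	umem = ib_umem_hugepage_get(device, start, length,
				    IB_ACCESS_LOCAL_WRITE);
	if (IS_ERR(umem))
		return PTR_ERR(umem);

	/* umem->sg_head.sgl now holds entries coalesced up to
	 * dma_get_max_seg_size(device->dma_device) bytes each.
	 */
"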

Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
---
 drivers/infiniband/core/umem.c | 197 +++++++++++++++++++++++++++++++++
 include/rdma/ib_umem.h         |   3 +
 2 files changed, 200 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 2dde99a9ba07..af8733788b1e 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -62,6 +62,203 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
        sg_free_table(&umem->sg_head);
 }

+/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
+ *
+ * sg: current scatterlist entry
+ * page_list: array of npage struct page pointers
+ * npages: number of pages in page_list
+ * max_seg_sz: maximum segment size in bytes
+ * nents: [out] number of entries in the scatterlist
+ *
+ * Return new end of scatterlist
+ */
+static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
+                                               struct page **page_list,
+                                               unsigned long npages,
+                                               unsigned int max_seg_sz,
+                                               int *nents)
+{
+       unsigned long first_pfn;
+       unsigned long i = 0;
+       bool update_cur_sg = false;
+       bool first = !sg_page(sg);
+
+       /* Check if new page_list is contiguous with end of previous page_list.
+        * sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
+        */
+       if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
+                       page_to_pfn(page_list[0])))
+               update_cur_sg = true;
+
+       while (i != npages) {
+               unsigned long len;
+               struct page *first_page = page_list[i];
+
+               first_pfn = page_to_pfn(first_page);
+
+               /* Compute the number of contiguous pages we have starting
+                * at i
+                */
+               for (len = 0; i != npages &&
+                               first_pfn + len == page_to_pfn(page_list[i]) &&
+                               len < (max_seg_sz >> PAGE_SHIFT);
+                    len++)
+                       i++;
+
+               /* Squash N contiguous pages from page_list into current sge */
+               if (update_cur_sg) {
+                       if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
+                               sg_set_page(sg, sg_page(sg),
+                                           sg->length + (len << PAGE_SHIFT),
+                                           0);
+                               update_cur_sg = false;
+                               continue;
+                       }
+                       update_cur_sg = false;
+               }
+
+               /* Squash N contiguous pages into next sge or first sge */
+               if (!first)
+                       sg = sg_next(sg);
+
+               (*nents)++;
+               sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
+               first = false;
+       }
+
+       return sg;
+}
+
+/**
+ * ib_umem_hugepage_get - Pin and DMA map userspace memory.
+ *
+ * @device: IB device to connect UMEM
+ * @addr: userspace virtual address to start at
+ * @size: length of region to pin
+ * @access: IB_ACCESS_xxx flags for memory being pinned
+ */
+struct ib_umem *ib_umem_hugepage_get(struct ib_device *device,
+                                    unsigned long addr,
+                                    size_t size, int access)
+{
+       struct ib_umem *umem;
+       struct page **page_list;
+       unsigned long lock_limit;
+       unsigned long new_pinned;
+       unsigned long cur_base;
+       unsigned long dma_attr = 0;
+       struct mm_struct *mm;
+       unsigned long npages;
+       int ret;
+       struct scatterlist *sg;
+       unsigned int gup_flags = FOLL_WRITE;
+
+       /*
+        * If the combination of the addr and size requested for this memory
+        * region causes an integer overflow, return error.
+        */
+       if (((addr + size) < addr) ||
+           PAGE_ALIGN(addr + size) < (addr + size))
+               return ERR_PTR(-EINVAL);
+
+       if (!can_do_mlock())
+               return ERR_PTR(-EPERM);
+
+       if (access & IB_ACCESS_ON_DEMAND)
+               return ERR_PTR(-EOPNOTSUPP);
+
+       umem = kzalloc(sizeof(*umem), GFP_KERNEL);
+       if (!umem)
+               return ERR_PTR(-ENOMEM);
+       umem->ibdev      = device;
+       umem->length     = size;
+       umem->address    = addr;
+       umem->writable   = ib_access_writable(access);
+       umem->owning_mm = mm = current->mm;
+       mmgrab(mm);
+
+       page_list = (struct page **) __get_free_page(GFP_KERNEL);
+       if (!page_list) {
+               ret = -ENOMEM;
+               goto umem_kfree;
+       }
+
+       npages = ib_umem_num_pages(umem);
+       if (npages == 0 || npages > UINT_MAX) {
+               ret = -EINVAL;
+               goto out;
+       }
+
+       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+       new_pinned = atomic64_add_return(npages, &mm->pinned_vm);
+       if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) {
+               atomic64_sub(npages, &mm->pinned_vm);
+               ret = -ENOMEM;
+               goto out;
+       }
+
+       cur_base = addr & PAGE_MASK;
+
+       ret = sg_alloc_table(&umem->sg_head, npages, GFP_KERNEL);
+       if (ret)
+               goto vma;
+
+       if (!umem->writable)
+               gup_flags |= FOLL_FORCE;
+
+       sg = umem->sg_head.sgl;
+
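+       /* Pin the pages in batches: page_list can hold at most
+        * PAGE_SIZE / sizeof(struct page *) entries per
+        * pin_user_pages_fast() call.  Each batch is then folded into
+        * the scatterlist so that physically contiguous pages share a
+        * single sg entry, bounded by the device max DMA segment size.
+        */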
+       while (npages) {
+               cond_resched();
+               ret = pin_user_pages_fast(cur_base,
+                                         min_t(unsigned long, npages,
+                                               PAGE_SIZE /
+                                               sizeof(struct page *)),
+                                         gup_flags | FOLL_LONGTERM, page_list);
+               if (ret < 0)
+                       goto umem_release;
+
+               cur_base += ret * PAGE_SIZE;
+               npages   -= ret;
+
+               sg = ib_umem_add_sg_table(sg, page_list, ret,
+                               dma_get_max_seg_size(device->dma_device),
+                               &umem->sg_nents);
+       }
+
+       sg_mark_end(sg);
+
+       if (access & IB_ACCESS_RELAXED_ORDERING)
+               dma_attr |= DMA_ATTR_WEAK_ORDERING;
+
+       umem->nmap =
+               ib_dma_map_sg_attrs(device, umem->sg_head.sgl, umem->sg_nents,
+                                   DMA_BIDIRECTIONAL, dma_attr);
+
+       if (!umem->nmap) {
+               ret = -ENOMEM;
+               goto umem_release;
+       }
+
+       ret = 0;
+       goto out;
+
+umem_release:
+       __ib_umem_release(device, umem, 0);
+vma:
+       atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm);
+out:
+       free_page((unsigned long) page_list);
+umem_kfree:
+       if (ret) {
+               mmdrop(umem->owning_mm);
+               kfree(umem);
+       }
+       return ret ? ERR_PTR(ret) : umem;
+}
+EXPORT_SYMBOL(ib_umem_hugepage_get);
+
 /**
  * ib_umem_find_best_pgsz - Find best HW page size to use for this MR
  *
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 676c57f5ca80..fc6350ff21e6 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -96,6 +96,9 @@ static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
             __rdma_block_iter_next(biter);)

 #ifdef CONFIG_INFINIBAND_USER_MEM
+struct ib_umem *ib_umem_hugepage_get(struct ib_device *device,
+                                    unsigned long addr,
+                                    size_t size, int access);

 struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
                            size_t size, int access);
--
2.25.1
