From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, iommu@lists.linux-foundation.org
Cc: "Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Christian König" <christian.koenig@amd.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Don Dutile" <ddutile@redhat.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Jakowski Andrzej" <andrzej.jakowski@intel.com>,
	"Minturn Dave B" <dave.b.minturn@intel.com>,
	"Jason Ekstrand" <jason@jlekstrand.net>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Xiong Jianxin" <jianxin.xiong@intel.com>,
	"Bjorn Helgaas" <helgaas@kernel.org>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Martin Oliveira" <martin.oliveira@eideticom.com>,
	"Chaitanya Kulkarni" <ckulkarnilinux@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>
Subject: [PATCH v5 23/24] PCI/P2PDMA: Introduce pci_mmap_p2pmem()
Date: Thu, 27 Jan 2022 17:26:13 -0700
Message-ID: <20220128002614.6136-24-logang@deltatee.com>
In-Reply-To: <20220128002614.6136-1-logang@deltatee.com>

Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap
a hunk of p2pmem into userspace.

Pages are allocated from the genalloc in bulk and their reference count
incremented. They are returned to the genalloc when the page is put.

The VMA does not take a reference to the pages when they are inserted
with vmf_insert_mixed() (which is necessary for zone device pages) so
the backing P2P memory is stored in a structure in vm_private_data.

A pseudo mount is used to allocate an inode for each PCI device. The
inode's address_space is used in the file doing the mmap so that all
VMAs are collected and can be unmapped if the PCI device is unbound.
After unmapping, the VMAs are iterated through and their pages are
put so the device can continue to be unbound. An active flag is used
to signal to VMAs not to allocate any further P2P memory once the
removal process starts. The flag is synchronized with concurrent
access with an RCU lock.

The VMAs and inode will survive after the unbind of the device, but no
pages will be present in the VMA and a subsequent access will result
in a SIGBUS error.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c       | 301 ++++++++++++++++++++++++++++++++++++-
 include/linux/pci-p2pdma.h |  11 ++
 include/uapi/linux/magic.h |   1 +
 3 files changed, 311 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 3a24bf5099cf..d54068d6ce6a 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -17,14 +17,19 @@
 #include <linux/genalloc.h>
 #include <linux/memremap.h>
 #include <linux/percpu-refcount.h>
+#include <linux/pfn_t.h>
+#include <linux/pseudo_fs.h>
 #include <linux/random.h>
 #include <linux/seq_buf.h>
 #include <linux/xarray.h>
+#include <uapi/linux/magic.h>
 
 struct pci_p2pdma {
 	struct gen_pool *pool;
 	bool p2pmem_published;
 	struct xarray map_types;
+	struct inode *inode;
+	bool active;
 };
 
 struct pci_p2pdma_pagemap {
@@ -33,6 +38,15 @@ struct pci_p2pdma_pagemap {
 	u64 bus_offset;
 };
 
+struct pci_p2pdma_map {
+	struct kref ref;
+	struct rcu_head rcu;
+	struct pci_dev *pdev;
+	struct inode *inode;
+	void *kaddr;
+	size_t len;
+};
+
 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
 {
 	return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
@@ -101,6 +115,26 @@ static const struct attribute_group p2pmem_group = {
 	.name = "p2pmem",
 };
 
+/*
+ * P2PDMA internal mount
+ * Fake an internal VFS mount-point in order to allocate struct address_space
+ * mappings to remove VMAs on unbind events.
+ */
+static int pci_p2pdma_fs_cnt;
+static struct vfsmount *pci_p2pdma_fs_mnt;
+
+static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type pci_p2pdma_fs_type = {
+	.name = "p2dma",
+	.owner = THIS_MODULE,
+	.init_fs_context = pci_p2pdma_fs_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+
 static void p2pdma_page_free(struct page *page)
 {
 	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
@@ -129,6 +163,9 @@ static void pci_p2pdma_release(void *data)
 	gen_pool_destroy(p2pdma->pool);
 	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
 	xa_destroy(&p2pdma->map_types);
+
+	iput(p2pdma->inode);
+	simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
 }
 
 static int pci_p2pdma_setup(struct pci_dev *pdev)
@@ -146,17 +183,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 	if (!p2p->pool)
 		goto out;
 
-	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	error = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt,
+			      &pci_p2pdma_fs_cnt);
 	if (error)
 		goto out_pool_destroy;
 
+	p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb);
+	if (IS_ERR(p2p->inode)) {
+		error = -ENOMEM;
+		goto out_unpin_fs;
+	}
+
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (error)
+		goto out_put_inode;
+
 	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
 	if (error)
-		goto out_pool_destroy;
+		goto out_put_inode;
 
 	rcu_assign_pointer(pdev->p2pdma, p2p);
 	return 0;
 
+out_put_inode:
+	iput(p2p->inode);
+out_unpin_fs:
+	simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
 out_pool_destroy:
 	gen_pool_destroy(p2p->pool);
 out:
@@ -164,6 +216,54 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 	return error;
 }
 
+static void pci_p2pdma_map_free_pages(struct pci_p2pdma_map *pmap)
+{
+	int i;
+
+	if (!pmap->kaddr)
+		return;
+
+	for (i = 0; i < pmap->len; i += PAGE_SIZE)
+		put_page(virt_to_page(pmap->kaddr + i));
+
+	pmap->kaddr = NULL;
+}
+
+static void pci_p2pdma_free_mappings(struct address_space *mapping)
+{
+	struct vm_area_struct *vma;
+
+	i_mmap_lock_write(mapping);
+	if (RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
+		goto out;
+
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, 0, -1)
+		pci_p2pdma_map_free_pages(vma->vm_private_data);
+
+out:
+	i_mmap_unlock_write(mapping);
+}
+
+static void pci_p2pdma_unmap_mappings(void *data)
+{
+	struct pci_dev *pdev = data;
+	struct pci_p2pdma *p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+
+	/* Ensure no new pages can be allocated in mappings */
+	p2pdma->active = false;
+	synchronize_rcu();
+
+	unmap_mapping_range(p2pdma->inode->i_mapping, 0, 0, 1);
+
+	/*
+	 * On some architectures, TLB flushes are done with call_rcu()
+	 * so to ensure GUP fast is done with the pages, call synchronize_rcu()
+	 * before freeing them.
+	 */
+	synchronize_rcu();
+	pci_p2pdma_free_mappings(p2pdma->inode->i_mapping);
+}
+
 /**
  * pci_p2pdma_add_resource - add memory for use as p2p memory
  * @pdev: the device to add the memory to
@@ -222,6 +322,11 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 		goto pgmap_free;
 	}
 
+	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
+					 pdev);
+	if (error)
+		goto pages_free;
+
 	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
 	error = gen_pool_add_owner(p2pdma->pool, (unsigned long)addr,
 			pci_bus_address(pdev, bar) + offset,
@@ -230,6 +335,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	if (error)
 		goto pages_free;
 
+	p2pdma->active = true;
 	pci_info(pdev, "added peer-to-peer DMA memory %#llx-%#llx\n",
 		 pgmap->range.start, pgmap->range.end);
 
@@ -1030,3 +1136,194 @@ ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
 	return sprintf(page, "%s\n", pci_name(p2p_dev));
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_enable_show);
+
+static struct pci_p2pdma_map *pci_p2pdma_map_alloc(struct pci_dev *pdev,
+						   size_t len)
+{
+	struct pci_p2pdma_map *pmap;
+
+	pmap = kzalloc(sizeof(*pmap), GFP_KERNEL);
+	if (!pmap)
+		return NULL;
+
+	kref_init(&pmap->ref);
+	pmap->pdev = pci_dev_get(pdev);
+	pmap->len = len;
+
+	return pmap;
+}
+
+static void pci_p2pdma_map_free(struct rcu_head *rcu)
+{
+	struct pci_p2pdma_map *pmap =
+		container_of(rcu, struct pci_p2pdma_map, rcu);
+
+	pci_p2pdma_map_free_pages(pmap);
+	kfree(pmap);
+}
+
+static void pci_p2pdma_map_release(struct kref *ref)
+{
+	struct pci_p2pdma_map *pmap =
+		container_of(ref, struct pci_p2pdma_map, ref);
+
+	iput(pmap->inode);
+	simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
+	pci_dev_put(pmap->pdev);
+
+	if (pmap->kaddr) {
+		/*
+		 * Make sure to wait for the TLB flush (which some
+		 * architectures do using call_rcu()) before returning the
+		 * pages to the genalloc. This ensures the pages are not reused
+		 * before GUP-fast is finished with them. So the mapping is
+		 * freed using call_rcu() since adding synchronize_rcu() to
+		 * the munmap path can cause long delays on large systems
+		 * during process cleanup.
+		 */
+		call_rcu(&pmap->rcu, pci_p2pdma_map_free);
+		return;
+	}
+
+	/*
+	 * If there are no pages, just free the object immediately. There
+	 * are no more references to it so there is nothing that can race
+	 * with adding the pages.
+	 */
+	pci_p2pdma_map_free(&pmap->rcu);
+}
+
+static void pci_p2pdma_vma_open(struct vm_area_struct *vma)
+{
+	struct pci_p2pdma_map *pmap = vma->vm_private_data;
+
+	kref_get(&pmap->ref);
+}
+
+static void pci_p2pdma_vma_close(struct vm_area_struct *vma)
+{
+	struct pci_p2pdma_map *pmap = vma->vm_private_data;
+
+	kref_put(&pmap->ref, pci_p2pdma_map_release);
+}
+
+static vm_fault_t pci_p2pdma_vma_fault(struct vm_fault *vmf)
+{
+	struct pci_p2pdma_map *pmap = vmf->vma->vm_private_data;
+	struct pci_p2pdma *p2pdma;
+	void *vaddr;
+	pfn_t pfn;
+	int i;
+
+	if (!pmap->kaddr) {
+		rcu_read_lock();
+		p2pdma = rcu_dereference(pmap->pdev->p2pdma);
+		if (!p2pdma)
+			goto err_out;
+
+		if (!p2pdma->active)
+			goto err_out;
+
+		pmap->kaddr = (void *)gen_pool_alloc(p2pdma->pool, pmap->len);
+		if (!pmap->kaddr)
+			goto err_out;
+
+		for (i = 0; i < pmap->len; i += PAGE_SIZE)
+			get_page(virt_to_page(pmap->kaddr + i));
+
+		rcu_read_unlock();
+	}
+
+	vaddr = pmap->kaddr + (vmf->pgoff << PAGE_SHIFT);
+	pfn = phys_to_pfn_t(virt_to_phys(vaddr), PFN_DEV | PFN_MAP);
+
+	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
+
+err_out:
+	rcu_read_unlock();
+	return VM_FAULT_SIGBUS;
+}
+static const struct vm_operations_struct pci_p2pdma_vmops = {
+	.open = pci_p2pdma_vma_open,
+	.close = pci_p2pdma_vma_close,
+	.fault = pci_p2pdma_vma_fault,
+};
+
+/**
+ * pci_p2pdma_file_open - setup file mapping to store P2PMEM VMAs
+ * @pdev: the device to allocate memory from
+ * @file: the file whose f_mapping is set
+ *
+ * Set f_mapping of the file to the p2pdma inode so that mappings
+ * can be torn down on device unbind.
+ *
+ * If @pdev has no P2PDMA state, f_mapping is left unchanged.
+ */
+void pci_p2pdma_file_open(struct pci_dev *pdev, struct file *file)
+{
+	struct pci_p2pdma *p2pdma;
+
+	rcu_read_lock();
+	p2pdma = rcu_dereference(pdev->p2pdma);
+	if (p2pdma)
+		file->f_mapping = p2pdma->inode->i_mapping;
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_file_open);
+
+/**
+ * pci_mmap_p2pmem - setup an mmap region to be backed with P2PDMA memory
+ *	that was registered with pci_p2pdma_add_resource()
+ * @pdev: the device to allocate memory from
+ * @vma: the userspace vma to map the memory to
+ *
+ * The file must call pci_p2pdma_file_open() in its open() operation.
+ *
+ * Returns 0 on success, or a negative error code on failure
+ */
+int pci_mmap_p2pmem(struct pci_dev *pdev, struct vm_area_struct *vma)
+{
+	struct pci_p2pdma_map *pmap;
+	struct pci_p2pdma *p2pdma;
+	int ret;
+
+	/* prevent private mappings from being established */
+	if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
+		pci_info_ratelimited(pdev,
+				     "%s: fail, attempted private mapping\n",
+				     current->comm);
+		return -EINVAL;
+	}
+
+	pmap = pci_p2pdma_map_alloc(pdev, vma->vm_end - vma->vm_start);
+	if (!pmap)
+		return -ENOMEM;
+
+	rcu_read_lock();
+	p2pdma = rcu_dereference(pdev->p2pdma);
+	if (!p2pdma) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	ret = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt,
+			    &pci_p2pdma_fs_cnt);
+	if (ret)
+		goto out;
+
+	ihold(p2pdma->inode);
+	pmap->inode = p2pdma->inode;
+	rcu_read_unlock();
+
+	vma->vm_flags |= VM_MIXEDMAP;
+	vma->vm_private_data = pmap;
+	vma->vm_ops = &pci_p2pdma_vmops;
+
+	return 0;
+
+out:
+	rcu_read_unlock();
+	kfree(pmap);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_mmap_p2pmem);
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 2c07aa6b7665..040d79126463 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -34,6 +34,8 @@ int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
 			    bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
 			       bool use_p2pdma);
+void pci_p2pdma_file_open(struct pci_dev *pdev, struct file *file);
+int pci_mmap_p2pmem(struct pci_dev *pdev, struct vm_area_struct *vma);
 #else /* CONFIG_PCI_P2PDMA */
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
@@ -90,6 +92,15 @@ static inline ssize_t pci_p2pdma_enable_show(char *page,
 {
 	return sprintf(page, "none\n");
 }
+static inline void pci_p2pdma_file_open(struct pci_dev *pdev,
+					struct file *file)
+{
+}
+static inline int pci_mmap_p2pmem(struct pci_dev *pdev,
+				  struct vm_area_struct *vma)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_PCI_P2PDMA */
 
 
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 0425cd79af9a..bf5af400fb7d 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -94,6 +94,7 @@
 #define BPF_FS_MAGIC		0xcafe4a11
 #define AAFS_MAGIC		0x5a3c69f0
 #define ZONEFS_MAGIC		0x5a4f4653
+#define P2PDMA_MAGIC		0x70327064
 
 /* Since UDF 2.01 is ISO 13346 based... */
 #define UDF_SUPER_MAGIC		0x15013346
-- 
2.30.2
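
The mapping lifecycle described in the commit message (bulk allocation on first fault, an active flag cleared on unbind, SIGBUS on later access) can be modelled roughly in plain userspace C. This is only a sketch under loud assumptions: `model_dev`, `model_map`, `model_fault()` and `model_unbind()` are hypothetical names that do not exist in the kernel, `malloc()` stands in for `gen_pool_alloc()`, and RCU, page refcounts and page tables are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
enum fault_result { FAULT_OK, FAULT_SIGBUS };

struct model_dev {
	bool active;		/* plays the role of pci_p2pdma.active */
};

struct model_map {
	struct model_dev *dev;
	void *kaddr;		/* plays the role of pci_p2pdma_map.kaddr */
	size_t len;
};

/*
 * The first fault allocates the whole backing buffer at once; later
 * faults reuse it. Once the device is inactive, a map without backing
 * memory can only report SIGBUS.
 */
enum fault_result model_fault(struct model_map *m)
{
	assert(m && m->dev);

	if (!m->kaddr) {
		if (!m->dev->active)
			return FAULT_SIGBUS;

		m->kaddr = malloc(m->len);	/* stand-in for gen_pool_alloc() */
		if (!m->kaddr)
			return FAULT_SIGBUS;
	}

	return FAULT_OK;
}

/*
 * Unbind: clear the flag first so no new allocation can start, then
 * drop the backing memory of every map. The maps themselves survive.
 */
void model_unbind(struct model_dev *dev, struct model_map **maps, size_t n)
{
	dev->active = false;

	for (size_t i = 0; i < n; i++) {
		free(maps[i]->kaddr);
		maps[i]->kaddr = NULL;
	}
}
```

In this model, faulting a map while the device is active gives it a backing buffer. After `model_unbind()`, a fault on any map, whether it was previously populated or not, yields `FAULT_SIGBUS`, which is the behaviour the commit message describes for VMAs that outlive the device.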