From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933230AbdJaX27 (ORCPT ); Tue, 31 Oct 2017 19:28:59 -0400 Received: from mga06.intel.com ([134.134.136.31]:51062 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933163AbdJaX2z (ORCPT ); Tue, 31 Oct 2017 19:28:55 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.44,326,1505804400"; d="scan'208";a="330306675" Subject: [PATCH 10/15] IB/core: disable memory registration of fileystem-dax vmas From: Dan Williams To: linux-nvdimm@lists.01.org Cc: Sean Hefty , linux-xfs@vger.kernel.org, akpm@linux-foundation.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, Jeff Moyer , stable@vger.kernel.org, hch@lst.de, Jason Gunthorpe , linux-mm@kvack.org, Doug Ledford , linux-fsdevel@vger.kernel.org, Ross Zwisler , Hal Rosenstock Date: Tue, 31 Oct 2017 16:22:29 -0700 Message-ID: <150949214929.24061.10464887309708944817.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <150949209290.24061.6283157778959640151.stgit@dwillia2-desk3.amr.corp.intel.com> References: <150949209290.24061.6283157778959640151.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.17.1-9-g687f MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow RDMA to create long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented, and is limited to non-ODP (On Demand Paging) capable RDMA devices. Cc: Sean Hefty Cc: Doug Ledford Cc: Hal Rosenstock Cc: Jeff Moyer Cc: Christoph Hellwig Cc: Ross Zwisler Cc: Jason Gunthorpe Cc: Cc: Signed-off-by: Dan Williams --- drivers/infiniband/core/umem.c | 49 +++++++++++++++++++++++++++++++--------- 1 file changed, 38 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 21e60b1e2ff4..c30d286c1f24 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -147,19 +147,21 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, umem->hugetlb = 1; page_list = (struct page **) __get_free_page(GFP_KERNEL); - if (!page_list) { - put_pid(umem->pid); - kfree(umem); - return ERR_PTR(-ENOMEM); - } + if (!page_list) + goto err_pagelist; /* - * if we can't alloc the vma_list, it's not so bad; - * just assume the memory is not hugetlb memory + * If DAX is enabled we need the vma to protect against + * registering filesystem-dax memory. Otherwise we can tolerate + * a failure to allocate the vma_list and just assume that all + * vmas are not hugetlb-vmas. */ vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL); - if (!vma_list) + if (!vma_list) { + if (IS_ENABLED(CONFIG_FS_DAX)) + goto err_vmalist; umem->hugetlb = 0; + } npages = ib_umem_num_pages(umem); @@ -199,15 +201,34 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, if (ret < 0) goto out; - umem->npages += ret; cur_base += ret * PAGE_SIZE; npages -= ret; for_each_sg(sg_list_start, sg, ret, i) { - if (vma_list && !is_vm_hugetlb_page(vma_list[i])) - umem->hugetlb = 0; + struct vm_area_struct *vma; + struct inode *inode; sg_set_page(sg, page_list[i], PAGE_SIZE, 0); + umem->npages++; + + if (!vma_list) + continue; + vma = vma_list[i]; + + if (!is_vm_hugetlb_page(vma)) + umem->hugetlb = 0; + + if (!vma_is_dax(vma)) + continue; + + /* device-dax is safe for rdma... */ + inode = file_inode(vma->vm_file); + if (inode->i_mode == S_IFCHR) + continue; + + /* ...filesystem-dax is not. */ + ret = -EOPNOTSUPP; + goto out; } /* preparing for next loop */ @@ -242,6 +263,12 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, free_page((unsigned long) page_list); return ret < 0 ? ERR_PTR(ret) : umem; +err_vmalist: + free_page((unsigned long) page_list); +err_pagelist: + put_pid(umem->pid); + kfree(umem); + return ERR_PTR(-ENOMEM); } EXPORT_SYMBOL(ib_umem_get);