From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail-lb0-x22d.google.com (mail-lb0-x22d.google.com
 [IPv6:2a00:1450:4010:c04::22d]) (using TLSv1.2 with cipher
 ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 99FC71A1F5F
 for ; Sun, 24 Apr 2016 15:51:02 -0700 (PDT)
Received: by mail-lb0-x22d.google.com with SMTP id ys16so69243878lbb.3
 for ; Sun, 24 Apr 2016 15:51:02 -0700 (PDT)
Date: Mon, 25 Apr 2016 01:50:57 +0300
From: "Kirill A. Shutemov"
Subject: Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
Message-ID: <20160424225057.GA6670@node.shutemov.name>
References: <1461370883-7664-1-git-send-email-toshi.kani@hpe.com>
 <1461370883-7664-2-git-send-email-toshi.kani@hpe.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <1461370883-7664-2-git-send-email-toshi.kani@hpe.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
To: Toshi Kani , Hugh Dickins
Cc: tytso@mit.edu, jack@suse.cz, linux-nvdimm@lists.01.org,
 david@fromorbit.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 adilger.kernel@dilger.ca, viro@zeniv.linux.org.uk,
 linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org,
 kirill.shutemov@linux.intel.com, mike.kravetz@oracle.com

On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using the pmd page
> size. This feature relies on both the mmap virtual address and the FS
> block (i.e. physical address) being aligned to the pmd page size.
> Users can use mkfs options to tell the FS to align block allocations.
> However, aligning the mmap address requires code changes to existing
> applications to provide a pmd-aligned address to mmap().
>
> For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
> It calls mmap() with a NULL address, which would need to be changed to
> provide a pmd-aligned address for testing with DAX pmd mappings.
> Changing all applications that call mmap() with NULL is undesirable.
>
> Add thp_get_unmapped_area(), which can be called by a filesystem's
> get_unmapped_area handler to align an mmap address to the pmd size for
> a DAX file. It calls the default handler, mm->get_unmapped_area(),
> to find a range and then aligns it for a DAX file.
>
> thp_get_unmapped_area() can be extended for huge page cache support.
>
> The patch is based on Matthew Wilcox's change that allows adding
> support of the pud page size easily. See Hugh's implementation:
> http://lkml.kernel.org/r/alpine.LSU.2.11.1604051420110.5965@eggly.anvils
>
> [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
>
> Signed-off-by: Toshi Kani
> Cc: Andrew Morton
> Cc: Alexander Viro
> Cc: Dan Williams
> Cc: Matthew Wilcox
> Cc: Ross Zwisler
> Cc: Kirill A. Shutemov
> Cc: Dave Chinner
> Cc: Jan Kara
> Cc: Theodore Ts'o
> Cc: Andreas Dilger
> Cc: Mike Kravetz
> ---
>  include/linux/huge_mm.h |  7 +++++++
>  mm/huge_memory.c        | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7008623..3769674 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -85,6 +85,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>
>  extern unsigned long transparent_hugepage_flags;
>
> +extern unsigned long thp_get_unmapped_area(struct file *filp,
> +		unsigned long addr, unsigned long len, unsigned long pgoff,
> +		unsigned long flags);
> +
>  extern void prep_transhuge_page(struct page *page);
>  extern void free_transhuge_page(struct page *page);
>
> @@ -163,6 +167,9 @@ struct page *get_huge_zero_page(void);
>  #define transparent_hugepage_enabled(__vma) 0
>
>  #define transparent_hugepage_flags 0UL
> +
> +#define thp_get_unmapped_area	NULL
> +
>  static inline int
>  split_huge_page_to_list(struct page *page, struct list_head *list)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..2181c7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -790,6 +790,49 @@ void prep_transhuge_page(struct page *page)
>  	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
>  }
>
> +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> +		loff_t off, unsigned long flags, unsigned long size)
> +{
> +	unsigned long addr;
> +	loff_t off_end = off + len;
> +	loff_t off_align = round_up(off, size);
> +	unsigned long len_pad;
> +
> +	if (off_end <= off_align || (off_end - off_align) < size)
> +		return 0;
> +
> +	len_pad = len + size;
> +	if (len_pad < len || (off + len_pad) < off)
> +		return 0;
> +
> +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> +					      off >> PAGE_SHIFT, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return 0;
> +
> +	addr += (off - addr) & (size - 1);
> +	return addr;

Hugh has more sanity checks before and after the call to
get_unmapped_area(). Please consider borrowing them.

> +}
> +
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> +
> +	if (addr)
> +		goto out;

I think that's too strong a reaction to the hint, isn't it? We definitely
need this for MAP_FIXED. But in general? Maybe.

> +	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
> +		goto out;
> +
> +	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
> +	if (addr)
> +		return addr;
> +
> + out:
> +	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> +}
> +EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
> +
>  static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  		struct vm_area_struct *vma,
>  		unsigned long address, pmd_t *pmd,
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.
> For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org

--
 Kirill A. Shutemov
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm