* [PATCH v4 0/2] Align mmap address for DAX pmd mappings
@ 2016-04-23  0:21 ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-23  0:21 UTC (permalink / raw)
  To: akpm
  Cc: tytso, jack, linux-nvdimm, david, linux-kernel, linux-mm,
	adilger.kernel, viro, linux-fsdevel, kirill.shutemov,
	mike.kravetz

When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using the pmd page
size.  This feature relies on both the mmap virtual address and the FS
block (i.e. physical address) being aligned to the pmd page size.
Users can use mkfs options to have the FS align block allocations.
However, aligning the mmap address requires code changes to existing
applications to provide a pmd-aligned address to mmap().

For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
It calls mmap() with a NULL address, which needs to be changed to
provide a pmd-aligned address for testing with DAX pmd mappings.
Changing all applications that call mmap() with NULL is undesirable.

This patch set extends filesystems to align the mmap address for
a DAX file so that unmodified applications can use DAX pmd mappings.

[1]: https://github.com/axboe/fio/blob/master/engines/mmap.c

v4:
 - Use loff_t for offset and cast before shift (Jan Kara)
 - Remove redundant parentheses (Jan Kara)
 - Allow integration with huge page cache support (Matthew Wilcox)
 - Prepare for PUD mapping support (Mike Kravetz, Matthew Wilcox)

v3:
 - Check the overflow condition for offset + length. (Matthew Wilcox)
 - Remove indent by using gotos. (Matthew Wilcox)
 - Define dax_get_unmapped_area to NULL when CONFIG_FS_DAX is unset.
   (Matthew Wilcox)
 - Squash all filesystem patches together. (Matthew Wilcox)

v2:
 - Change filesystems to provide their get_unmapped_area().
   (Matthew Wilcox)
 - Add more description about the benefit. (Matthew Wilcox)

---
Toshi Kani (2):
 1/2 thp, dax: add thp_get_unmapped_area for pmd mappings
 2/2 ext2/4, xfs, blk: call thp_get_unmapped_area() for pmd mappings

---
 fs/block_dev.c          |  1 +
 fs/ext2/file.c          |  1 +
 fs/ext4/file.c          |  1 +
 fs/xfs/xfs_file.c       |  1 +
 include/linux/huge_mm.h |  7 +++++++
 mm/huge_memory.c        | 43 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 54 insertions(+)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 16+ messages in thread


* [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
  2016-04-23  0:21 ` Toshi Kani
  (?)
@ 2016-04-23  0:21   ` Toshi Kani
  -1 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-23  0:21 UTC (permalink / raw)
  To: akpm
  Cc: tytso, jack, linux-nvdimm, david, linux-kernel, linux-mm,
	adilger.kernel, viro, linux-fsdevel, kirill.shutemov,
	mike.kravetz

When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using the pmd page
size.  This feature relies on both the mmap virtual address and the FS
block (i.e. physical address) being aligned to the pmd page size.
Users can use mkfs options to have the FS align block allocations.
However, aligning the mmap address requires code changes to existing
applications to provide a pmd-aligned address to mmap().

For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
It calls mmap() with a NULL address, which needs to be changed to
provide a pmd-aligned address for testing with DAX pmd mappings.
Changing all applications that call mmap() with NULL is undesirable.

Add thp_get_unmapped_area(), which can be called by a filesystem's
get_unmapped_area handler to align an mmap address to the pmd size for
a DAX file.  It calls the default handler, mm->get_unmapped_area(),
to find a range and then aligns it for a DAX file.

thp_get_unmapped_area() can be extended for huge page cache support.

This patch is based on Matthew Wilcox's change, which makes it easy
to add support for the pud page size later.

[1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/huge_mm.h |    7 +++++++
 mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7008623..3769674 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -85,6 +85,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
 
 extern unsigned long transparent_hugepage_flags;
 
+extern unsigned long thp_get_unmapped_area(struct file *filp,
+		unsigned long addr, unsigned long len, unsigned long pgoff,
+		unsigned long flags);
+
 extern void prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
@@ -163,6 +167,9 @@ struct page *get_huge_zero_page(void);
 #define transparent_hugepage_enabled(__vma) 0
 
 #define transparent_hugepage_flags 0UL
+
+#define thp_get_unmapped_area	NULL
+
 static inline int
 split_huge_page_to_list(struct page *page, struct list_head *list)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86f9f8b..2181c7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -790,6 +790,49 @@ void prep_transhuge_page(struct page *page)
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
+unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
+		loff_t off, unsigned long flags, unsigned long size)
+{
+	unsigned long addr;
+	loff_t off_end = off + len;
+	loff_t off_align = round_up(off, size);
+	unsigned long len_pad;
+
+	if (off_end <= off_align || (off_end - off_align) < size)
+		return 0;
+
+	len_pad = len + size;
+	if (len_pad < len || (off + len_pad) < off)
+		return 0;
+
+	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
+					      off >> PAGE_SHIFT, flags);
+	if (IS_ERR_VALUE(addr))
+		return 0;
+
+	addr += (off - addr) & (size - 1);
+	return addr;
+}
+
+unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
+
+	if (addr)
+		goto out;
+	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
+		goto out;
+
+	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
+	if (addr)
+		return addr;
+
+ out:
+	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+}
+EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
+
 static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 					struct vm_area_struct *vma,
 					unsigned long address, pmd_t *pmd,

^ permalink raw reply related	[flat|nested] 16+ messages in thread


* [PATCH v4 2/2] ext2/4, xfs, blk: call thp_get_unmapped_area() for pmd mappings
  2016-04-23  0:21 ` Toshi Kani
  (?)
@ 2016-04-23  0:21   ` Toshi Kani
  -1 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-23  0:21 UTC (permalink / raw)
  To: akpm
  Cc: tytso, jack, linux-nvdimm, david, linux-kernel, linux-mm,
	adilger.kernel, viro, linux-fsdevel, kirill.shutemov,
	mike.kravetz

To support DAX pmd mappings with unmodified applications,
filesystems need to align the mmap address to the pmd size.

Call thp_get_unmapped_area() from f_op->get_unmapped_area.

Note that there is no change in behavior for non-DAX files.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/block_dev.c    |    1 +
 fs/ext2/file.c    |    1 +
 fs/ext4/file.c    |    1 +
 fs/xfs/xfs_file.c |    1 +
 4 files changed, 4 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..a4aa26d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1798,6 +1798,7 @@ const struct file_operations def_blk_fops = {
 	.write_iter	= blkdev_write_iter,
 	.mmap		= blkdev_mmap,
 	.fsync		= blkdev_fsync,
+	.get_unmapped_area = thp_get_unmapped_area,
 	.unlocked_ioctl	= block_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= compat_blkdev_ioctl,
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index c1400b1..92209a1 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -172,6 +172,7 @@ const struct file_operations ext2_file_operations = {
 	.open		= dquot_file_open,
 	.release	= ext2_release_file,
 	.fsync		= ext2_fsync,
+	.get_unmapped_area = thp_get_unmapped_area,
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 };
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index fa2208b..ceb8279 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -708,6 +708,7 @@ const struct file_operations ext4_file_operations = {
 	.open		= ext4_file_open,
 	.release	= ext4_release_file,
 	.fsync		= ext4_sync_file,
+	.get_unmapped_area = thp_get_unmapped_area,
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= ext4_fallocate,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 569938a..0225927 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1708,6 +1708,7 @@ const struct file_operations xfs_file_operations = {
 	.open		= xfs_file_open,
 	.release	= xfs_file_release,
 	.fsync		= xfs_file_fsync,
+	.get_unmapped_area = thp_get_unmapped_area,
 	.fallocate	= xfs_file_fallocate,
 };
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread


* Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
  2016-04-23  0:21   ` Toshi Kani
  (?)
@ 2016-04-24 22:50     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2016-04-24 22:50 UTC (permalink / raw)
  To: Toshi Kani, Hugh Dickins
  Cc: tytso, jack, linux-nvdimm, david, linux-kernel, linux-mm,
	adilger.kernel, viro, linux-fsdevel, akpm, kirill.shutemov,
	mike.kravetz

On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
> size.  This feature relies on both mmap virtual address and FS
> block (i.e. physical address) to be aligned by the pmd page size.
> Users can use mkfs options to specify FS to align block allocations.
> However, aligning mmap address requires code changes to existing
> applications for providing a pmd-aligned address to mmap().
> 
> For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
> It calls mmap() with a NULL address, which needs to be changed to
> provide a pmd-aligned address for testing with DAX pmd mappings.
> Changing all applications that call mmap() with NULL is undesirable.
> 
> Add thp_get_unmapped_area(), which can be called by filesystem's
> get_unmapped_area to align an mmap address by the pmd size for
> a DAX file.  It calls the default handler, mm->get_unmapped_area(),
> to find a range and then aligns it for a DAX file.
> 
> thp_get_unmapped_area() can be extended for huge page cache support.
> 
> The patch is based on Matthew Wilcox's change that allows adding
> support of the pud page size easily.

See Hugh's implementation:

http://lkml.kernel.org/r/alpine.LSU.2.11.1604051420110.5965@eggly.anvils

> [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  include/linux/huge_mm.h |    7 +++++++
>  mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7008623..3769674 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -85,6 +85,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>  
>  extern unsigned long transparent_hugepage_flags;
>  
> +extern unsigned long thp_get_unmapped_area(struct file *filp,
> +		unsigned long addr, unsigned long len, unsigned long pgoff,
> +		unsigned long flags);
> +
>  extern void prep_transhuge_page(struct page *page);
>  extern void free_transhuge_page(struct page *page);
>  
> @@ -163,6 +167,9 @@ struct page *get_huge_zero_page(void);
>  #define transparent_hugepage_enabled(__vma) 0
>  
>  #define transparent_hugepage_flags 0UL
> +
> +#define thp_get_unmapped_area	NULL
> +
>  static inline int
>  split_huge_page_to_list(struct page *page, struct list_head *list)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..2181c7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -790,6 +790,49 @@ void prep_transhuge_page(struct page *page)
>  	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
>  }
>  
> +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> +		loff_t off, unsigned long flags, unsigned long size)
> +{
> +	unsigned long addr;
> +	loff_t off_end = off + len;
> +	loff_t off_align = round_up(off, size);
> +	unsigned long len_pad;
> +
> +	if (off_end <= off_align || (off_end - off_align) < size)
> +		return 0;
> +
> +	len_pad = len + size;
> +	if (len_pad < len || (off + len_pad) < off)
> +		return 0;
> +
> +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> +					      off >> PAGE_SHIFT, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return 0;
> +
> +	addr += (off - addr) & (size - 1);
> +	return addr;

Hugh has more sanity checks before and after the call to get_unmapped_area().
Please consider borrowing them.

> +}
> +
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> +
> +	if (addr)
> +		goto out;

I think that's too strong a reaction to the hint, isn't it?
We definitely need this for MAP_FIXED. But in general? Maybe.

> +	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
> +		goto out;
> +
> +	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
> +	if (addr)
> +		return addr;
> +
> + out:
> +	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> +}
> +EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
> +
>  static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  					struct vm_area_struct *vma,
>  					unsigned long address, pmd_t *pmd,
> 

-- 
 Kirill A. Shutemov
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
  2016-04-24 22:50     ` Kirill A. Shutemov
  (?)
  (?)
@ 2016-04-25 16:06       ` Toshi Kani
  -1 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-25 16:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins
  Cc: tytso, jack, linux-nvdimm, david, linux-kernel, linux-mm,
	adilger.kernel, viro, linux-fsdevel, akpm, kirill.shutemov,
	mike.kravetz

On Mon, 2016-04-25 at 01:50 +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> > 
 :
> > +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> > +		loff_t off, unsigned long flags, unsigned long size)
> > +{
> > +	unsigned long addr;
> > +	loff_t off_end = off + len;
> > +	loff_t off_align = round_up(off, size);
> > +	unsigned long len_pad;
> > +
> > +	if (off_end <= off_align || (off_end - off_align) < size)
> > +		return 0;
> > +
> > +	len_pad = len + size;
> > +	if (len_pad < len || (off + len_pad) < off)
> > +		return 0;
> > +
> > +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> > +					      off >> PAGE_SHIFT, flags);
> > +	if (IS_ERR_VALUE(addr))
> > +		return 0;
> > +
> > +	addr += (off - addr) & (size - 1);
> > +	return addr;
>
> Hugh has more sanity checks before and after call to get_unmapped_area().
> Please, consider borrowing them.

This function only checks whether the request qualifies for THP mappings. It
tries not to step into the implementation of the allocation code,
current->mm->get_unmapped_area(), such as arch_get_unmapped_area_topdown() on x86.

Let me walk thru Hugh's checks to make sure I am not missing something:

---(Hugh's checks)---
| +	if (len > TASK_SIZE)
| +		return -ENOMEM;

This check is made by arch_get_unmapped_area_topdown().

| +
| +	get_area = current->mm->get_unmapped_area;
| +	addr = get_area(file, uaddr, len, pgoff, flags);
| +
| +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
| +		return addr;

thp_get_unmapped_area() is defined as NULL in this case.

| +	if (IS_ERR_VALUE(addr))
| +		return addr;

Checked in my patch.

| +	if (addr & ~PAGE_MASK)
| +		return addr;

arch_get_unmapped_area_topdown() aligns 'addr' unless MAP_FIXED is set. No
need to check in this func.

| +	if (addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to ensure this case.

| +	if (shmem_huge == SHMEM_HUGE_DENY)
| +		return addr;

This check is specific to Hugh's patch.

| +	if (len < HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if (flags & MAP_FIXED)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	/*
| +	 * Our priority is to support MAP_SHARED mapped hugely;
| +	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
| +	 * But if caller specified an address hint, respect that as before.
| +	 */
| +	if (uaddr)
| +		return addr;

Checked in my patch.

(cut)

| +	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
| +	if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
| +		return addr;

This is a lucky case, i.e. the 1st get_unmapped_area() call returned an
aligned addr. Not applicable to my patch.

| +
| +	inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
| +	if (inflated_len > TASK_SIZE)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	if (inflated_len < len)
| +		return addr;

Checked in my patch.

| +	inflated_addr = get_area(NULL, 0, inflated_len, 0, flags);

Not sure why 'filp' and 'off' are passed as NULL/0 here.

| +	if (IS_ERR_VALUE(inflated_addr))
| +		return addr;

Checked in my patch.

| +	if (inflated_addr & ~PAGE_MASK)
| +		return addr;

Hmm... if this happens, it is a bug in the allocation code. I do not think
this check is necessary.

| +	inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
| +	inflated_addr += offset - inflated_offset;
| +	if (inflated_offset > offset)
| +		inflated_addr += HPAGE_PMD_SIZE;
| +
| +	if (inflated_addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to ensure this.

| +	return inflated_addr;

> > 
> > +}
> > +
> > +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> > +		unsigned long len, unsigned long pgoff, unsigned long flags)
> > +{
> > +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> > +
> > +	if (addr)
> > +		goto out;
>
> I think it's too strong reaction to hint, isn't it?
> We definately need this for MAP_FIXED. But in general? Maybe.

It calls the arch's get_unmapped_area() to proceed with the original args when
'addr' is passed. The arch's get_unmapped_area() then handles 'addr' as a
hint when MAP_FIXED is not set. This can be used as a hint to avoid using
THP mappings if a non-aligned address is passed. Hugh's code handles it in
the same way as well.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
@ 2016-04-25 16:06       ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-25 16:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins
  Cc: akpm, dan.j.williams, viro, willy, ross.zwisler, kirill.shutemov,
	david, jack, tytso, adilger.kernel, mike.kravetz, linux-nvdimm,
	linux-fsdevel, linux-mm, linux-kernel

On Mon, 2016-04-25 at 01:50 +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> > 
 :
> > +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long
> > len,
> > +		loff_t off, unsigned long flags, unsigned long size)
> > +{
> > +	unsigned long addr;
> > +	loff_t off_end = off + len;
> > +	loff_t off_align = round_up(off, size);
> > +	unsigned long len_pad;
> > +
> > +	if (off_end <= off_align || (off_end - off_align) < size)
> > +		return 0;
> > +
> > +	len_pad = len + size;
> > +	if (len_pad < len || (off + len_pad) < off)
> > +		return 0;
> > +
> > +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> > +					      off >> PAGE_SHIFT,
> > flags);
> > +	if (IS_ERR_VALUE(addr))
> > +		return 0;
> > +
> > +	addr += (off - addr) & (size - 1);
> > +	return addr;
>
> Hugh has more sanity checks before and after call to get_unmapped_area().
> Please, consider borrowing them.

This function only checks if the request is qualified for THP mappings. It
tries not to step into the implementation of the allocation code current-
>mm->get_unmapped_area(), such as arch_get_unmapped_area_topdown() on x86.

Let me walk thru Hugh's checks to make sure I am not missing something:

---(Hugh's checks)---
| +	if (len > TASK_SIZE)
| +		return -ENOMEM;

This check is made by arch_get_unmapped_area_topdown().

| +
| +	get_area = current->mm->get_unmapped_area;
| +	addr = get_area(file, uaddr, len, pgoff, flags);
| +
| +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
| +		return addr;

thp_get_unmapped_area() is defined to NULL in this case.

| +	if (IS_ERR_VALUE(addr))
| +		return addr;

Checked in my patch.

| +	if (addr & ~PAGE_MASK)
| +		return addr;

arch_get_unmapped_area_topdown() aligns 'addr' unless MAP_FIXED is set. No
need to check in this func.

| +	if (addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to assure this case.

| +	if (shmem_huge == SHMEM_HUGE_DENY)
| +		return addr;

This check is specific to Hugh's patch.

| +	if (len < HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if (flags & MAP_FIXED)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	/*
| +	 * Our priority is to support MAP_SHARED mapped hugely;
| +	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
| +	 * But if caller specified an address hint, respect that as
before.
| +	 */
| +	if (uaddr)
| +		return addr;

Checked in my patch.

(cut)

| +	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
| +	if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
| +		return addr;

This is a lucky case, i.e. the 1st get_unmapped_area() call returned an
aligned addr. Not applicable to my patch.

| +
| +	inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
| +	if (inflated_len > TASK_SIZE)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	if (inflated_len < len)
| +		return addr;

Checked in my patch.

| +	inflated_addr = get_area(NULL, 0, inflated_len, 0, flags);

Not sure why passing 'filp' and 'off' as NULL here.

| +	if (IS_ERR_VALUE(inflated_addr))
| +		return addr;

Checked in my patch.

| +	if (inflated_addr & ~PAGE_MASK)
| +		return addr;

Hmm... if this happens, it is a bug in the allocation code. I do not think
this check is necessary.

| +	inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
| +	inflated_addr += offset - inflated_offset;
| +	if (inflated_offset > offset)
| +		inflated_addr += HPAGE_PMD_SIZE;
| +
| +	if (inflated_addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to assure this.

| +	return inflated_addr;

> > 
> > +}
> > +
> > +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long
> > addr,
> > +		unsigned long len, unsigned long pgoff, unsigned long
> > flags)
> > +{
> > +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> > +
> > +	if (addr)
> > +		goto out;
>
> I think it's too strong reaction to hint, isn't it?
> We definately need this for MAP_FIXED. But in general? Maybe.

It calls arch's get_unmapped_area() to proceed with the original args when
'addr' is passed. The arch's get_unmapped_are() then handles 'addr' as a
hint when MAP_FIXED is not set. This can be used as a hint to avoid using
THP mappings if a non-aligned address is passed. Hugh's code handles it in
the same way as well.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
@ 2016-04-25 16:06       ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-25 16:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins
  Cc: akpm, dan.j.williams, viro, willy, ross.zwisler, kirill.shutemov,
	david, jack, tytso, adilger.kernel, mike.kravetz, linux-nvdimm,
	linux-fsdevel, linux-mm, linux-kernel

On Mon, 2016-04-25 at 01:50 +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> > 
 :
> > +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long
> > len,
> > +		loff_t off, unsigned long flags, unsigned long size)
> > +{
> > +	unsigned long addr;
> > +	loff_t off_end = off + len;
> > +	loff_t off_align = round_up(off, size);
> > +	unsigned long len_pad;
> > +
> > +	if (off_end <= off_align || (off_end - off_align) < size)
> > +		return 0;
> > +
> > +	len_pad = len + size;
> > +	if (len_pad < len || (off + len_pad) < off)
> > +		return 0;
> > +
> > +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> > +					      off >> PAGE_SHIFT,
> > flags);
> > +	if (IS_ERR_VALUE(addr))
> > +		return 0;
> > +
> > +	addr += (off - addr) & (size - 1);
> > +	return addr;
>
> Hugh has more sanity checks before and after call to get_unmapped_area().
> Please, consider borrowing them.

This function only checks if the request is qualified for THP mappings. It
tries not to step into the implementation of the allocation code current-
>mm->get_unmapped_area(), such as arch_get_unmapped_area_topdown() on x86.

Let me walk thru Hugh's checks to make sure I am not missing something:

---(Hugh's checks)---
| +	if (len > TASK_SIZE)
| +		return -ENOMEM;

This check is made by arch_get_unmapped_area_topdown().

| +
| +	get_area = current->mm->get_unmapped_area;
| +	addr = get_area(file, uaddr, len, pgoff, flags);
| +
| +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
| +		return addr;

thp_get_unmapped_area() is defined to NULL in this case.

| +	if (IS_ERR_VALUE(addr))
| +		return addr;

Checked in my patch.

| +	if (addr & ~PAGE_MASK)
| +		return addr;

arch_get_unmapped_area_topdown() aligns 'addr' unless MAP_FIXED is set. No
need to check in this func.

| +	if (addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to assure this case.

| +	if (shmem_huge == SHMEM_HUGE_DENY)
| +		return addr;

This check is specific to Hugh's patch.

| +	if (len < HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if (flags & MAP_FIXED)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	/*
| +	 * Our priority is to support MAP_SHARED mapped hugely;
| +	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
| +	 * But if caller specified an address hint, respect that as
before.
| +	 */
| +	if (uaddr)
| +		return addr;

Checked in my patch.

(cut)

| +	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
| +	if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
| +		return addr;

This is a lucky case, i.e. the 1st get_unmapped_area() call returned an
aligned addr. Not applicable to my patch.

| +
| +	inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
| +	if (inflated_len > TASK_SIZE)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	if (inflated_len < len)
| +		return addr;

Checked in my patch.

| +	inflated_addr = get_area(NULL, 0, inflated_len, 0, flags);

Not sure why passing 'filp' and 'off' as NULL here.

| +	if (IS_ERR_VALUE(inflated_addr))
| +		return addr;

Checked in my patch.

| +	if (inflated_addr & ~PAGE_MASK)
| +		return addr;

Hmm... if this happens, it is a bug in the allocation code. I do not think
this check is necessary.

| +	inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
| +	inflated_addr += offset - inflated_offset;
| +	if (inflated_offset > offset)
| +		inflated_addr += HPAGE_PMD_SIZE;
| +
| +	if (inflated_addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to assure this.

| +	return inflated_addr;

> > 
> > +}
> > +
> > +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long
> > addr,
> > +		unsigned long len, unsigned long pgoff, unsigned long
> > flags)
> > +{
> > +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> > +
> > +	if (addr)
> > +		goto out;
>
> I think it's too strong reaction to hint, isn't it?
> We definately need this for MAP_FIXED. But in general? Maybe.

It calls arch's get_unmapped_area() to proceed with the original args when
'addr' is passed. The arch's get_unmapped_are() then handles 'addr' as a
hint when MAP_FIXED is not set. This can be used as a hint to avoid using
THP mappings if a non-aligned address is passed. Hugh's code handles it in
the same way as well.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings
@ 2016-04-25 16:06       ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2016-04-25 16:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins
  Cc: akpm, dan.j.williams, viro, willy, ross.zwisler, kirill.shutemov,
	david, jack, tytso, adilger.kernel, mike.kravetz, linux-nvdimm,
	linux-fsdevel, linux-mm, linux-kernel

On Mon, 2016-04-25 at 01:50 +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> > 
A :
> > +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> > +		loff_t off, unsigned long flags, unsigned long size)
> > +{
> > +	unsigned long addr;
> > +	loff_t off_end = off + len;
> > +	loff_t off_align = round_up(off, size);
> > +	unsigned long len_pad;
> > +
> > +	if (off_end <= off_align || (off_end - off_align) < size)
> > +		return 0;
> > +
> > +	len_pad = len + size;
> > +	if (len_pad < len || (off + len_pad) < off)
> > +		return 0;
> > +
> > +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> > +					      off >> PAGE_SHIFT, flags);
> > +	if (IS_ERR_VALUE(addr))
> > +		return 0;
> > +
> > +	addr += (off - addr) & (size - 1);
> > +	return addr;
>
> Hugh has more sanity checks before and after call to get_unmapped_area().
> Please, consider borrowing them.

This function only checks whether the request qualifies for THP mappings. It
tries not to step into the implementation of the allocation code,
current->mm->get_unmapped_area(), such as arch_get_unmapped_area_topdown()
on x86.
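The qualification and overflow guards in the quoted __thp_get_unmapped_area()
can be modeled in isolation. Below is a hedged userspace sketch (the function
name pmd_qualifies and the PMD_SZ constant are mine, with a 2MB pmd assumed;
the kernel uses loff_t and HPAGE_PMD_SIZE):

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SZ (2UL * 1024 * 1024)	/* stand-in for HPAGE_PMD_SIZE */

/* power-of-two round up, mirroring the kernel's round_up() */
static inline uint64_t round_up_u64(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Returns nonzero when [off, off+len) covers at least one full
 * PMD_SZ-aligned block and the padded length cannot overflow,
 * i.e. the request qualifies for a pmd mapping. */
static int pmd_qualifies(uint64_t off, uint64_t len)
{
	uint64_t off_end = off + len;
	uint64_t off_align = round_up_u64(off, PMD_SZ);

	if (off_end <= off_align || off_end - off_align < PMD_SZ)
		return 0;		/* no full aligned block inside */
	if (len + PMD_SZ < len || off + len + PMD_SZ < off)
		return 0;		/* padded length would overflow */
	return 1;
}
```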

Let me walk through Hugh's checks to make sure I am not missing something:

---(Hugh's checks)---
| +	if (len > TASK_SIZE)
| +		return -ENOMEM;

This check is made by arch_get_unmapped_area_topdown().

| +
| +	get_area = current->mm->get_unmapped_area;
| +	addr = get_area(file, uaddr, len, pgoff, flags);
| +
| +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
| +		return addr;

thp_get_unmapped_area() is defined to NULL in this case.
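(The "defined to NULL" arrangement is a sketch along these lines; the exact
header location in the patch may differ:)

```c
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern unsigned long thp_get_unmapped_area(struct file *filp,
		unsigned long addr, unsigned long len,
		unsigned long pgoff, unsigned long flags);
#else
#define thp_get_unmapped_area	NULL	/* fall back to the arch default */
#endif
```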

| +	if (IS_ERR_VALUE(addr))
| +		return addr;

Checked in my patch.

| +	if (addr & ~PAGE_MASK)
| +		return addr;

arch_get_unmapped_area_topdown() aligns 'addr' unless MAP_FIXED is set, so
there is no need to check it in this function.

| +	if (addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to ensure this case.

| +	if (shmem_huge == SHMEM_HUGE_DENY)
| +		return addr;

This check is specific to Hugh's patch.

| +	if (len < HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if (flags & MAP_FIXED)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	/*
| +	 * Our priority is to support MAP_SHARED mapped hugely;
| +	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
| +	 * But if caller specified an address hint, respect that as before.
| +	 */
| +	if (uaddr)
| +		return addr;

Checked in my patch.

(cut)

| +	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
| +	if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
| +		return addr;

Checked in my patch.

| +	if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
| +		return addr;

This is the lucky case, i.e. the first get_unmapped_area() call already
returned an aligned addr. Not applicable to my patch.
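The "lucky case" test above is just a congruence check: a pmd can map file
blocks in place only when the virtual address and the file offset agree
modulo HPAGE_PMD_SIZE. A toy illustration (names and 2MB constant mine):

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SZ (2UL * 1024 * 1024)	/* stand-in for HPAGE_PMD_SIZE */

/* addr and the byte offset (pgoff << PAGE_SHIFT) must share the same
 * residue modulo PMD_SZ for a pmd mapping to line up with the file. */
static int pmd_congruent(uint64_t addr, uint64_t off)
{
	return (addr & (PMD_SZ - 1)) == (off & (PMD_SZ - 1));
}
```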

| +
| +	inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
| +	if (inflated_len > TASK_SIZE)
| +		return addr;

Checked by arch_get_unmapped_area_topdown().

| +	if (inflated_len < len)
| +		return addr;

Checked in my patch.

| +	inflated_addr = get_area(NULL, 0, inflated_len, 0, flags);

Not sure why 'filp' is passed as NULL and 'pgoff' as 0 here.

| +	if (IS_ERR_VALUE(inflated_addr))
| +		return addr;

Checked in my patch.

| +	if (inflated_addr & ~PAGE_MASK)
| +		return addr;

Hmm... if this happens, it is a bug in the allocation code. I do not think
this check is necessary.

| +	inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
| +	inflated_addr += offset - inflated_offset;
| +	if (inflated_offset > offset)
| +		inflated_addr += HPAGE_PMD_SIZE;
| +
| +	if (inflated_addr > TASK_SIZE - len)
| +		return addr;

The allocation code needs to ensure this.
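The arithmetic above picks, inside the padded range, the lowest address at
or above the returned base whose residue modulo HPAGE_PMD_SIZE matches the
file offset's. A standalone sketch of the same adjustment (function name
and 2MB constant mine):

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SZ (2UL * 1024 * 1024)	/* stand-in for HPAGE_PMD_SIZE */

/* Given an unaligned base from get_unmapped_area() on the padded length,
 * return the first address >= base congruent to 'offset' mod PMD_SZ. */
static uint64_t align_to_offset(uint64_t base, uint64_t offset)
{
	uint64_t base_off = base & (PMD_SZ - 1);
	uint64_t addr = base + offset - base_off;

	if (base_off > offset)	/* stepped below base: move up one pmd */
		addr += PMD_SZ;
	return addr;
}
```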

| +	return inflated_addr;

> > 
> > +}
> > +
> > +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> > +		unsigned long len, unsigned long pgoff, unsigned long flags)
> > +{
> > +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> > +
> > +	if (addr)
> > +		goto out;
>
> I think it's too strong a reaction to the hint, isn't it?
> We definitely need this for MAP_FIXED. But in general? Maybe.

It calls the arch's get_unmapped_area() to proceed with the original args
when 'addr' is passed. The arch's get_unmapped_area() then handles 'addr' as
a hint when MAP_FIXED is not set. A non-aligned 'addr' can thus serve as a
hint to avoid THP mappings. Hugh's code handles it the same way.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


end of thread, other threads:[~2016-04-25 16:15 UTC | newest]

Thread overview: 16+ messages
2016-04-23  0:21 [PATCH v4 0/2] Align mmap address for DAX pmd mappings Toshi Kani
2016-04-23  0:21 ` [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for " Toshi Kani
2016-04-24 22:50   ` Kirill A. Shutemov
2016-04-25 16:06     ` Toshi Kani
2016-04-23  0:21 ` [PATCH v4 2/2] ext2/4, xfs, blk: call thp_get_unmapped_area() " Toshi Kani
