* [PATCH, REBASED 0/8] Transparent huge page cache: phase 0, prep work
@ 2013-07-15 10:47 ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

[ no changes since last post, only rebased to v3.11-rc1 ]

My patchset introducing transparent huge page cache is pretty big and
hard to review. Dave Hansen suggested splitting it into a few parts.

This is the first part: preparation work. I think it's useful even
without the rest of the patches.

There's one fix for a bug in lru_add_page_tail(). I doubt it's possible
to trigger it with the current code, but it's nice to have the fix
upstream anyway. The rest are cleanups.

Patch 8 depends on patch 7. Other patches are independent and can be
applied separately.

Please consider applying.

Kirill A. Shutemov (8):
  mm: drop actor argument of do_generic_file_read()
  thp, mm: avoid PageUnevictable on active/inactive lru lists
  thp: account anon transparent huge pages into NR_ANON_PAGES
  mm: cleanup add_to_page_cache_locked()
  thp, mm: locking tail page is a bug
  thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  thp: do_huge_pmd_anonymous_page() cleanup
  thp: consolidate code between handle_mm_fault() and
    do_huge_pmd_anonymous_page()

 drivers/base/node.c     |   6 ---
 fs/proc/meminfo.c       |   6 ---
 include/linux/huge_mm.h |   3 --
 include/linux/mm.h      |   3 +-
 mm/filemap.c            |  60 ++++++++++++-----------
 mm/huge_memory.c        | 125 ++++++++++++++++++++----------------------------
 mm/memory.c             |   9 ++--
 mm/rmap.c               |  18 +++----
 mm/swap.c               |  20 +-------
 9 files changed, 104 insertions(+), 146 deletions(-)

-- 
1.8.3.2



* [PATCH 1/8] mm: drop actor argument of do_generic_file_read()
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

There's only one caller of do_generic_file_read(), and the only actor
ever passed in is file_read_actor(). There is no reason to keep the
callback parameter.
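
For reference, the parameter being dropped is the read_actor_t hook
from include/linux/fs.h, and file_read_actor() is its only remaining
implementation. A sketch of the relevant declarations (not an exact
quote of the tree):

	typedef int (*read_actor_t)(read_descriptor_t *, struct page *,
				    unsigned long, unsigned long);

	/* the only actor ever passed to do_generic_file_read() */
	int file_read_actor(read_descriptor_t *desc, struct page *page,
			    unsigned long offset, unsigned long size);

With a single implementation left, the indirect call can simply become
a direct one.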

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 mm/filemap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 4b51ac1..f6fe61f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1088,7 +1088,6 @@ static void shrink_readahead_size_eio(struct file *filp,
  * @filp:	the file to read
  * @ppos:	current file position
  * @desc:	read_descriptor
- * @actor:	read method
  *
  * This is a generic file read routine, and uses the
  * mapping->a_ops->readpage() function for the actual low-level stuff.
@@ -1097,7 +1096,7 @@ static void shrink_readahead_size_eio(struct file *filp,
  * of the logic when it comes to error handling etc.
  */
 static void do_generic_file_read(struct file *filp, loff_t *ppos,
-		read_descriptor_t *desc, read_actor_t actor)
+		read_descriptor_t *desc)
 {
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
@@ -1198,13 +1197,14 @@ page_ok:
 		 * Ok, we have the page, and it's up-to-date, so
 		 * now we can copy it to user space...
 		 *
-		 * The actor routine returns how many bytes were actually used..
+		 * The file_read_actor routine returns how many bytes were
+		 * actually used..
 		 * NOTE! This may not be the same as how much of a user buffer
 		 * we filled up (we may be padding etc), so we can only update
 		 * "pos" here (the actor routine has to update the user buffer
 		 * pointers and the remaining count).
 		 */
-		ret = actor(desc, page, offset, nr);
+		ret = file_read_actor(desc, page, offset, nr);
 		offset += ret;
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
@@ -1477,7 +1477,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 		if (desc.count == 0)
 			continue;
 		desc.error = 0;
-		do_generic_file_read(filp, ppos, &desc, file_read_actor);
+		do_generic_file_read(filp, ppos, &desc);
 		retval += desc.written;
 		if (desc.error) {
 			retval = retval ?: desc.error;
-- 
1.8.3.2



* [PATCH 2/8] thp, mm: avoid PageUnevictable on active/inactive lru lists
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The active/inactive lru lists can contain unevictable pages (e.g. ramfs
pages that have been placed on the LRU lists when first allocated), but
these pages must not have PageUnevictable set - otherwise
shrink_[in]active_list goes crazy:

kernel BUG at /home/space/kas/git/public/linux-next/mm/vmscan.c:1122!

1090 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
1091                 struct lruvec *lruvec, struct list_head *dst,
1092                 unsigned long *nr_scanned, struct scan_control *sc,
1093                 isolate_mode_t mode, enum lru_list lru)
1094 {
...
1108                 switch (__isolate_lru_page(page, mode)) {
1109                 case 0:
...
1116                 case -EBUSY:
...
1121                 default:
1122                         BUG();
1123                 }
1124         }
...
1130 }

__isolate_lru_page() returns -EINVAL for pages with PageUnevictable() set.
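
The check responsible sits near the top of __isolate_lru_page();
roughly (a sketch, not an exact quote of mm/vmscan.c):

	int __isolate_lru_page(struct page *page, isolate_mode_t mode)
	{
		int ret = -EINVAL;

		/* Only take pages on the LRU. */
		if (!PageLRU(page))
			return ret;

		/* unevictable pages are taken only when explicitly allowed */
		if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
			return ret;
		...
	}

Since isolate_lru_pages() only expects 0 or -EBUSY for pages sitting on
the active/inactive lists, the -EINVAL hits the default: BUG() above.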

For lru_add_page_tail(), this means we should not set PageUnevictable()
on tail pages unless we're sure they will go to LRU_UNEVICTABLE. Let's
just copy PG_active and PG_unevictable from the head page in
__split_huge_page_refcount(); this also simplifies lru_add_page_tail().

This fixes one more bug in lru_add_page_tail(): if
page_evictable(page_tail) is false and PageLRU(page) is true, page_tail
goes to the same lru as page, but nothing syncs the tail page's
active/inactive state with the head page's, so we can end up with an
inactive page on the active lru. Copying PG_active from the head page
fixes this as well.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 mm/huge_memory.c |  4 +++-
 mm/swap.c        | 20 ++------------------
 2 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 243e710..a92012a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1620,7 +1620,9 @@ static void __split_huge_page_refcount(struct page *page,
 				     ((1L << PG_referenced) |
 				      (1L << PG_swapbacked) |
 				      (1L << PG_mlocked) |
-				      (1L << PG_uptodate)));
+				      (1L << PG_uptodate) |
+				      (1L << PG_active) |
+				      (1L << PG_unevictable)));
 		page_tail->flags |= (1L << PG_dirty);
 
 		/* clear PageTail before overwriting first_page */
diff --git a/mm/swap.c b/mm/swap.c
index 4a1d0d2..7bd8910 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -774,8 +774,6 @@ EXPORT_SYMBOL(__pagevec_release);
 void lru_add_page_tail(struct page *page, struct page *page_tail,
 		       struct lruvec *lruvec, struct list_head *list)
 {
-	int uninitialized_var(active);
-	enum lru_list lru;
 	const int file = 0;
 
 	VM_BUG_ON(!PageHead(page));
@@ -787,20 +785,6 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 	if (!list)
 		SetPageLRU(page_tail);
 
-	if (page_evictable(page_tail)) {
-		if (PageActive(page)) {
-			SetPageActive(page_tail);
-			active = 1;
-			lru = LRU_ACTIVE_ANON;
-		} else {
-			active = 0;
-			lru = LRU_INACTIVE_ANON;
-		}
-	} else {
-		SetPageUnevictable(page_tail);
-		lru = LRU_UNEVICTABLE;
-	}
-
 	if (likely(PageLRU(page)))
 		list_add_tail(&page_tail->lru, &page->lru);
 	else if (list) {
@@ -816,13 +800,13 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 		 * Use the standard add function to put page_tail on the list,
 		 * but then correct its position so they all end up in order.
 		 */
-		add_page_to_lru_list(page_tail, lruvec, lru);
+		add_page_to_lru_list(page_tail, lruvec, page_lru(page_tail));
 		list_head = page_tail->lru.prev;
 		list_move_tail(&page_tail->lru, list_head);
 	}
 
 	if (!PageUnevictable(page))
-		update_page_reclaim_stat(lruvec, file, active);
+		update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-- 
1.8.3.2



* [PATCH 3/8] thp: account anon transparent huge pages into NR_ANON_PAGES
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

We use NR_ANON_PAGES as the basis for reporting AnonPages to userspace,
so there's not much sense in leaving transparent huge pages out of that
counter only to add them back in at print time.

Let's account transparent huge pages in NR_ANON_PAGES in the first place.
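
This works because hpage_nr_pages() already yields the right delta for
both cases - 1 for a small page, HPAGE_PMD_NR for a THP. From
include/linux/huge_mm.h (modulo config #ifdefs):

	static inline int hpage_nr_pages(struct page *page)
	{
		if (unlikely(PageTransHuge(page)))
			return HPAGE_PMD_NR;
		return 1;
	}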

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 drivers/base/node.c |  6 ------
 fs/proc/meminfo.c   |  6 ------
 mm/huge_memory.c    |  1 -
 mm/rmap.c           | 18 +++++++++---------
 4 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 7616a77..bc9f43b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -125,13 +125,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(nid, NR_WRITEBACK)),
 		       nid, K(node_page_state(nid, NR_FILE_PAGES)),
 		       nid, K(node_page_state(nid, NR_FILE_MAPPED)),
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		       nid, K(node_page_state(nid, NR_ANON_PAGES)
-			+ node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
-			HPAGE_PMD_NR),
-#else
 		       nid, K(node_page_state(nid, NR_ANON_PAGES)),
-#endif
 		       nid, K(node_page_state(nid, NR_SHMEM)),
 		       nid, node_page_state(nid, NR_KERNEL_STACK) *
 				THREAD_SIZE / 1024,
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 5aa847a..59d85d6 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -132,13 +132,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		K(i.freeswap),
 		K(global_page_state(NR_FILE_DIRTY)),
 		K(global_page_state(NR_WRITEBACK)),
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		K(global_page_state(NR_ANON_PAGES)
-		  + global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
-		  HPAGE_PMD_NR),
-#else
 		K(global_page_state(NR_ANON_PAGES)),
-#endif
 		K(global_page_state(NR_FILE_MAPPED)),
 		K(global_page_state(NR_SHMEM)),
 		K(global_page_state(NR_SLAB_RECLAIMABLE) +
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..04f0749 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1661,7 +1661,6 @@ static void __split_huge_page_refcount(struct page *page,
 	BUG_ON(atomic_read(&page->_count) <= 0);
 
 	__mod_zone_page_state(zone, NR_ANON_TRANSPARENT_HUGEPAGES, -1);
-	__mod_zone_page_state(zone, NR_ANON_PAGES, HPAGE_PMD_NR);
 
 	ClearPageCompound(page);
 	compound_unlock(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index cd356df..7066470 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1055,11 +1055,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
-		if (!PageTransHuge(page))
-			__inc_zone_page_state(page, NR_ANON_PAGES);
-		else
+		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
+		__mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
+				hpage_nr_pages(page));
 	}
 	if (unlikely(PageKsm(page)))
 		return;
@@ -1088,10 +1088,10 @@ void page_add_new_anon_rmap(struct page *page,
 	VM_BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	SetPageSwapBacked(page);
 	atomic_set(&page->_mapcount, 0); /* increment count (starts at -1) */
-	if (!PageTransHuge(page))
-		__inc_zone_page_state(page, NR_ANON_PAGES);
-	else
+	if (PageTransHuge(page))
 		__inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
+	__mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
+			hpage_nr_pages(page));
 	__page_set_anon_rmap(page, vma, address, 1);
 	if (!mlocked_vma_newpage(vma, page)) {
 		SetPageActive(page);
@@ -1151,11 +1151,11 @@ void page_remove_rmap(struct page *page)
 		goto out;
 	if (anon) {
 		mem_cgroup_uncharge_page(page);
-		if (!PageTransHuge(page))
-			__dec_zone_page_state(page, NR_ANON_PAGES);
-		else
+		if (PageTransHuge(page))
 			__dec_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
+		__mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
+				-hpage_nr_pages(page));
 	} else {
 		__dec_zone_page_state(page, NR_FILE_MAPPED);
 		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_MAPPED);
-- 
1.8.3.2



* [PATCH 4/8] mm: cleanup add_to_page_cache_locked()
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The patch makes add_to_page_cache_locked() cleaner:
 - unindent most of the function by inverting one condition;
 - streamline the no-error code path;
 - move the insert error path out of the normal code path;
 - call radix_tree_preload_end() earlier.

No functional changes.
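
The resulting shape is the usual early-return plus error-label pattern;
schematically (helper names abbreviated here, see the diff below for
the real code):

	error = charge();
	if (error)
		return error;
	error = preload();
	if (error) {
		uncharge();
		return error;
	}
	/* take ref, set page->mapping and page->index, insert */
	if (unlikely(error))
		goto err_insert;
	/* success bookkeeping */
	return 0;
err_insert:
	/* unwind mapping/charge/ref */
	return error;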

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 mm/filemap.c | 48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index f6fe61f..f7c4ed5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -467,32 +467,34 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 	error = mem_cgroup_cache_charge(page, current->mm,
 					gfp_mask & GFP_RECLAIM_MASK);
 	if (error)
-		goto out;
+		return error;
 
 	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-	if (error == 0) {
-		page_cache_get(page);
-		page->mapping = mapping;
-		page->index = offset;
-
-		spin_lock_irq(&mapping->tree_lock);
-		error = radix_tree_insert(&mapping->page_tree, offset, page);
-		if (likely(!error)) {
-			mapping->nrpages++;
-			__inc_zone_page_state(page, NR_FILE_PAGES);
-			spin_unlock_irq(&mapping->tree_lock);
-			trace_mm_filemap_add_to_page_cache(page);
-		} else {
-			page->mapping = NULL;
-			/* Leave page->index set: truncation relies upon it */
-			spin_unlock_irq(&mapping->tree_lock);
-			mem_cgroup_uncharge_cache_page(page);
-			page_cache_release(page);
-		}
-		radix_tree_preload_end();
-	} else
+	if (error) {
 		mem_cgroup_uncharge_cache_page(page);
-out:
+		return error;
+	}
+
+	page_cache_get(page);
+	page->mapping = mapping;
+	page->index = offset;
+
+	spin_lock_irq(&mapping->tree_lock);
+	error = radix_tree_insert(&mapping->page_tree, offset, page);
+	radix_tree_preload_end();
+	if (unlikely(error))
+		goto err_insert;
+	mapping->nrpages++;
+	__inc_zone_page_state(page, NR_FILE_PAGES);
+	spin_unlock_irq(&mapping->tree_lock);
+	trace_mm_filemap_add_to_page_cache(page);
+	return 0;
+err_insert:
+	page->mapping = NULL;
+	/* Leave page->index set: truncation relies upon it */
+	spin_unlock_irq(&mapping->tree_lock);
+	mem_cgroup_uncharge_cache_page(page);
+	page_cache_release(page);
 	return error;
 }
 EXPORT_SYMBOL(add_to_page_cache_locked);
-- 
1.8.3.2



* [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Locking the head page means locking the entire compound page. If we
ever try to lock a tail page, something has gone wrong.
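
Callers that might see a THP should therefore lock through the head
page; a hypothetical usage sketch, using the existing compound_head()
helper:

	struct page *head = compound_head(page); /* == page for small pages */

	lock_page(head);
	/* the entire compound page is locked now */
	unlock_page(head);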

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 mm/filemap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index f7c4ed5..e59049c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -639,6 +639,7 @@ void __lock_page(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	VM_BUG_ON(PageTail(page));
 	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -648,6 +649,7 @@ int __lock_page_killable(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	VM_BUG_ON(PageTail(page));
 	return __wait_on_bit_lock(page_waitqueue(page), &wait,
 					sleep_on_page_killable, TASK_KILLABLE);
 }
-- 
1.8.3.2



* [PATCH 6/8] thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

It's confusing that mk_huge_pmd() has semantics different from mk_pte()
or mk_pmd(). I spent some time debugging an issue caused by this
inconsistency.

Let's move maybe_pmd_mkwrite() out of mk_huge_pmd() and adjust its
prototype to match mk_pte().
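
After the change, the huge pmd fault paths mirror the familiar pte
pattern; for comparison (sketch):

	/* 4k path, existing mk_pte() convention */
	entry = mk_pte(page, vma->vm_page_prot);
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);

	/* huge pmd path after this patch - same shape */
	entry = mk_huge_pmd(page, vma->vm_page_prot);
	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);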

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 mm/huge_memory.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 04f0749..ec735a9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -690,11 +690,10 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 	return pmd;
 }
 
-static inline pmd_t mk_huge_pmd(struct page *page, struct vm_area_struct *vma)
+static inline pmd_t mk_huge_pmd(struct page *page, pgprot_t prot)
 {
 	pmd_t entry;
-	entry = mk_pmd(page, vma->vm_page_prot);
-	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+	entry = mk_pmd(page, prot);
 	entry = pmd_mkhuge(entry);
 	return entry;
 }
@@ -727,7 +726,8 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 		pte_free(mm, pgtable);
 	} else {
 		pmd_t entry;
-		entry = mk_huge_pmd(page, vma);
+		entry = mk_huge_pmd(page, vma->vm_page_prot);
+		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 		page_add_new_anon_rmap(page, vma, haddr);
 		pgtable_trans_huge_deposit(mm, pmd, pgtable);
 		set_pmd_at(mm, haddr, pmd, entry);
@@ -1210,7 +1210,8 @@ alloc:
 		goto out_mn;
 	} else {
 		pmd_t entry;
-		entry = mk_huge_pmd(new_page, vma);
+		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
+		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 		pmdp_clear_flush(vma, haddr, pmd);
 		page_add_new_anon_rmap(new_page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
@@ -2356,7 +2357,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 	__SetPageUptodate(new_page);
 	pgtable = pmd_pgtable(_pmd);
 
-	_pmd = mk_huge_pmd(new_page, vma);
+	_pmd = mk_huge_pmd(new_page, vma->vm_page_prot);
+	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
 
 	/*
 	 * spin_lock() below is not the equivalent of smp_wmb(), so
-- 
1.8.3.2



* [PATCH 7/8] thp: do_huge_pmd_anonymous_page() cleanup
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Minor cleanup: unindent most of the function by inverting one
condition. This is preparation for the next patch.

No functional changes.
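
The transformation is a plain guard-clause inversion; schematically
(see the diff below for the real thing):

	/* before: the whole fault path nested under the range check */
	if (haddr >= vma->vm_start && haddr + HPAGE_PMD_SIZE <= vma->vm_end) {
		/* ~40 lines of huge page fault handling */
		return 0;
	}
	/* fall through to the "out" fallback */

	/* after: bail out early, fault handling at the top level */
	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
		goto out;
	/* ~40 lines of huge page fault handling */
	return 0;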

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 mm/huge_memory.c | 83 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 41 insertions(+), 42 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ec735a9..82f0697 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -785,55 +785,54 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long haddr = address & HPAGE_PMD_MASK;
 	pte_t *pte;
 
-	if (haddr >= vma->vm_start && haddr + HPAGE_PMD_SIZE <= vma->vm_end) {
-		if (unlikely(anon_vma_prepare(vma)))
-			return VM_FAULT_OOM;
-		if (unlikely(khugepaged_enter(vma)))
+	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
+		goto out;
+	if (unlikely(anon_vma_prepare(vma)))
+		return VM_FAULT_OOM;
+	if (unlikely(khugepaged_enter(vma)))
+		return VM_FAULT_OOM;
+	if (!(flags & FAULT_FLAG_WRITE) &&
+			transparent_hugepage_use_zero_page()) {
+		pgtable_t pgtable;
+		struct page *zero_page;
+		bool set;
+		pgtable = pte_alloc_one(mm, haddr);
+		if (unlikely(!pgtable))
 			return VM_FAULT_OOM;
-		if (!(flags & FAULT_FLAG_WRITE) &&
-				transparent_hugepage_use_zero_page()) {
-			pgtable_t pgtable;
-			struct page *zero_page;
-			bool set;
-			pgtable = pte_alloc_one(mm, haddr);
-			if (unlikely(!pgtable))
-				return VM_FAULT_OOM;
-			zero_page = get_huge_zero_page();
-			if (unlikely(!zero_page)) {
-				pte_free(mm, pgtable);
-				count_vm_event(THP_FAULT_FALLBACK);
-				goto out;
-			}
-			spin_lock(&mm->page_table_lock);
-			set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
-					zero_page);
-			spin_unlock(&mm->page_table_lock);
-			if (!set) {
-				pte_free(mm, pgtable);
-				put_huge_zero_page();
-			}
-			return 0;
-		}
-		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					  vma, haddr, numa_node_id(), 0);
-		if (unlikely(!page)) {
+		zero_page = get_huge_zero_page();
+		if (unlikely(!zero_page)) {
+			pte_free(mm, pgtable);
 			count_vm_event(THP_FAULT_FALLBACK);
 			goto out;
 		}
-		count_vm_event(THP_FAULT_ALLOC);
-		if (unlikely(mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))) {
-			put_page(page);
-			goto out;
-		}
-		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd,
-							  page))) {
-			mem_cgroup_uncharge_page(page);
-			put_page(page);
-			goto out;
+		spin_lock(&mm->page_table_lock);
+		set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
+				zero_page);
+		spin_unlock(&mm->page_table_lock);
+		if (!set) {
+			pte_free(mm, pgtable);
+			put_huge_zero_page();
 		}
-
 		return 0;
 	}
+	page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
+			vma, haddr, numa_node_id(), 0);
+	if (unlikely(!page)) {
+		count_vm_event(THP_FAULT_FALLBACK);
+		goto out;
+	}
+	count_vm_event(THP_FAULT_ALLOC);
+	if (unlikely(mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))) {
+		put_page(page);
+		goto out;
+	}
+	if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page))) {
+		mem_cgroup_uncharge_page(page);
+		put_page(page);
+		goto out;
+	}
+
+	return 0;
 out:
 	/*
 	 * Use __pte_alloc instead of pte_alloc_map, because we can't
-- 
1.8.3.2



* [PATCH 8/8] thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
  2013-07-15 10:47 ` Kirill A. Shutemov
@ 2013-07-15 10:47   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

do_huge_pmd_anonymous_page() has a copy-pasted piece of
handle_mm_fault() to handle the fallback path.

Let's consolidate the code back by introducing a VM_FAULT_FALLBACK
return code.
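
With the new bit, the huge page path just reports "didn't handle it"
and handle_mm_fault() falls back to the pte path itself; the caller
pattern becomes (see the mm/memory.c hunk below):

	int ret = VM_FAULT_FALLBACK;
	if (!vma->vm_ops)
		ret = do_huge_pmd_anonymous_page(mm, vma, address, pmd, flags);
	if (!(ret & VM_FAULT_FALLBACK))
		return ret;
	/* otherwise fall through to the normal pte fault path */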

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
 include/linux/huge_mm.h |  3 ---
 include/linux/mm.h      |  3 ++-
 mm/huge_memory.c        | 31 +++++--------------------------
 mm/memory.c             |  9 ++++++---
 4 files changed, 13 insertions(+), 33 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b60de92..3935428 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -96,9 +96,6 @@ extern int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			  pmd_t *dst_pmd, pmd_t *src_pmd,
 			  struct vm_area_struct *vma,
 			  unsigned long addr, unsigned long end);
-extern int handle_pte_fault(struct mm_struct *mm,
-			    struct vm_area_struct *vma, unsigned long address,
-			    pte_t *pte, pmd_t *pmd, unsigned int flags);
 extern int split_huge_page_to_list(struct page *page, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f022460..d5c82dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -884,11 +884,12 @@ static inline int page_mapped(struct page *page)
 #define VM_FAULT_NOPAGE	0x0100	/* ->fault installed the pte, not return page */
 #define VM_FAULT_LOCKED	0x0200	/* ->fault locked the returned page */
 #define VM_FAULT_RETRY	0x0400	/* ->fault blocked, must retry */
+#define VM_FAULT_FALLBACK 0x0800	/* huge page fault failed, fall back to small */
 
 #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
 
 #define VM_FAULT_ERROR	(VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \
-			 VM_FAULT_HWPOISON_LARGE)
+			 VM_FAULT_FALLBACK | VM_FAULT_HWPOISON_LARGE)
 
 /* Encode hstate index for a hwpoisoned large page */
 #define VM_FAULT_SET_HINDEX(x) ((x) << 12)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82f0697..c3b8c9c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -783,10 +783,9 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	struct page *page;
 	unsigned long haddr = address & HPAGE_PMD_MASK;
-	pte_t *pte;
 
 	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
-		goto out;
+		return VM_FAULT_FALLBACK;
 	if (unlikely(anon_vma_prepare(vma)))
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma)))
@@ -803,7 +802,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (unlikely(!zero_page)) {
 			pte_free(mm, pgtable);
 			count_vm_event(THP_FAULT_FALLBACK);
-			goto out;
+			return VM_FAULT_FALLBACK;
 		}
 		spin_lock(&mm->page_table_lock);
 		set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
@@ -819,40 +818,20 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			vma, haddr, numa_node_id(), 0);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
-		goto out;
+		return VM_FAULT_FALLBACK;
 	}
 	count_vm_event(THP_FAULT_ALLOC);
 	if (unlikely(mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))) {
 		put_page(page);
-		goto out;
+		return VM_FAULT_FALLBACK;
 	}
 	if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page))) {
 		mem_cgroup_uncharge_page(page);
 		put_page(page);
-		goto out;
+		return VM_FAULT_FALLBACK;
 	}
 
 	return 0;
-out:
-	/*
-	 * Use __pte_alloc instead of pte_alloc_map, because we can't
-	 * run pte_offset_map on the pmd, if an huge pmd could
-	 * materialize from under us from a different thread.
-	 */
-	if (unlikely(pmd_none(*pmd)) &&
-	    unlikely(__pte_alloc(mm, vma, pmd, address)))
-		return VM_FAULT_OOM;
-	/* if an huge pmd materialized from under us just retry later */
-	if (unlikely(pmd_trans_huge(*pmd)))
-		return 0;
-	/*
-	 * A regular pmd is established and it can't morph into a huge pmd
-	 * from under us anymore at this point because we hold the mmap_sem
-	 * read mode and khugepaged takes it in write mode. So now it's
-	 * safe to run pte_offset_map().
-	 */
-	pte = pte_offset_map(pmd, address);
-	return handle_pte_fault(mm, vma, address, pte, pmd, flags);
 }
 
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
diff --git a/mm/memory.c b/mm/memory.c
index 1ce2e2a..f2ab2a8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3693,7 +3693,7 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
  * but allow concurrent faults), and pte mapped but not yet locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-int handle_pte_fault(struct mm_struct *mm,
+static int handle_pte_fault(struct mm_struct *mm,
 		     struct vm_area_struct *vma, unsigned long address,
 		     pte_t *pte, pmd_t *pmd, unsigned int flags)
 {
@@ -3780,9 +3780,12 @@ retry:
 	if (!pmd)
 		return VM_FAULT_OOM;
 	if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) {
+		int ret = VM_FAULT_FALLBACK;
 		if (!vma->vm_ops)
-			return do_huge_pmd_anonymous_page(mm, vma, address,
-							  pmd, flags);
+			ret = do_huge_pmd_anonymous_page(mm, vma, address,
+					pmd, flags);
+		if (!(ret & VM_FAULT_FALLBACK))
+			return ret;
 	} else {
 		pmd_t orig_pmd = *pmd;
 		int ret;
-- 
1.8.3.2
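
To illustrate the control flow the new bit establishes: below is a
minimal userspace sketch, not kernel code -- huge_fault() and
pte_fault() are hypothetical stand-ins, and only VM_FAULT_FALLBACK
comes from the patch. The caller probes the huge path first and
retries with small pages only when the fallback bit is set:

#include <stdio.h>

#define VM_FAULT_FALLBACK 0x0800	/* from the patch */

/* hypothetical stand-ins for the huge and small fault paths */
static int huge_fault(int huge_ok)
{
	return huge_ok ? 0 : VM_FAULT_FALLBACK;
}

static int pte_fault(void)
{
	return 0;	/* the small-page path always succeeds here */
}

static int fault(int huge_ok)
{
	int ret = huge_fault(huge_ok);

	if (!(ret & VM_FAULT_FALLBACK))
		return ret;	/* handled (or failed hard) as huge */
	return pte_fault();	/* retry with small pages */
}

int main(void)
{
	printf("huge ok: %d, fallback: %d\n", fault(1), fault(0));
	return 0;
}

This keeps the pte-table allocation and retry logic in one place, the
generic fault path, instead of copy-pasting it into every huge-path
helper; that duplicated logic is exactly what the removed "out:" block
above was.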


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* RE: [PATCH 4/8] mm: cleanup add_to_page_cache_locked()
  2013-07-15 10:47   ` Kirill A. Shutemov
@ 2013-07-15 13:55     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-15 13:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Andrew Morton, Al Viro, Hugh Dickins,
	Wu Fengguang, Jan Kara, Mel Gorman, linux-mm, Andi Kleen,
	Matthew Wilcox, Kirill A. Shutemov, Hillf Danton, Dave Hansen,
	linux-fsdevel, linux-kernel, Kirill A. Shutemov

Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> The patch makes add_to_page_cache_locked() cleaner:
>  - unindent most of the function by inverting one condition;
>  - streamline the no-error code path;
>  - move the insert error path out of the normal code path;
>  - call radix_tree_preload_end() earlier;
> 
> No functional changes.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---

...

> +	spin_lock_irq(&mapping->tree_lock);
> +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> +	radix_tree_preload_end();
> +	if (unlikely(!error))
> +		goto err_insert;

Nadav Shemer noticed a mistake here: it should be 'if (unlikely(error))'.

I missed this during testing because the bug was fixed by another patch
in my thp-pagecache series.

The fixed patch is below. Retested with only this patchset applied;
it looks okay now.

Sorry about that.

---

>From 0c74a39660f41a6c9504e3b4ec7a1ba3433af2a3 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Thu, 23 May 2013 16:08:12 +0300
Subject: [PATCH] mm: cleanup add_to_page_cache_locked()

The patch makes add_to_page_cache_locked() cleaner:
 - unindent most of the function by inverting one condition;
 - streamline the no-error code path;
 - move the insert error path out of the normal code path;
 - call radix_tree_preload_end() earlier;

No functional changes.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/filemap.c | 48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index f6fe61f..f1ca99c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -467,32 +467,34 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 	error = mem_cgroup_cache_charge(page, current->mm,
 					gfp_mask & GFP_RECLAIM_MASK);
 	if (error)
-		goto out;
+		return error;
 
 	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-	if (error == 0) {
-		page_cache_get(page);
-		page->mapping = mapping;
-		page->index = offset;
-
-		spin_lock_irq(&mapping->tree_lock);
-		error = radix_tree_insert(&mapping->page_tree, offset, page);
-		if (likely(!error)) {
-			mapping->nrpages++;
-			__inc_zone_page_state(page, NR_FILE_PAGES);
-			spin_unlock_irq(&mapping->tree_lock);
-			trace_mm_filemap_add_to_page_cache(page);
-		} else {
-			page->mapping = NULL;
-			/* Leave page->index set: truncation relies upon it */
-			spin_unlock_irq(&mapping->tree_lock);
-			mem_cgroup_uncharge_cache_page(page);
-			page_cache_release(page);
-		}
-		radix_tree_preload_end();
-	} else
+	if (error) {
 		mem_cgroup_uncharge_cache_page(page);
-out:
+		return error;
+	}
+
+	page_cache_get(page);
+	page->mapping = mapping;
+	page->index = offset;
+
+	spin_lock_irq(&mapping->tree_lock);
+	error = radix_tree_insert(&mapping->page_tree, offset, page);
+	radix_tree_preload_end();
+	if (unlikely(error))
+		goto err_insert;
+	mapping->nrpages++;
+	__inc_zone_page_state(page, NR_FILE_PAGES);
+	spin_unlock_irq(&mapping->tree_lock);
+	trace_mm_filemap_add_to_page_cache(page);
+	return 0;
+err_insert:
+	page->mapping = NULL;
+	/* Leave page->index set: truncation relies upon it */
+	spin_unlock_irq(&mapping->tree_lock);
+	mem_cgroup_uncharge_cache_page(page);
+	page_cache_release(page);
 	return error;
 }
 EXPORT_SYMBOL(add_to_page_cache_locked);
-- 
 Kirill A. Shutemov
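
The shape of the cleanup, reduced to a generic sketch (nothing below is
kernel code; step1() and step2() are hypothetical fallible steps):

#include <stdio.h>

static int step1(void) { return 0; }	/* may fail in real code */
static int step2(void) { return 0; }	/* may fail in real code */

/* old shape: the success path sits nested under if (error == 0) */
static int do_work_nested(void)
{
	int error = step1();
	if (error == 0) {
		error = step2();
		if (error == 0) {
			/* the actual work, two levels deep */
		} else {
			/* undo step1 */
		}
	}
	return error;
}

/* new shape: errors take early exits, success runs straight through */
static int do_work_flat(void)
{
	int error = step1();
	if (error)
		return error;
	error = step2();
	if (error)
		goto err_step2;
	/* the actual work, at the top level */
	return 0;
err_step2:
	/* undo step1 */
	return error;
}

int main(void)
{
	printf("%d %d\n", do_work_nested(), do_work_flat());
	return 0;
}

The patch also calls radix_tree_preload_end() right after the insert,
before the error check, since the preload is no longer needed once the
insert has been attempted.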

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/8] mm: drop actor argument of do_generic_file_read()
  2013-07-15 10:47   ` Kirill A. Shutemov
@ 2013-07-16  3:31     ` Wanpeng Li
  -1 siblings, 0 replies; 41+ messages in thread
From: Wanpeng Li @ 2013-07-16  3:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Andrew Morton, Al Viro, Hugh Dickins,
	Wu Fengguang, Jan Kara, Mel Gorman, linux-mm, Andi Kleen,
	Matthew Wilcox, Kirill A. Shutemov, Hillf Danton, Dave Hansen,
	linux-fsdevel, linux-kernel, Kirill A. Shutemov

On Mon, Jul 15, 2013 at 01:47:47PM +0300, Kirill A. Shutemov wrote:
>From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
>There's only one caller of do_generic_file_read() and the only actor is
>file_read_actor(). No reason to have a callback parameter.
>
>Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

>---
> mm/filemap.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/mm/filemap.c b/mm/filemap.c
>index 4b51ac1..f6fe61f 100644
>--- a/mm/filemap.c
>+++ b/mm/filemap.c
>@@ -1088,7 +1088,6 @@ static void shrink_readahead_size_eio(struct file *filp,
>  * @filp:	the file to read
>  * @ppos:	current file position
>  * @desc:	read_descriptor
>- * @actor:	read method
>  *
>  * This is a generic file read routine, and uses the
>  * mapping->a_ops->readpage() function for the actual low-level stuff.
>@@ -1097,7 +1096,7 @@ static void shrink_readahead_size_eio(struct file *filp,
>  * of the logic when it comes to error handling etc.
>  */
> static void do_generic_file_read(struct file *filp, loff_t *ppos,
>-		read_descriptor_t *desc, read_actor_t actor)
>+		read_descriptor_t *desc)
> {
> 	struct address_space *mapping = filp->f_mapping;
> 	struct inode *inode = mapping->host;
>@@ -1198,13 +1197,14 @@ page_ok:
> 		 * Ok, we have the page, and it's up-to-date, so
> 		 * now we can copy it to user space...
> 		 *
>-		 * The actor routine returns how many bytes were actually used..
>+		 * The file_read_actor routine returns how many bytes were
>+		 * actually used..
> 		 * NOTE! This may not be the same as how much of a user buffer
> 		 * we filled up (we may be padding etc), so we can only update
> 		 * "pos" here (the actor routine has to update the user buffer
> 		 * pointers and the remaining count).
> 		 */
>-		ret = actor(desc, page, offset, nr);
>+		ret = file_read_actor(desc, page, offset, nr);
> 		offset += ret;
> 		index += offset >> PAGE_CACHE_SHIFT;
> 		offset &= ~PAGE_CACHE_MASK;
>@@ -1477,7 +1477,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
> 		if (desc.count == 0)
> 			continue;
> 		desc.error = 0;
>-		do_generic_file_read(filp, ppos, &desc, file_read_actor);
>+		do_generic_file_read(filp, ppos, &desc);
> 		retval += desc.written;
> 		if (desc.error) {
> 			retval = retval ?: desc.error;
>-- 
>1.8.3.2
>

^ permalink raw reply	[flat|nested] 41+ messages in thread
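
The refactoring pattern in patch 1/8, as a sketch (the types and
signatures below are illustrative stand-ins, not the kernel's):

/* before: a callback parameter with exactly one implementation */
typedef int (*read_actor_t)(int nr);

static int file_read_actor(int nr)
{
	return nr;	/* pretend nr bytes were copied to userspace */
}

static int do_read_old(int nr, read_actor_t actor)
{
	return actor(nr);	/* indirect call, always file_read_actor */
}

/* after: call the single implementation directly */
static int do_read_new(int nr)
{
	return file_read_actor(nr);
}

int main(void)
{
	return do_read_old(4096, file_read_actor) == do_read_new(4096) ? 0 : 1;
}

Dropping the indirection simplifies the signature and lets the compiler
see (and potentially inline) the one copy routine.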

* Re: [PATCH 1/8] mm: drop actor argument of do_generic_file_read()
  2013-07-15 10:47   ` Kirill A. Shutemov
@ 2013-07-16 19:10     ` Matthew Wilcox
  -1 siblings, 0 replies; 41+ messages in thread
From: Matthew Wilcox @ 2013-07-16 19:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Andrew Morton, Al Viro, Hugh Dickins,
	Wu Fengguang, Jan Kara, Mel Gorman, linux-mm, Andi Kleen,
	Kirill A. Shutemov, Hillf Danton, Dave Hansen, linux-fsdevel,
	linux-kernel

On Mon, Jul 15, 2013 at 01:47:47PM +0300, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> There's only one caller of do_generic_file_read() and the only actor is
> file_read_actor(). No reason to have a callback parameter.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Would it make sense to do the same thing to do_shmem_file_read()?

From: Matthew Wilcox <willy@linux.intel.com>

There's only one caller of do_shmem_file_read() and the only actor is
file_read_actor(). No reason to have a callback parameter.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/mm/shmem.c b/mm/shmem.c
index 5e6a842..6a9c325 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1464,7 +1464,7 @@ shmem_write_end(struct file *file, struct address_space *mapping,
 	return copied;
 }
 
-static void do_shmem_file_read(struct file *filp, loff_t *ppos, read_descriptor_t *desc, read_actor_t actor)
+static void do_shmem_file_read(struct file *filp, loff_t *ppos, read_descriptor_t *desc)
 {
 	struct inode *inode = file_inode(filp);
 	struct address_space *mapping = inode->i_mapping;
@@ -1546,13 +1546,14 @@ static void do_shmem_file_read(struct file *filp, loff_t *ppos, read_descriptor_
 		 * Ok, we have the page, and it's up-to-date, so
 		 * now we can copy it to user space...
 		 *
-		 * The actor routine returns how many bytes were actually used..
+		 * The file_read_actor routine returns how many bytes were actually
+		 * used..
 		 * NOTE! This may not be the same as how much of a user buffer
 		 * we filled up (we may be padding etc), so we can only update
-		 * "pos" here (the actor routine has to update the user buffer
+		 * "pos" here (file_read_actor has to update the user buffer
 		 * pointers and the remaining count).
 		 */
-		ret = actor(desc, page, offset, nr);
+		ret = file_read_actor(desc, page, offset, nr);
 		offset += ret;
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
@@ -1590,7 +1591,7 @@ static ssize_t shmem_file_aio_read(struct kiocb *iocb,
 		if (desc.count == 0)
 			continue;
 		desc.error = 0;
-		do_shmem_file_read(filp, ppos, &desc, file_read_actor);
+		do_shmem_file_read(filp, ppos, &desc);
 		retval += desc.written;
 		if (desc.error) {
 			retval = retval ?: desc.error;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/8] mm: drop actor argument of do_generic_file_read()
  2013-07-16 19:10     ` Matthew Wilcox
@ 2013-07-16 23:13       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-16 23:13 UTC (permalink / raw)
  To: Matthew Wilcox, Andreas Dilger, Peng Tao, Oleg Drokin
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Andrew Morton, Al Viro,
	Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman, linux-mm,
	Andi Kleen, Kirill A. Shutemov, Hillf Danton, Dave Hansen,
	linux-fsdevel, linux-kernel

Matthew Wilcox wrote:
> On Mon, Jul 15, 2013 at 01:47:47PM +0300, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > 
> > There's only one caller of do_generic_file_read() and the only actor is
> > file_read_actor(). No reason to have a callback parameter.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Would it make sense to do the same thing to do_shmem_file_read()?
> 
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> There's only one caller of do_shmem_file_read() and the only actor is
> file_read_actor(). No reason to have a callback parameter.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Looks good to me:

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

v3.11-rc1 brings one more user of read_actor_t -- lustre. But it seems
to be an artifact from the days when f_op had ->sendfile, and it's not
in use anymore.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-15 10:47   ` Kirill A. Shutemov
@ 2013-07-17 21:09     ` Andrew Morton
  -1 siblings, 0 replies; 41+ messages in thread
From: Andrew Morton @ 2013-07-17 21:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara,
	Mel Gorman, linux-mm, Andi Kleen, Matthew Wilcox,
	Kirill A. Shutemov, Hillf Danton, Dave Hansen, linux-fsdevel,
	linux-kernel

On Mon, 15 Jul 2013 13:47:51 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> Locking the head page means locking the entire compound page.
> If we try to lock a tail page, something has gone wrong.
> 
> ..
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -639,6 +639,7 @@ void __lock_page(struct page *page)
>  {
>  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
>  
> +	VM_BUG_ON(PageTail(page));
>  	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
>  							TASK_UNINTERRUPTIBLE);
>  }
> @@ -648,6 +649,7 @@ int __lock_page_killable(struct page *page)
>  {
>  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
>  
> +	VM_BUG_ON(PageTail(page));
>  	return __wait_on_bit_lock(page_waitqueue(page), &wait,
>  					sleep_on_page_killable, TASK_KILLABLE);
>  }

lock_page() is a pretty commonly called function, and I assume quite a
lot of people run with CONFIG_DEBUG_VM=y.

Is the overhead added by this patch really worthwhile?

I'm thinking I might leave it in -mm indefinitely but not send it
upstream.
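
For scale: VM_BUG_ON() expands to a real test only under
CONFIG_DEBUG_VM and compiles away otherwise. A rough userspace model
(abort() standing in for BUG()):

#include <stdio.h>
#include <stdlib.h>

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) do { if (cond) abort(); } while (0)
#else
#define VM_BUG_ON(cond) do { } while (0)
#endif

int main(void)
{
	VM_BUG_ON(1 == 2);	/* one test per call, debug builds only */
	puts("ok");
	return 0;
}

So the cost in question is a single PageTail() flag test per
__lock_page() call, and only on kernels built with CONFIG_DEBUG_VM.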


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-17 21:09     ` Andrew Morton
@ 2013-07-17 22:43       ` Dave Hansen
  -1 siblings, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2013-07-17 22:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Al Viro, Hugh Dickins,
	Wu Fengguang, Jan Kara, Mel Gorman, linux-mm, Andi Kleen,
	Matthew Wilcox, Kirill A. Shutemov, Hillf Danton, linux-fsdevel,
	linux-kernel

On 07/17/2013 02:09 PM, Andrew Morton wrote:
> lock_page() is a pretty commonly called function, and I assume quite a
> lot of people run with CONFIG_DEBUG_VM=y.
> 
> Is the overhead added by this patch really worthwhile?

I always thought of it as a developer-only thing.  I don't think any of
the big distros turn it on by default.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-17 21:09     ` Andrew Morton
@ 2013-07-17 22:45       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-07-17 22:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Al Viro, Hugh Dickins,
	Wu Fengguang, Jan Kara, Mel Gorman, linux-mm, Andi Kleen,
	Matthew Wilcox, Kirill A. Shutemov, Hillf Danton, Dave Hansen,
	linux-fsdevel, linux-kernel

Andrew Morton wrote:
> On Mon, 15 Jul 2013 13:47:51 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > 
> > Locking the head page means locking the entire compound page.
> > If we try to lock a tail page, something has gone wrong.
> > 
> > ..
> >
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -639,6 +639,7 @@ void __lock_page(struct page *page)
> >  {
> >  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> >  
> > +	VM_BUG_ON(PageTail(page));
> >  	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
> >  							TASK_UNINTERRUPTIBLE);
> >  }
> > @@ -648,6 +649,7 @@ int __lock_page_killable(struct page *page)
> >  {
> >  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
> >  
> > +	VM_BUG_ON(PageTail(page));
> >  	return __wait_on_bit_lock(page_waitqueue(page), &wait,
> >  					sleep_on_page_killable, TASK_KILLABLE);
> >  }
> 
> lock_page() is a pretty commonly called function, and I assume quite a
> lot of people run with CONFIG_DEBUG_VM=y.
> 
> Is the overhead added by this patch really worthwhile?

I found it useful, especially when I was starting experiments with THP
for pagecache. But feel free to drop it if you think it adds too much
overhead.

> I'm thinking I might leave it in -mm indefinitely but not send it
> upstream.

Works for me too.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-17 22:43       ` Dave Hansen
@ 2013-07-18  0:58         ` Hugh Dickins
  -1 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2013-07-18  0:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dave Jones, Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, linux-fsdevel, linux-kernel

On Wed, 17 Jul 2013, Dave Hansen wrote:
> On 07/17/2013 02:09 PM, Andrew Morton wrote:
> > lock_page() is a pretty commonly called function, and I assume quite a
> > lot of people run with CONFIG_DEBUG_VM=y.
> > 
> > Is the overhead added by this patch really worthwhile?
> 
> I always thought of it as a developer-only thing.  I don't think any of
> the big distros turn it on by default.

That's how I think of it too (and the problem is often that too few mm
developers turn it on); but Dave Jones did confirm last November that
Fedora turns it on.

I believe Fedora turns it on to help us all, and wouldn't mind a mere
VM_BUG_ON(PageTail(page)) in __lock_page() if it's helpful to Kirill.

But if VM_BUG_ONs become expensive, I do think it's for Fedora to
turn off CONFIG_DEBUG_VM, rather than for mm developers to avoid it.

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-18  0:58         ` Hugh Dickins
@ 2013-07-18  2:04           ` Dave Jones
  -1 siblings, 0 replies; 41+ messages in thread
From: Dave Jones @ 2013-07-18  2:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Hansen, Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Al Viro, Wu Fengguang, Jan Kara, Mel Gorman, linux-mm,
	Andi Kleen, Matthew Wilcox, Kirill A. Shutemov, Hillf Danton,
	linux-fsdevel, linux-kernel

On Wed, Jul 17, 2013 at 05:58:13PM -0700, Hugh Dickins wrote:
 > On Wed, 17 Jul 2013, Dave Hansen wrote:
 > > On 07/17/2013 02:09 PM, Andrew Morton wrote:
 > > > lock_page() is a pretty commonly called function, and I assume quite a
 > > > lot of people run with CONFIG_DEBUG_VM=y.
 > > > 
 > > > Is the overhead added by this patch really worthwhile?
 > > 
 > > I always thought of it as a developer-only thing.  I don't think any of
 > > the big distros turn it on by default.
 > 
 > That's how I think of it too (and the problem is often that too few mm
 > developers turn it on); but Dave Jones did confirm last November that
 > Fedora turns it on.
 > 
 > I believe Fedora turns it on to help us all, and wouldn't mind a mere
 > VM_BUG_ON(PageTail(page)) in __lock_page() if it's helpful to Kirill.
 > 
 > But if VM_BUG_ONs become expensive, I do think it's for Fedora to
 > turn off CONFIG_DEBUG_VM, rather than for mm developers to avoid it.

I'm ambivalent about whether we keep it on or off; we have no shortage
of bugs to fix already. Though, as mentioned above, very few people
actually enable it otherwise, so turning it off means we lose a lot of
testing.

Another idea, perhaps, is an extra config option for the more expensive
debug checks?

	Dave



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-07-18  2:04           ` Dave Jones
@ 2013-07-18 19:59             ` Hugh Dickins
  -1 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2013-07-18 19:59 UTC (permalink / raw)
  To: Dave Jones
  Cc: Dave Hansen, Andrew Morton, Kirill A. Shutemov, Andrea Arcangeli,
	Al Viro, Wu Fengguang, Jan Kara, Mel Gorman, linux-mm,
	Andi Kleen, Matthew Wilcox, Kirill A. Shutemov, Hillf Danton,
	linux-fsdevel, linux-kernel

On Wed, 17 Jul 2013, Dave Jones wrote:
> On Wed, Jul 17, 2013 at 05:58:13PM -0700, Hugh Dickins wrote:
>  > On Wed, 17 Jul 2013, Dave Hansen wrote:
>  > > On 07/17/2013 02:09 PM, Andrew Morton wrote:
>  > > > lock_page() is a pretty commonly called function, and I assume quite a
>  > > > lot of people run with CONFIG_DEBUG_VM=y.
>  > > > 
>  > > > Is the overhead added by this patch really worthwhile?
>  > > 
>  > > I always thought of it as a developer-only thing.  I don't think any of
>  > > the big distros turn it on by default.
>  > 
>  > That's how I think of it too (and the problem is often that too few mm
>  > developers turn it on); but Dave Jones did confirm last November that
>  > Fedora turns it on.
>  > 
>  > I believe Fedora turns it on to help us all, and wouldn't mind a mere
>  > VM_BUG_ON(PageTail(page)) in __lock_page() if it's helpful to Kirill.
>  > 
>  > But if VM_BUG_ONs become expensive, I do think it's for Fedora to
>  > turn off CONFIG_DEBUG_VM, rather than for mm developers to avoid it.
> 
> I'm ambivalent about whether we keep it on or off, we have no shortage
> of bugs to fix already, though I think as mentioned above, very few people
> actually enable it, so we're going to lose a lot of testing.
> 
> Another idea, perhaps is an extra config option for more expensive debug options ?

I can't think of any expensive debug under CONFIG_DEBUG_VM at present,
just a little overhead as you'd expect.  I don't think checking a page
flag or two in __lock_page() will change that much.

We do tend to make specific debug options for the more expensive ones,
rather than a generic heavy debug option.  For example CONFIG_DEBUG_VM_RB,
which checks an mm's entire vma tree whenever a change is made there.

We could group the heavy debug options, like that and CONFIG_PROVE_LOCKING,
into a menu of their own; but I've no appetite for such a change myself!

Hugh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 5/8] thp, mm: locking tail page is a bug
  2013-06-11 15:35 [PATCH 0/8] Transparent huge page cache: phase 0, prep work Kirill A. Shutemov
@ 2013-06-11 15:35   ` Kirill A. Shutemov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill A. Shutemov @ 2013-06-11 15:35 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
	linux-mm, Andi Kleen, Matthew Wilcox, Kirill A. Shutemov,
	Hillf Danton, Dave Hansen, linux-fsdevel, linux-kernel,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Locking the head page means locking the entire compound page.
If we try to lock a tail page, something has gone wrong.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 mm/filemap.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index bb6ad8c..2bd1d24 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -639,6 +639,7 @@ void __lock_page(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	VM_BUG_ON(PageTail(page));
 	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -648,6 +649,7 @@ int __lock_page_killable(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	VM_BUG_ON(PageTail(page));
 	return __wait_on_bit_lock(page_waitqueue(page), &wait,
 					sleep_on_page_killable, TASK_KILLABLE);
 }
-- 
1.7.10.4
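
The invariant the assertion enforces, modeled in plain C (a simplified
stand-in, not the real struct page):

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
	bool tail;		/* models PageTail() */
	struct page *head;	/* models compound_head() for tail pages */
	bool locked;		/* models PG_locked; tracked on the head */
};

static void lock_page(struct page *page)
{
	assert(!page->tail);	/* the VM_BUG_ON() added by the patch */
	page->locked = true;	/* locking the head locks the compound page */
}

int main(void)
{
	struct page head = { .tail = false, .head = NULL, .locked = false };
	struct page tail = { .tail = true, .head = &head, .locked = false };

	lock_page(&head);	/* fine */
	(void)tail;		/* lock_page(&tail) would trip the assert */
	return 0;
}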


^ permalink raw reply related	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2013-07-18 19:59 UTC | newest]

Thread overview: 41+ messages
-- links below jump to the message on this page --
2013-07-15 10:47 [PATCH, REBASED 0/8] Transparent huge page cache: phase 0, prep work Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 1/8] mm: drop actor argument of do_generic_file_read() Kirill A. Shutemov
2013-07-16  3:31   ` Wanpeng Li
2013-07-16 19:10   ` Matthew Wilcox
2013-07-16 23:13     ` Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 2/8] thp, mm: avoid PageUnevictable on active/inactive lru lists Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 3/8] thp: account anon transparent huge pages into NR_ANON_PAGES Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 4/8] mm: cleanup add_to_page_cache_locked() Kirill A. Shutemov
2013-07-15 13:55   ` Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 5/8] thp, mm: locking tail page is a bug Kirill A. Shutemov
2013-07-17 21:09   ` Andrew Morton
2013-07-17 22:43     ` Dave Hansen
2013-07-18  0:58       ` Hugh Dickins
2013-07-18  2:04         ` Dave Jones
2013-07-18 19:59           ` Hugh Dickins
2013-07-17 22:45     ` Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 6/8] thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 7/8] thp: do_huge_pmd_anonymous_page() cleanup Kirill A. Shutemov
2013-07-15 10:47 ` [PATCH 8/8] thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page() Kirill A. Shutemov
  -- strict thread matches above, loose matches on Subject: below --
2013-06-11 15:35 [PATCH 0/8] Transparent huge page cache: phase 0, prep work Kirill A. Shutemov
2013-06-11 15:35 ` [PATCH 5/8] thp, mm: locking tail page is a bug Kirill A. Shutemov
