linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/7 v1] Speed up page cache truncation
@ 2017-10-10 15:19 Jan Kara
  2017-10-10 15:19 ` [PATCH 1/7] mm: Speedup cancel_dirty_page() for clean pages Jan Kara
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Hello,

when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
noticed a regression in the bonnie++ benchmark when deleting files.
Eventually we tracked this down to the fact that page cache truncation got
slower by about 10%. There were both gains and losses in that range of
kernels, but we were able to identify that commit 83929372f629 "filemap:
prepare find and delete operations for huge pages" caused about a 10%
regression on its own.

After some investigation it did not seem easily possible to fix the regression
while keeping the THP-in-page-cache functionality, so we decided to optimize
the page cache truncation path instead to make up for the change. This series
is the result of that effort.

Patch 1 is an easy speedup of cancel_dirty_page(). Patches 2-6 refactor the
page cache truncation code so that it is easier to batch radix tree
operations. Patch 7 implements batching of deletes from the radix tree, which
more than makes up for the original regression.

What do people think about this series?

								Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/7] mm: Speedup cancel_dirty_page() for clean pages
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-10 15:19 ` [PATCH 2/7] mm: Refactor truncate_complete_page() Jan Kara
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

cancel_dirty_page() does quite a bit of work even for clean pages (fetching
the mapping, locking the memcg, an atomic bit operation on page flags), so it
accounts for ~2.5% of the cost of truncating a clean page. That is not much,
but it is still dumb work for something we do not need at all. Check whether
the page is actually dirty and avoid any work if it is not.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h  | 8 +++++++-
 mm/page-writeback.c | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065d99deb847..d14a9bb2a3d7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1430,7 +1430,13 @@ void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
-void cancel_dirty_page(struct page *page);
+void __cancel_dirty_page(struct page *page);
+static inline void cancel_dirty_page(struct page *page)
+{
+	/* Avoid atomic ops, locking, etc. when not actually needed. */
+	if (PageDirty(page))
+		__cancel_dirty_page(page);
+}
 int clear_page_dirty_for_io(struct page *page);
 
 int get_cmdline(struct task_struct *task, char *buffer, int buflen);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0b9c5cbe8eba..c3bed3f5cd24 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2623,7 +2623,7 @@ EXPORT_SYMBOL(set_page_dirty_lock);
  * page without actually doing it through the VM. Can you say "ext3 is
  * horribly ugly"? Thought you could.
  */
-void cancel_dirty_page(struct page *page)
+void __cancel_dirty_page(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
@@ -2644,7 +2644,7 @@ void cancel_dirty_page(struct page *page)
 		ClearPageDirty(page);
 	}
 }
-EXPORT_SYMBOL(cancel_dirty_page);
+EXPORT_SYMBOL(__cancel_dirty_page);
 
 /*
  * Clear a page's dirty flag, while caring for dirty memory accounting.
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/7] mm: Refactor truncate_complete_page()
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
  2017-10-10 15:19 ` [PATCH 1/7] mm: Speedup cancel_dirty_page() for clean pages Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-10 15:19 ` [PATCH 3/7] mm: Factor out page cache page freeing into a separate function Jan Kara
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Move the call of delete_from_page_cache() and the page->mapping check out of
truncate_complete_page() into its single caller, truncate_inode_page(). Also
move the page_mapped() check into truncate_complete_page(). That way it will
be easier to batch operations. Finally, rename truncate_complete_page() to
truncate_cleanup_page().

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/truncate.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 2330223841fb..383a530d511e 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -134,11 +134,17 @@ void do_invalidatepage(struct page *page, unsigned int offset,
  * its lock, b) when a concurrent invalidate_mapping_pages got there first and
  * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
  */
-static int
-truncate_complete_page(struct address_space *mapping, struct page *page)
+static void
+truncate_cleanup_page(struct address_space *mapping, struct page *page)
 {
-	if (page->mapping != mapping)
-		return -EIO;
+	if (page_mapped(page)) {
+		loff_t holelen;
+
+		holelen = PageTransHuge(page) ? HPAGE_PMD_SIZE : PAGE_SIZE;
+		unmap_mapping_range(mapping,
+				   (loff_t)page->index << PAGE_SHIFT,
+				   holelen, 0);
+	}
 
 	if (page_has_private(page))
 		do_invalidatepage(page, 0, PAGE_SIZE);
@@ -150,8 +156,6 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
 	 */
 	cancel_dirty_page(page);
 	ClearPageMappedToDisk(page);
-	delete_from_page_cache(page);
-	return 0;
 }
 
 /*
@@ -180,16 +184,14 @@ invalidate_complete_page(struct address_space *mapping, struct page *page)
 
 int truncate_inode_page(struct address_space *mapping, struct page *page)
 {
-	loff_t holelen;
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
-	holelen = PageTransHuge(page) ? HPAGE_PMD_SIZE : PAGE_SIZE;
-	if (page_mapped(page)) {
-		unmap_mapping_range(mapping,
-				   (loff_t)page->index << PAGE_SHIFT,
-				   holelen, 0);
-	}
-	return truncate_complete_page(mapping, page);
+	if (page->mapping != mapping)
+		return -EIO;
+
+	truncate_cleanup_page(mapping, page);
+	delete_from_page_cache(page);
+	return 0;
 }
 
 /*
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/7] mm: Factor out page cache page freeing into a separate function
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
  2017-10-10 15:19 ` [PATCH 1/7] mm: Speedup cancel_dirty_page() for clean pages Jan Kara
  2017-10-10 15:19 ` [PATCH 2/7] mm: Refactor truncate_complete_page() Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-10 15:19 ` [PATCH 4/7] mm: Move accounting updates before page_cache_tree_delete() Jan Kara
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Factor out page freeing from delete_from_page_cache() into a separate
function. We will need to call the same code when batching page cache
deletion operations.

invalidate_complete_page2() and replace_page_cache_page() might want to call
this function as well; however, they currently do not seem to handle THPs, so
it is unnecessary for them to take the hit of checking whether a page is a
THP or not.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index cf74d0dacc6a..cdb44dacabd2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -254,6 +254,23 @@ void __delete_from_page_cache(struct page *page, void *shadow)
 		account_page_cleaned(page, mapping, inode_to_wb(mapping->host));
 }
 
+static void page_cache_free_page(struct address_space *mapping,
+				struct page *page)
+{
+	void (*freepage)(struct page *);
+
+	freepage = mapping->a_ops->freepage;
+	if (freepage)
+		freepage(page);
+
+	if (PageTransHuge(page) && !PageHuge(page)) {
+		page_ref_sub(page, HPAGE_PMD_NR);
+		VM_BUG_ON_PAGE(page_count(page) <= 0, page);
+	} else {
+		put_page(page);
+	}
+}
+
 /**
  * delete_from_page_cache - delete page from page cache
  * @page: the page which the kernel is trying to remove from page cache
@@ -266,25 +283,13 @@ void delete_from_page_cache(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 	unsigned long flags;
-	void (*freepage)(struct page *);
 
 	BUG_ON(!PageLocked(page));
-
-	freepage = mapping->a_ops->freepage;
-
 	spin_lock_irqsave(&mapping->tree_lock, flags);
 	__delete_from_page_cache(page, NULL);
 	spin_unlock_irqrestore(&mapping->tree_lock, flags);
 
-	if (freepage)
-		freepage(page);
-
-	if (PageTransHuge(page) && !PageHuge(page)) {
-		page_ref_sub(page, HPAGE_PMD_NR);
-		VM_BUG_ON_PAGE(page_count(page) <= 0, page);
-	} else {
-		put_page(page);
-	}
+	page_cache_free_page(mapping, page);
 }
 EXPORT_SYMBOL(delete_from_page_cache);
 
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/7] mm: Move accounting updates before page_cache_tree_delete()
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
                   ` (2 preceding siblings ...)
  2017-10-10 15:19 ` [PATCH 3/7] mm: Factor out page cache page freeing into a separate function Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-10 15:19 ` [PATCH 5/7] mm: Move clearing of page->mapping to page_cache_tree_delete() Jan Kara
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Move the updates of various counters before the page_cache_tree_delete()
call. It will be easier to batch things this way, and it makes no difference
whether the counters get updated before or after removal from the radix tree.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index cdb44dacabd2..c58ccd26bbe6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -224,34 +224,35 @@ void __delete_from_page_cache(struct page *page, void *shadow)
 		}
 	}
 
-	page_cache_tree_delete(mapping, page, shadow);
-
-	page->mapping = NULL;
-	/* Leave page->index set: truncation lookup relies upon it */
-
 	/* hugetlb pages do not participate in page cache accounting. */
-	if (PageHuge(page))
-		return;
+	if (!PageHuge(page)) {
+		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
+		if (PageSwapBacked(page)) {
+			__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
+			if (PageTransHuge(page))
+				__dec_node_page_state(page, NR_SHMEM_THPS);
+		} else {
+			VM_BUG_ON_PAGE(PageTransHuge(page), page);
+		}
 
-	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
-	if (PageSwapBacked(page)) {
-		__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
-		if (PageTransHuge(page))
-			__dec_node_page_state(page, NR_SHMEM_THPS);
-	} else {
-		VM_BUG_ON_PAGE(PageTransHuge(page), page);
+		/*
+		 * At this point page must be either written or cleaned by
+		 * truncate.  Dirty page here signals a bug and loss of
+		 * unwritten data.
+		 *
+		 * This fixes dirty accounting after removing the page entirely
+		 * but leaves PageDirty set: it has no effect for truncated
+		 * page and anyway will be cleared before returning page into
+		 * buddy allocator.
+		 */
+		if (WARN_ON_ONCE(PageDirty(page)))
+			account_page_cleaned(page, mapping,
+					     inode_to_wb(mapping->host));
 	}
+	page_cache_tree_delete(mapping, page, shadow);
 
-	/*
-	 * At this point page must be either written or cleaned by truncate.
-	 * Dirty page here signals a bug and loss of unwritten data.
-	 *
-	 * This fixes dirty accounting after removing the page entirely but
-	 * leaves PageDirty set: it has no effect for truncated page and
-	 * anyway will be cleared before returning page into buddy allocator.
-	 */
-	if (WARN_ON_ONCE(PageDirty(page)))
-		account_page_cleaned(page, mapping, inode_to_wb(mapping->host));
+	page->mapping = NULL;
+	/* Leave page->index set: truncation lookup relies upon it */
 }
 
 static void page_cache_free_page(struct address_space *mapping,
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/7] mm: Move clearing of page->mapping to page_cache_tree_delete()
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
                   ` (3 preceding siblings ...)
  2017-10-10 15:19 ` [PATCH 4/7] mm: Move accounting updates before page_cache_tree_delete() Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-10 15:19 ` [PATCH 6/7] mm: Factor out checks and accounting from __delete_from_page_cache() Jan Kara
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Clearing of page->mapping makes sense in page_cache_tree_delete() as well,
and moving it there will help us with batching things later.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c58ccd26bbe6..c866a84bd45c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -165,6 +165,9 @@ static void page_cache_tree_delete(struct address_space *mapping,
 				     workingset_update_node, mapping);
 	}
 
+	page->mapping = NULL;
+	/* Leave page->index set: truncation lookup relies upon it */
+
 	if (shadow) {
 		mapping->nrexceptional += nr;
 		/*
@@ -250,9 +253,6 @@ void __delete_from_page_cache(struct page *page, void *shadow)
 					     inode_to_wb(mapping->host));
 	}
 	page_cache_tree_delete(mapping, page, shadow);
-
-	page->mapping = NULL;
-	/* Leave page->index set: truncation lookup relies upon it */
 }
 
 static void page_cache_free_page(struct address_space *mapping,
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/7] mm: Factor out checks and accounting from __delete_from_page_cache()
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
                   ` (4 preceding siblings ...)
  2017-10-10 15:19 ` [PATCH 5/7] mm: Move clearing of page->mapping to page_cache_tree_delete() Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-10 15:19 ` [PATCH 7/7] mm: Batch radix tree operations when truncating pages Jan Kara
  2017-10-10 17:25 ` [PATCH 0/7 v1] Speed up page cache truncation Andi Kleen
  7 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Move checks and accounting updates from __delete_from_page_cache() into
a separate function. We will reuse it when batching page cache
truncation operations.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 72 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c866a84bd45c..6fb01b2404a7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -181,17 +181,11 @@ static void page_cache_tree_delete(struct address_space *mapping,
 	mapping->nrpages -= nr;
 }
 
-/*
- * Delete a page from the page cache and free it. Caller has to make
- * sure the page is locked and that nobody else uses it - or that usage
- * is safe.  The caller must hold the mapping's tree_lock.
- */
-void __delete_from_page_cache(struct page *page, void *shadow)
+static void unaccount_page_cache_page(struct address_space *mapping,
+				      struct page *page)
 {
-	struct address_space *mapping = page->mapping;
-	int nr = hpage_nr_pages(page);
+	int nr;
 
-	trace_mm_filemap_delete_from_page_cache(page);
 	/*
 	 * if we're uptodate, flush out into the cleancache, otherwise
 	 * invalidate any existing cleancache entries.  We can't leave
@@ -228,30 +222,46 @@ void __delete_from_page_cache(struct page *page, void *shadow)
 	}
 
 	/* hugetlb pages do not participate in page cache accounting. */
-	if (!PageHuge(page)) {
-		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
-		if (PageSwapBacked(page)) {
-			__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
-			if (PageTransHuge(page))
-				__dec_node_page_state(page, NR_SHMEM_THPS);
-		} else {
-			VM_BUG_ON_PAGE(PageTransHuge(page), page);
-		}
+	if (PageHuge(page))
+		return;
 
-		/*
-		 * At this point page must be either written or cleaned by
-		 * truncate.  Dirty page here signals a bug and loss of
-		 * unwritten data.
-		 *
-		 * This fixes dirty accounting after removing the page entirely
-		 * but leaves PageDirty set: it has no effect for truncated
-		 * page and anyway will be cleared before returning page into
-		 * buddy allocator.
-		 */
-		if (WARN_ON_ONCE(PageDirty(page)))
-			account_page_cleaned(page, mapping,
-					     inode_to_wb(mapping->host));
+	nr = hpage_nr_pages(page);
+
+	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
+	if (PageSwapBacked(page)) {
+		__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
+		if (PageTransHuge(page))
+			__dec_node_page_state(page, NR_SHMEM_THPS);
+	} else {
+		VM_BUG_ON_PAGE(PageTransHuge(page), page);
 	}
+
+	/*
+	 * At this point page must be either written or cleaned by
+	 * truncate.  Dirty page here signals a bug and loss of
+	 * unwritten data.
+	 *
+	 * This fixes dirty accounting after removing the page entirely
+	 * but leaves PageDirty set: it has no effect for truncated
+	 * page and anyway will be cleared before returning page into
+	 * buddy allocator.
+	 */
+	if (WARN_ON_ONCE(PageDirty(page)))
+		account_page_cleaned(page, mapping, inode_to_wb(mapping->host));
+}
+
+/*
+ * Delete a page from the page cache and free it. Caller has to make
+ * sure the page is locked and that nobody else uses it - or that usage
+ * is safe.  The caller must hold the mapping's tree_lock.
+ */
+void __delete_from_page_cache(struct page *page, void *shadow)
+{
+	struct address_space *mapping = page->mapping;
+
+	trace_mm_filemap_delete_from_page_cache(page);
+
+	unaccount_page_cache_page(mapping, page);
 	page_cache_tree_delete(mapping, page, shadow);
 }
 
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/7] mm: Batch radix tree operations when truncating pages
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
                   ` (5 preceding siblings ...)
  2017-10-10 15:19 ` [PATCH 6/7] mm: Factor out checks and accounting from __delete_from_page_cache() Jan Kara
@ 2017-10-10 15:19 ` Jan Kara
  2017-10-11  7:39   ` Mel Gorman
  2017-10-17 23:05   ` Andrew Morton
  2017-10-10 17:25 ` [PATCH 0/7 v1] Speed up page cache truncation Andi Kleen
  7 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-10 15:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, Kirill A. Shutemov, linux-fsdevel, Jan Kara

Currently we remove pages from the radix tree one by one. To speed up
page cache truncation, lock several pages at once and free them in one
go. This allows us to batch radix tree operations more efficiently and
also saves round-trips on mapping->tree_lock. As a result we gain about
a 20% speed improvement in page cache truncation.

Data is from a simple benchmark timing 10000 truncates of 1024 pages (on
ext4 on a ramdisk, but the filesystem is barely visible in the profiles).
Each range shows the 1st and 95th percentiles of the measured times:

4.14-rc2	4.14-rc2 + batched truncation
248-256		209-219
249-258		209-217
248-255		211-239
248-255		209-217
247-256		210-218
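
A rough userspace sketch of this kind of timing harness (illustrative only,
not the harness actually used; the file name, sizes and output format are
assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define NPAGES	1024	/* pages of page cache to populate per iteration */
#define ITERS	10000	/* number of truncates to time */

int main(void)
{
	char buf[4096];
	struct timespec t1, t2;
	long long total_ns = 0;
	int fd, i, j;

	memset(buf, 0xab, sizeof(buf));
	fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return 1;
	for (i = 0; i < ITERS; i++) {
		/* Populate NPAGES pages of page cache. */
		for (j = 0; j < NPAGES; j++)
			if (pwrite(fd, buf, sizeof(buf), (off_t)j * 4096) < 0)
				return 1;
		/* Write the pages back so truncation sees clean pages. */
		fsync(fd);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		if (ftruncate(fd, 0) < 0)
			return 1;
		clock_gettime(CLOCK_MONOTONIC, &t2);
		total_ns += (t2.tv_sec - t1.tv_sec) * 1000000000LL +
			    (t2.tv_nsec - t1.tv_nsec);
	}
	printf("avg truncate time: %lld ns\n", total_ns / ITERS);
	close(fd);
	return 0;
}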

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 84 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c           | 20 ++++++++++--
 3 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 75cd074a23b4..e857c62aef06 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -623,6 +623,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 extern void delete_from_page_cache(struct page *page);
 extern void __delete_from_page_cache(struct page *page, void *shadow);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
+void delete_from_page_cache_batch(struct address_space *mapping, int count,
+				  struct page **pages);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
diff --git a/mm/filemap.c b/mm/filemap.c
index 6fb01b2404a7..38fc390d51a2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -304,6 +304,90 @@ void delete_from_page_cache(struct page *page)
 }
 EXPORT_SYMBOL(delete_from_page_cache);
 
+/*
+ * page_cache_tree_delete_batch - delete several pages from page cache
+ * @mapping: the mapping to which pages belong
+ * @count: the number of pages to delete
+ * @pages: pages that should be deleted
+ *
+ * The function walks over mapping->page_tree and removes pages passed in
+ * @pages array from the radix tree. The function expects @pages array to be
+ * sorted by page index. It tolerates holes in @pages array (radix tree
+ * entries at those indices are not modified). The function expects only THP
+ * head pages to be present in the @pages array and takes care to delete all
+ * corresponding tail pages from the radix tree as well.
+ *
+ * The function expects mapping->tree_lock to be held.
+ */
+static void
+page_cache_tree_delete_batch(struct address_space *mapping, int count,
+			     struct page **pages)
+{
+	struct radix_tree_iter iter;
+	void **slot;
+	int total_pages = 0;
+	int i = 0, tail_pages = 0;
+	struct page *page;
+	pgoff_t start;
+
+	start = pages[0]->index;
+	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+		if (i >= count && !tail_pages)
+			break;
+		page = radix_tree_deref_slot_protected(slot,
+						       &mapping->tree_lock);
+		if (radix_tree_exceptional_entry(page))
+			continue;
+		if (!tail_pages) {
+			/*
+			 * Some page got inserted in our range? Skip it. We
+			 * have our pages locked so they are protected from
+			 * being removed.
+			 */
+			if (page != pages[i])
+				continue;
+			WARN_ON_ONCE(!PageLocked(page));
+			if (PageTransHuge(page) && !PageHuge(page))
+				tail_pages = HPAGE_PMD_NR - 1;
+			page->mapping = NULL;
+			/*
+			 * Leave page->index set: truncation lookup relies
+			 * upon it
+			 */
+			i++;
+		} else {
+			tail_pages--;
+		}
+		radix_tree_clear_tags(&mapping->page_tree, iter.node, slot);
+		__radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
+				     workingset_update_node, mapping);
+		total_pages++;
+	}
+	mapping->nrpages -= total_pages;
+}
+
+void delete_from_page_cache_batch(struct address_space *mapping, int count,
+				  struct page **pages)
+{
+	int i;
+	unsigned long flags;
+
+	if (!count)
+		return;
+
+	spin_lock_irqsave(&mapping->tree_lock, flags);
+	for (i = 0; i < count; i++) {
+		trace_mm_filemap_delete_from_page_cache(pages[i]);
+
+		unaccount_page_cache_page(mapping, pages[i]);
+	}
+	page_cache_tree_delete_batch(mapping, count, pages);
+	spin_unlock_irqrestore(&mapping->tree_lock, flags);
+
+	for (i = 0; i < count; i++)
+		page_cache_free_page(mapping, pages[i]);
+}
+
 int filemap_check_errors(struct address_space *mapping)
 {
 	int ret = 0;
diff --git a/mm/truncate.c b/mm/truncate.c
index 383a530d511e..3dfa2d5e642e 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -294,6 +294,14 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE),
 			indices)) {
+		/*
+		 * Pagevec array has exceptional entries and we may also fail
+		 * to lock some pages. So we store pages that can be deleted
+		 * in an extra array.
+		 */
+		struct page *pages[PAGEVEC_SIZE];
+		int batch_count = 0;
+
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -315,9 +323,17 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				unlock_page(page);
 				continue;
 			}
-			truncate_inode_page(mapping, page);
-			unlock_page(page);
+			if (page->mapping != mapping) {
+				unlock_page(page);
+				continue;
+			}
+			pages[batch_count++] = page;
 		}
+		for (i = 0; i < batch_count; i++)
+			truncate_cleanup_page(mapping, pages[i]);
+		delete_from_page_cache_batch(mapping, batch_count, pages);
+		for (i = 0; i < batch_count; i++)
+			unlock_page(pages[i]);
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		cond_resched();
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
                   ` (6 preceding siblings ...)
  2017-10-10 15:19 ` [PATCH 7/7] mm: Batch radix tree operations when truncating pages Jan Kara
@ 2017-10-10 17:25 ` Andi Kleen
  2017-10-11  8:06   ` Jan Kara
  7 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2017-10-10 17:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-mm, Mel Gorman, Kirill A. Shutemov, linux-fsdevel

Jan Kara <jack@suse.cz> writes:

> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> have noticed a regression in bonnie++ benchmark when deleting files.
> Eventually we have tracked this down to a fact that page cache truncation got
> slower by about 10%. There were both gains and losses in the above interval of
> kernels but we have been able to identify that commit 83929372f629 "filemap:
> prepare find and delete operations for huge pages" caused about 10% regression
> on its own.

It's odd that just checking if some pages are huge should be that
expensive, but ok ..
>
> Patch 1 is an easy speedup of cancel_dirty_page(). Patches 2-6 refactor page
> cache truncation code so that it is easier to batch radix tree operations.
> Patch 7 implements batching of deletes from the radix tree which more than
> makes up for the original regression.
>
> What do people think about this series?

Batching locks is always a good idea. You'll likely see far more benefits
under lock contention on larger systems.

From a quick read it looks good to me.

Reviewed-by: Andi Kleen <ak@linux.intel.com>


-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/7] mm: Batch radix tree operations when truncating pages
  2017-10-10 15:19 ` [PATCH 7/7] mm: Batch radix tree operations when truncating pages Jan Kara
@ 2017-10-11  7:39   ` Mel Gorman
  2017-10-17 23:05   ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2017-10-11  7:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-mm, Kirill A. Shutemov, linux-fsdevel

On Tue, Oct 10, 2017 at 05:19:37PM +0200, Jan Kara wrote:
> Currently we remove pages from the radix tree one by one. To speed up
> page cache truncation, lock several pages at once and free them in one
> go. This allows us to batch radix tree operations in a more efficient
> way and also save round-trips on mapping->tree_lock. As a result we gain
> about 20% speed improvement in page cache truncation.
> 
> Data from a simple benchmark timing 10000 truncates of 1024 pages (on
> ext4 on ramdisk but the filesystem is barely visible in the profiles).
> The range shows 1% and 95% percentiles of the measured times:
> 
> 4.14-rc2	4.14-rc2 + batched truncation
> 248-256		209-219
> 249-258		209-217
> 248-255		211-239
> 248-255		209-217
> 247-256		210-218
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-10 17:25 ` [PATCH 0/7 v1] Speed up page cache truncation Andi Kleen
@ 2017-10-11  8:06   ` Jan Kara
  2017-10-11 16:51     ` Andi Kleen
  2017-10-11 17:34     ` Dave Hansen
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-11  8:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Kara, linux-mm, Mel Gorman, Kirill A. Shutemov, linux-fsdevel

On Tue 10-10-17 10:25:13, Andi Kleen wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> > have noticed a regression in bonnie++ benchmark when deleting files.
> > Eventually we have tracked this down to a fact that page cache truncation got
> > slower by about 10%. There were both gains and losses in the above interval of
> > kernels but we have been able to identify that commit 83929372f629 "filemap:
> > prepare find and delete operations for huge pages" caused about 10% regression
> > on its own.
> 
> It's odd that just checking if some pages are huge should be that
> expensive, but ok ..

Yeah, I was surprised as well but profiles were pretty clear on this - part
of the slowdown was caused by loads of page->_compound_head (PageTail()
and page_compound() use that) which we previously didn't have to load at
all, part was in hpage_nr_pages() function and its use.
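
For reference, the helpers in question boil down to roughly the following in
kernels of that era (simplified from include/linux/page-flags.h and
include/linux/huge_mm.h; debug-only assertions omitted):

/* PageTail() and compound_head() both have to load page->compound_head: */
static inline int PageTail(struct page *page)
{
	return READ_ONCE(page->compound_head) & 1;
}

static inline struct page *compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	if (unlikely(head & 1))
		return (struct page *)(head - 1);
	return page;
}

/* hpage_nr_pages() adds a page-flags test on top: */
static inline int hpage_nr_pages(struct page *page)
{
	if (unlikely(PageTransHuge(page)))
		return HPAGE_PMD_NR;
	return 1;
}

So the truncation path now loads page->compound_head for every page
(something it previously did not need at all), plus the extra work in
hpage_nr_pages() and its callers.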

> > Patch 1 is an easy speedup of cancel_dirty_page(). Patches 2-6 refactor page
> > cache truncation code so that it is easier to batch radix tree operations.
> > Patch 7 implements batching of deletes from the radix tree which more than
> > makes up for the original regression.
> >
> > What do people think about this series?
> 
> Batching locks is always a good idea. You'll likely see far more benefits
> under lock contention on larger systems.
> 
> From a quick read it looks good to me.
> 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>

Thanks for having a look!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11  8:06   ` Jan Kara
@ 2017-10-11 16:51     ` Andi Kleen
  2017-10-11 17:34     ` Dave Hansen
  1 sibling, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2017-10-11 16:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-mm, Mel Gorman, Kirill A. Shutemov, linux-fsdevel

> > It's odd that just checking if some pages are huge should be that
> > expensive, but ok ..
> 
> Yeah, I was surprised as well but profiles were pretty clear on this - part
> of the slowdown was caused by loads of page->_compound_head (PageTail()
> and page_compound() use that) which we previously didn't have to load at
> all, part was in hpage_nr_pages() function and its use.

A strategic early prefetch may help.
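
For example, something along these lines in the truncation loop (an untested
sketch of the idea, not an actual patch; prefetch() is the existing helper
from <linux/prefetch.h>):

	for (i = 0; i < pagevec_count(&pvec); i++) {
		struct page *page = pvec.pages[i];

		/*
		 * Start pulling in the next struct page so its flags and
		 * compound_head are already cache-hot when we reach it.
		 */
		if (i + 1 < pagevec_count(&pvec))
			prefetch(pvec.pages[i + 1]);

		/* ... existing per-page truncation work ... */
	}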

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11  8:06   ` Jan Kara
  2017-10-11 16:51     ` Andi Kleen
@ 2017-10-11 17:34     ` Dave Hansen
  2017-10-11 17:59       ` Mel Gorman
  2017-10-11 21:06       ` Jan Kara
  1 sibling, 2 replies; 22+ messages in thread
From: Dave Hansen @ 2017-10-11 17:34 UTC (permalink / raw)
  To: Jan Kara, Andi Kleen
  Cc: linux-mm, Mel Gorman, Kirill A. Shutemov, linux-fsdevel

On 10/11/2017 01:06 AM, Jan Kara wrote:
>>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
>>> have noticed a regression in bonnie++ benchmark when deleting files.
>>> Eventually we have tracked this down to a fact that page cache truncation got
>>> slower by about 10%. There were both gains and losses in the above interval of
>>> kernels but we have been able to identify that commit 83929372f629 "filemap:
>>> prepare find and delete operations for huge pages" caused about 10% regression
>>> on its own.
>> It's odd that just checking if some pages are huge should be that
>> expensive, but ok ..
> Yeah, I was surprised as well but profiles were pretty clear on this - part
> of the slowdown was caused by loads of page->_compound_head (PageTail()
> and page_compound() use that) which we previously didn't have to load at
> all, part was in hpage_nr_pages() function and its use.

Well, page->_compound_head is part of the same cacheline as the rest of
the page, and the page is surely getting touched during truncation at
_some_ point.  The hpage_nr_pages() might cause the cacheline to get
loaded earlier than before, but I can't imagine that it's that expensive.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11 17:34     ` Dave Hansen
@ 2017-10-11 17:59       ` Mel Gorman
  2017-10-11 18:37         ` Andi Kleen
  2017-10-11 21:06       ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2017-10-11 17:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Jan Kara, Andi Kleen, linux-mm, Kirill A. Shutemov, linux-fsdevel

On Wed, Oct 11, 2017 at 10:34:47AM -0700, Dave Hansen wrote:
> On 10/11/2017 01:06 AM, Jan Kara wrote:
> >>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> >>> have noticed a regression in bonnie++ benchmark when deleting files.
> >>> Eventually we have tracked this down to a fact that page cache truncation got
> >>> slower by about 10%. There were both gains and losses in the above interval of
> >>> kernels but we have been able to identify that commit 83929372f629 "filemap:
> >>> prepare find and delete operations for huge pages" caused about 10% regression
> >>> on its own.
> >> It's odd that just checking if some pages are huge should be that
> >> expensive, but ok ..
> > Yeah, I was surprised as well but profiles were pretty clear on this - part
> > of the slowdown was caused by loads of page->_compound_head (PageTail()
> > and page_compound() use that) which we previously didn't have to load at
> > all, part was in hpage_nr_pages() function and its use.
> 
> Well, page->_compound_head is part of the same cacheline as the rest of
> the page, and the page is surely getting touched during truncation at
> _some_ point.  The hpage_nr_pages() might cause the cacheline to get
> loaded earlier than before, but I can't imagine that it's that expensive.

Profiles appear to disagree but regardless of the explanation, the fact
is that the series improves truncation quite a bit on my tests. From three
separate machines running bonnie, I see the following gains.

                                      4.14.0-rc4             4.14.0-rc4
                                         vanilla          janbatch-v1r1
Hmean     SeqCreate del      21313.45 (   0.00%)    24963.95 (  17.13%)
Hmean     RandCreate del     19974.03 (   0.00%)    23377.66 (  17.04%)

                                      4.14.0-rc4             4.14.0-rc4
                                         vanilla          janbatch-v1r1
Hmean     SeqCreate del       4408.80 (   0.00%)     5074.91 (  15.11%)
Hmean     RandCreate del      4161.52 (   0.00%)     4879.15 (  17.24%)

                                      4.14.0-rc4             4.14.0-rc4
                                         vanilla          janbatch-v1r1
Hmean     SeqCreate del      11639.73 (   0.00%)    13648.20 (  17.26%)
Hmean     RandCreate del     10979.90 (   0.00%)    12818.99 (  16.75%)

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11 17:59       ` Mel Gorman
@ 2017-10-11 18:37         ` Andi Kleen
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2017-10-11 18:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Dave Hansen, Jan Kara, linux-mm, Kirill A. Shutemov, linux-fsdevel

> Profiles appear to disagree but regardless of the explanation, the fact
> is that the series improves truncation quite a bit on my tests. From three
> separate machines running bonnie, I see the following gains.

The batching patches are a good idea in any case. I was just wondering if we could
get rid of the original 10% too. But it's not critical.

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11 17:34     ` Dave Hansen
  2017-10-11 17:59       ` Mel Gorman
@ 2017-10-11 21:06       ` Jan Kara
  2017-10-11 21:24         ` Dave Chinner
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-10-11 21:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Jan Kara, Andi Kleen, linux-mm, Mel Gorman, Kirill A. Shutemov,
	linux-fsdevel

On Wed 11-10-17 10:34:47, Dave Hansen wrote:
> On 10/11/2017 01:06 AM, Jan Kara wrote:
> >>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> >>> have noticed a regression in bonnie++ benchmark when deleting files.
> >>> Eventually we have tracked this down to a fact that page cache truncation got
> >>> slower by about 10%. There were both gains and losses in the above interval of
> >>> kernels but we have been able to identify that commit 83929372f629 "filemap:
> >>> prepare find and delete operations for huge pages" caused about 10% regression
> >>> on its own.
> >> It's odd that just checking if some pages are huge should be that
> >> expensive, but ok ..
> > Yeah, I was surprised as well but profiles were pretty clear on this - part
> > of the slowdown was caused by loads of page->_compound_head (PageTail()
> > and page_compound() use that) which we previously didn't have to load at
> > all, part was in hpage_nr_pages() function and its use.
> 
> Well, page->_compound_head is part of the same cacheline as the rest of
> the page, and the page is surely getting touched during truncation at
> _some_ point.  The hpage_nr_pages() might cause the cacheline to get
> loaded earlier than before, but I can't imagine that it's that expensive.

Then my intuition matches yours ;) but profiles disagree. That being said,
I'm not really an expert in CPU micro-optimizations and profiling, so feel
free to gather perf profiles yourself before and after commit 83929372f629
and get a better explanation of where the cost is - I would be really curious
what you come up with, because the explanation I have disagrees with my
intuition as well...
								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11 21:06       ` Jan Kara
@ 2017-10-11 21:24         ` Dave Chinner
  2017-10-12  9:09           ` Mel Gorman
  2017-10-12 14:07           ` Jan Kara
  0 siblings, 2 replies; 22+ messages in thread
From: Dave Chinner @ 2017-10-11 21:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dave Hansen, Andi Kleen, linux-mm, Mel Gorman,
	Kirill A. Shutemov, linux-fsdevel

On Wed, Oct 11, 2017 at 11:06:13PM +0200, Jan Kara wrote:
> On Wed 11-10-17 10:34:47, Dave Hansen wrote:
> > On 10/11/2017 01:06 AM, Jan Kara wrote:
> > >>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> > >>> have noticed a regression in bonnie++ benchmark when deleting files.
> > >>> Eventually we have tracked this down to a fact that page cache truncation got
> > >>> slower by about 10%. There were both gains and losses in the above interval of
> > >>> kernels but we have been able to identify that commit 83929372f629 "filemap:
> > >>> prepare find and delete operations for huge pages" caused about 10% regression
> > >>> on its own.
> > >> It's odd that just checking if some pages are huge should be that
> > >> expensive, but ok ..
> > > Yeah, I was surprised as well but profiles were pretty clear on this - part
> > > of the slowdown was caused by loads of page->_compound_head (PageTail()
> > > and page_compound() use that) which we previously didn't have to load at
> > > all, part was in hpage_nr_pages() function and its use.
> > 
> > Well, page->_compound_head is part of the same cacheline as the rest of
> > the page, and the page is surely getting touched during truncation at
> > _some_ point.  The hpage_nr_pages() might cause the cacheline to get
> > loaded earlier than before, but I can't imagine that it's that expensive.
> 
> Then my intuition matches yours ;) but profiles disagree.

Do you get the same benefit across different filesystems?

> That being said
> I'm not really expert in CPU microoptimizations and profiling so feel free
> to gather perf profiles yourself before and after commit 83929372f629 and
> get better explanation of where the cost is - I would be really curious
> what you come up with because the explanation I have disagrees with my
> intuition as well...

When I see this sort of stuff my immediate thought is "what is the
change in the icache footprint of the hot codepath"? There's a
few IO benchmarks (e.g. IOZone) that are l1/l2 cache footprint
sensitive on XFS, and can see up to 10% differences in performance
from kernel build to kernel build that have no code changes in the
IO paths or l1/l2 dcache footprint.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11 21:24         ` Dave Chinner
@ 2017-10-12  9:09           ` Mel Gorman
  2017-10-12 14:07           ` Jan Kara
  1 sibling, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2017-10-12  9:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Dave Hansen, Andi Kleen, linux-mm, Kirill A. Shutemov,
	linux-fsdevel

On Thu, Oct 12, 2017 at 08:24:01AM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 11:06:13PM +0200, Jan Kara wrote:
> > On Wed 11-10-17 10:34:47, Dave Hansen wrote:
> > > On 10/11/2017 01:06 AM, Jan Kara wrote:
> > > >>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> > > >>> have noticed a regression in bonnie++ benchmark when deleting files.
> > > >>> Eventually we have tracked this down to a fact that page cache truncation got
> > > >>> slower by about 10%. There were both gains and losses in the above interval of
> > > >>> kernels but we have been able to identify that commit 83929372f629 "filemap:
> > > >>> prepare find and delete operations for huge pages" caused about 10% regression
> > > >>> on its own.
> > > >> It's odd that just checking if some pages are huge should be that
> > > >> expensive, but ok ..
> > > > Yeah, I was surprised as well but profiles were pretty clear on this - part
> > > > of the slowdown was caused by loads of page->_compound_head (PageTail()
> > > > and page_compound() use that) which we previously didn't have to load at
> > > > all, part was in hpage_nr_pages() function and its use.
> > > 
> > > Well, page->_compound_head is part of the same cacheline as the rest of
> > > the page, and the page is surely getting touched during truncation at
> > > _some_ point.  The hpage_nr_pages() might cause the cacheline to get
> > > loaded earlier than before, but I can't imagine that it's that expensive.
> > 
> > Then my intuition matches yours ;) but profiles disagree.
> 
> Do you get the same benefit across different filesystems?
> 

I don't know about Jan's testing, but the benefit is different on XFS.
Unfortunately, only one of the machines I was using for testing a follow-on
series covered XFS, but still:

bonnie
                                      4.14.0-rc4             4.14.0-rc4
                                         vanilla          janbatch-v1r1
Hmean     SeqCreate del      17164.80 (   0.00%)    18638.45 (   8.59%)
Hmean     RandCreate del     15025.81 (   0.00%)    16485.69 (   9.72%)

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7 v1] Speed up page cache truncation
  2017-10-11 21:24         ` Dave Chinner
  2017-10-12  9:09           ` Mel Gorman
@ 2017-10-12 14:07           ` Jan Kara
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-12 14:07 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Dave Hansen, Andi Kleen, linux-mm, Mel Gorman,
	Kirill A. Shutemov, linux-fsdevel

On Thu 12-10-17 08:24:01, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 11:06:13PM +0200, Jan Kara wrote:
> > On Wed 11-10-17 10:34:47, Dave Hansen wrote:
> > > On 10/11/2017 01:06 AM, Jan Kara wrote:
> > > >>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> > > >>> have noticed a regression in bonnie++ benchmark when deleting files.
> > > >>> Eventually we have tracked this down to a fact that page cache truncation got
> > > >>> slower by about 10%. There were both gains and losses in the above interval of
> > > >>> kernels but we have been able to identify that commit 83929372f629 "filemap:
> > > >>> prepare find and delete operations for huge pages" caused about 10% regression
> > > >>> on its own.
> > > >> It's odd that just checking if some pages are huge should be that
> > > >> expensive, but ok ..
> > > > Yeah, I was surprised as well but profiles were pretty clear on this - part
> > > > of the slowdown was caused by loads of page->_compound_head (PageTail()
> > > > and page_compound() use that) which we previously didn't have to load at
> > > > all, part was in hpage_nr_pages() function and its use.
> > > 
> > > Well, page->_compound_head is part of the same cacheline as the rest of
> > > the page, and the page is surely getting touched during truncation at
> > > _some_ point.  The hpage_nr_pages() might cause the cacheline to get
> > > loaded earlier than before, but I can't imagine that it's that expensive.
> > 
> > Then my intuition matches yours ;) but profiles disagree.
> 
> Do you get the same benefit across different filesystems?

Mel has answered this already.

> > That being said
> > I'm not really expert in CPU microoptimizations and profiling so feel free
> > to gather perf profiles yourself before and after commit 83929372f629 and
> > get better explanation of where the cost is - I would be really curious
> > what you come up with because the explanation I have disagrees with my
> > intuition as well...
> 
> When I see this sort of stuff my immediate thought is "what is the
> change in the icache footprint of the hot codepath"? There's a
> few IO benchmarks (e.g. IOZone) that are l1/l2 cache footprint
> sensitive on XFS, and can see up to 10% differences in performance
> from kernel build to kernel build that have no code changes in the
> IO paths or l1/l2 dcache footprint.

Yeah, icache footprint could be part of the reason why commit 83929372f629
makes things slower, but it definitely isn't the only reason. I have
experimented with modifications of THP handling so that we can distinguish
normal and THP pages just by looking at page flags (currently we have to
look at both page flags and page->_compound_head), and that did bring back
about half of the regression. But in the end I discarded it because those
changes were likely to slow down splitting of THPs significantly.

WRT the build-to-build variance of the benchmark: I saw build-to-build
variance in the measured truncate times of around 2% on that machine. So it
is not negligible, but it is small enough that I'm confident the measured
differences are not just noise...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/7] mm: Batch radix tree operations when truncating pages
  2017-10-10 15:19 ` [PATCH 7/7] mm: Batch radix tree operations when truncating pages Jan Kara
  2017-10-11  7:39   ` Mel Gorman
@ 2017-10-17 23:05   ` Andrew Morton
  2017-10-18 10:44     ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2017-10-17 23:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-mm, Mel Gorman, Kirill A. Shutemov, linux-fsdevel

On Tue, 10 Oct 2017 17:19:37 +0200 Jan Kara <jack@suse.cz> wrote:

> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -294,6 +294,14 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
>  			min(end - index, (pgoff_t)PAGEVEC_SIZE),
>  			indices)) {
> +		/*
> +		 * Pagevec array has exceptional entries and we may also fail
> +		 * to lock some pages. So we store pages that can be deleted
> +		 * in an extra array.
> +		 */
> +		struct page *pages[PAGEVEC_SIZE];
> +		int batch_count = 0;

OK, but we could still use a new pagevec here.  Then
delete_from_page_cache_batch() and page_cache_tree_delete_batch() would
take one less argument.
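
Roughly, that fold could look like the following in
truncate_inode_pages_range() (a sketch of the suggestion, not the actual
follow-up patch; it assumes delete_from_page_cache_batch() is changed to take
the pagevec directly):

	struct pagevec locked_pvec;

	pagevec_init(&locked_pvec, 0);	/* 4.14-era two-argument form */
	for (i = 0; i < pagevec_count(&pvec); i++) {
		struct page *page = pvec.pages[i];

		/* ... skip exceptional entries, trylock, check mapping ... */
		pagevec_add(&locked_pvec, page);
	}
	for (i = 0; i < pagevec_count(&locked_pvec); i++)
		truncate_cleanup_page(mapping, locked_pvec.pages[i]);
	delete_from_page_cache_batch(mapping, &locked_pvec);
	for (i = 0; i < pagevec_count(&locked_pvec); i++)
		unlock_page(locked_pvec.pages[i]);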

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/7] mm: Batch radix tree operations when truncating pages
  2017-10-17 23:05   ` Andrew Morton
@ 2017-10-18 10:44     ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-18 10:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, linux-mm, Mel Gorman, Kirill A. Shutemov, linux-fsdevel

On Tue 17-10-17 16:05:21, Andrew Morton wrote:
> On Tue, 10 Oct 2017 17:19:37 +0200 Jan Kara <jack@suse.cz> wrote:
> 
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -294,6 +294,14 @@ void truncate_inode_pages_range(struct address_space *mapping,
> >  	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
> >  			min(end - index, (pgoff_t)PAGEVEC_SIZE),
> >  			indices)) {
> > +		/*
> > +		 * Pagevec array has exceptional entries and we may also fail
> > +		 * to lock some pages. So we store pages that can be deleted
> > +		 * in an extra array.
> > +		 */
> > +		struct page *pages[PAGEVEC_SIZE];
> > +		int batch_count = 0;
> 
> OK, but we could still use a new pagevec here.  Then
> delete_from_page_cache_batch() and page_cache_tree_delete_batch() would
> take one less argument.

Originally, I didn't want to manually construct a new pagevec outside of the
pagevec code. But now I see there are clean helpers for that. I'll send you
a patch to fold in. Thanks for the suggestion!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/7] mm: Move clearing of page->mapping to page_cache_tree_delete()
  2017-10-17 16:21 [PATCH 0/7 v2] " Jan Kara
@ 2017-10-17 16:21 ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-17 16:21 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-fsdevel, Mel Gorman, Kirill A. Shutemov, Jan Kara

Clearing of page->mapping makes sense in page_cache_tree_delete() as well,
and moving it there will help us with batching things later.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c58ccd26bbe6..c866a84bd45c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -165,6 +165,9 @@ static void page_cache_tree_delete(struct address_space *mapping,
 				     workingset_update_node, mapping);
 	}
 
+	page->mapping = NULL;
+	/* Leave page->index set: truncation lookup relies upon it */
+
 	if (shadow) {
 		mapping->nrexceptional += nr;
 		/*
@@ -250,9 +253,6 @@ void __delete_from_page_cache(struct page *page, void *shadow)
 					     inode_to_wb(mapping->host));
 	}
 	page_cache_tree_delete(mapping, page, shadow);
-
-	page->mapping = NULL;
-	/* Leave page->index set: truncation lookup relies upon it */
 }
 
 static void page_cache_free_page(struct address_space *mapping,
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-10-18 10:44 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 15:19 [PATCH 0/7 v1] Speed up page cache truncation Jan Kara
2017-10-10 15:19 ` [PATCH 1/7] mm: Speedup cancel_dirty_page() for clean pages Jan Kara
2017-10-10 15:19 ` [PATCH 2/7] mm: Refactor truncate_complete_page() Jan Kara
2017-10-10 15:19 ` [PATCH 3/7] mm: Factor out page cache page freeing into a separate function Jan Kara
2017-10-10 15:19 ` [PATCH 4/7] mm: Move accounting updates before page_cache_tree_delete() Jan Kara
2017-10-10 15:19 ` [PATCH 5/7] mm: Move clearing of page->mapping to page_cache_tree_delete() Jan Kara
2017-10-10 15:19 ` [PATCH 6/7] mm: Factor out checks and accounting from __delete_from_page_cache() Jan Kara
2017-10-10 15:19 ` [PATCH 7/7] mm: Batch radix tree operations when truncating pages Jan Kara
2017-10-11  7:39   ` Mel Gorman
2017-10-17 23:05   ` Andrew Morton
2017-10-18 10:44     ` Jan Kara
2017-10-10 17:25 ` [PATCH 0/7 v1] Speed up page cache truncation Andi Kleen
2017-10-11  8:06   ` Jan Kara
2017-10-11 16:51     ` Andi Kleen
2017-10-11 17:34     ` Dave Hansen
2017-10-11 17:59       ` Mel Gorman
2017-10-11 18:37         ` Andi Kleen
2017-10-11 21:06       ` Jan Kara
2017-10-11 21:24         ` Dave Chinner
2017-10-12  9:09           ` Mel Gorman
2017-10-12 14:07           ` Jan Kara
2017-10-17 16:21 [PATCH 0/7 v2] " Jan Kara
2017-10-17 16:21 ` [PATCH 5/7] mm: Move clearing of page->mapping to page_cache_tree_delete() Jan Kara
