All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@linux.intel.com>,
	Dave Hansen <dave@sr71.net>
Cc: Jan Kara <jack@suse.cz>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Hugh Dickins <hughd@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Hillf Danton <dhillf@gmail.com>, Ning Qu <quning@google.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 01/23] radix-tree: implement preload for multiple contiguous elements
Date: Thu,  8 Aug 2013 11:45:05 +0300 (EEST)	[thread overview]
Message-ID: <20130808084505.31EACE0090@blue.fi.intel.com> (raw)
In-Reply-To: <20130807213736.AC732E0090@blue.fi.intel.com>

Kirill A. Shutemov wrote:
> In this case it should use 39 nodes, but it uses only 38. I can't understand why. :(

Okay, I've got it. We share one 2nd level node.

Patch is below. Please review.

>From 35ba5687ea7aea98645da34ddd0be01a9de8b32d Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Wed, 9 Jan 2013 13:25:47 +0200
Subject: [PATCH] radix-tree: implement preload for multiple contiguous
 elements

The radix tree is variable-height, so an insert operation not only has
to build the branch to its corresponding item, it also has to build the
branch to existing items if the size has to be increased (by
radix_tree_extend).

The worst case is a zero height tree with just a single item at index 0,
and then inserting an item at index ULONG_MAX. This requires 2 new branches
of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.

Radix tree is usually protected by spin lock. It means we want to
pre-allocate required memory before taking the lock.

Currently radix_tree_preload() only guarantees enough nodes to insert
one element. It's a hard limit. For transparent huge page cache we want
to insert HPAGE_PMD_NR (512 on x86-64) entries to address_space at once.

This patch introduces radix_tree_preload_count(). It allows to
preallocate nodes enough to insert a number of *contiguous* elements.
The feature costs about 9KiB per-CPU on x86_64, details below.

Preload uses per-CPU array to store nodes. The total cost of preload is
"array size" * sizeof(void*) * NR_CPUS. We want to increase array size
to be able to handle 512 entries at once.

Size of array depends on system bitness and on RADIX_TREE_MAP_SHIFT.

We have three possible RADIX_TREE_MAP_SHIFT:

 #ifdef __KERNEL__
 #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
 #else
 #define RADIX_TREE_MAP_SHIFT	3	/* For more stressful testing */
 #endif

We are not going to use transparent huge page cache on small machines or
in userspace, so we are interested in RADIX_TREE_MAP_SHIFT=6.

On 64-bit system old array size is 21, new is 37. Per-CPU feature
overhead is
 for preload array:
   (37 - 21) * sizeof(void*) = 128 bytes
 plus, if the preload array is full
   (37 - 21) * sizeof(struct radix_tree_node) = 16 * 560 = 8960 bytes
 total: 9088 bytes

On 32-bit system old array size is 11, new is 22. Per-CPU feature
overhead is
 for preload array:
   (22 - 11) * sizeof(void*) = 44 bytes
 plus, if the preload array is full
   (22 - 11) * sizeof(struct radix_tree_node) = 11 * 296 = 3256 bytes
 total: 3300 bytes

Since only THP uses batched preload at the moment, we disable (set max
preload to 1) it if !CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE. This can be
changed in the future.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 include/linux/radix-tree.h | 11 ++++++
 lib/radix-tree.c           | 89 +++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 4039407..3bf0b3e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -83,6 +83,16 @@ do {									\
 	(root)->rnode = NULL;						\
 } while (0)
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE
+/*
+ * At the moment only THP uses preload for more then on item for batched
+ * pagecache manipulations.
+ */
+#define RADIX_TREE_PRELOAD_NR	512
+#else
+#define RADIX_TREE_PRELOAD_NR	1
+#endif
+
 /**
  * Radix-tree synchronization
  *
@@ -232,6 +242,7 @@ unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
 				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 int radix_tree_maybe_preload(gfp_t gfp_mask);
+int radix_tree_maybe_preload_contig(unsigned size, gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
 			unsigned long index, unsigned int tag);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 7811ed3..980e4c4 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -84,14 +84,54 @@ static struct kmem_cache *radix_tree_node_cachep;
  * of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.
  * Hence:
  */
-#define RADIX_TREE_PRELOAD_SIZE (RADIX_TREE_MAX_PATH * 2 - 1)
+#define RADIX_TREE_PRELOAD_MIN (RADIX_TREE_MAX_PATH * 2 - 1)
+
+/*
+ * Inserting N contiguous items is more complex. To simplify calculation, let's
+ * limit N (validated in radix_tree_init()):
+ *  - N is multiplier of RADIX_TREE_MAP_SIZE (or 1);
+ *  - N <= number of items 2-level tree can contain:
+ *    1UL << (2 * RADIX_TREE_MAP_SHIFT).
+ *
+ * No limitation on insert index alignment.
+ *
+ * Then the worst case is tree with only one element at index 0 and we add N
+ * items where at least one index requires max tree high and we cross boundary
+ * between items in root node.
+ *
+ * Basically, at least one index is less then
+ *
+ * 1UL << ((RADIX_TREE_MAX_PATH - 1) * RADIX_TREE_MAP_SHIFT)
+ *
+ * and one is equal to.
+ *
+ * In this case we need:
+ *
+ * - RADIX_TREE_MAX_PATH nodes to build new path to item with index 0;
+ * - N / RADIX_TREE_MAP_SIZE + 1 nodes for last level nodes for new items:
+ *    - +1 is for misalinged case;
+ * - 2 * (RADIX_TREE_MAX_PATH - 2) - 1 nodes to build path to last level nodes:
+ *    - -2, because root node and last level nodes are already accounted);
+ *    - -1, because one second level node is shared between item with index 0
+ *      and item below the boundary;
+ *
+ * Hence:
+ */
+#if RADIX_TREE_PRELOAD_NR > 1
+#define RADIX_TREE_PRELOAD_MAX \
+	( RADIX_TREE_MAX_PATH + \
+	  RADIX_TREE_PRELOAD_NR / RADIX_TREE_MAP_SIZE + 1 + \
+	  2 * (RADIX_TREE_MAX_PATH - 2) - 1)
+#else
+#define RADIX_TREE_PRELOAD_MAX RADIX_TREE_PRELOAD_MIN
+#endif
 
 /*
  * Per-cpu pool of preloaded nodes
  */
 struct radix_tree_preload {
 	int nr;
-	struct radix_tree_node *nodes[RADIX_TREE_PRELOAD_SIZE];
+	struct radix_tree_node *nodes[RADIX_TREE_PRELOAD_MAX];
 };
 static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
 
@@ -263,29 +303,35 @@ radix_tree_node_free(struct radix_tree_node *node)
 
 /*
  * Load up this CPU's radix_tree_node buffer with sufficient objects to
- * ensure that the addition of a single element in the tree cannot fail.  On
- * success, return zero, with preemption disabled.  On error, return -ENOMEM
+ * ensure that the addition of *contiguous* items in the tree cannot fail.
+ * On success, return zero, with preemption disabled.  On error, return -ENOMEM
  * with preemption not disabled.
  *
  * To make use of this facility, the radix tree must be initialised without
  * __GFP_WAIT being passed to INIT_RADIX_TREE().
  */
-static int __radix_tree_preload(gfp_t gfp_mask)
+static int __radix_tree_preload_contig(unsigned size, gfp_t gfp_mask)
 {
 	struct radix_tree_preload *rtp;
 	struct radix_tree_node *node;
 	int ret = -ENOMEM;
+	int preload_target = RADIX_TREE_PRELOAD_MIN +
+		DIV_ROUND_UP(size - 1, RADIX_TREE_MAP_SIZE);
+
+	if (WARN_ONCE(size > RADIX_TREE_PRELOAD_NR,
+				"too large preload requested"))
+		return -ENOMEM;
 
 	preempt_disable();
 	rtp = &__get_cpu_var(radix_tree_preloads);
-	while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
+	while (rtp->nr < preload_target) {
 		preempt_enable();
 		node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
 		if (node == NULL)
 			goto out;
 		preempt_disable();
 		rtp = &__get_cpu_var(radix_tree_preloads);
-		if (rtp->nr < ARRAY_SIZE(rtp->nodes))
+		if (rtp->nr < preload_target)
 			rtp->nodes[rtp->nr++] = node;
 		else
 			kmem_cache_free(radix_tree_node_cachep, node);
@@ -308,7 +354,7 @@ int radix_tree_preload(gfp_t gfp_mask)
 {
 	/* Warn on non-sensical use... */
 	WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT));
-	return __radix_tree_preload(gfp_mask);
+	return __radix_tree_preload_contig(1, gfp_mask);
 }
 EXPORT_SYMBOL(radix_tree_preload);
 
@@ -320,13 +366,22 @@ EXPORT_SYMBOL(radix_tree_preload);
 int radix_tree_maybe_preload(gfp_t gfp_mask)
 {
 	if (gfp_mask & __GFP_WAIT)
-		return __radix_tree_preload(gfp_mask);
+		return __radix_tree_preload_contig(1, gfp_mask);
 	/* Preloading doesn't help anything with this gfp mask, skip it */
 	preempt_disable();
 	return 0;
 }
 EXPORT_SYMBOL(radix_tree_maybe_preload);
 
+int radix_tree_maybe_preload_contig(unsigned size, gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_WAIT)
+		return __radix_tree_preload_contig(size, gfp_mask);
+	/* Preloading doesn't help anything with this gfp mask, skip it */
+	preempt_disable();
+	return 0;
+}
+
 /*
  *	Return the maximum key which can be store into a
  *	radix tree with height HEIGHT.
@@ -1483,6 +1538,22 @@ static int radix_tree_callback(struct notifier_block *nfb,
 
 void __init radix_tree_init(void)
 {
+	/*
+	 * Restrictions on RADIX_TREE_PRELOAD_NR simplify RADIX_TREE_PRELOAD_MAX
+	 * calculation, it's already complex enough:
+	 *  - it must be multiplier of RADIX_TREE_MAP_SIZE, otherwise we will
+	 *    have to round it up to next RADIX_TREE_MAP_SIZE multiplier and we
+	 *    don't win anything;
+	 *  - must be less then number of items 2-level tree can contain.
+	 *    It's easier to calculate number of internal nodes required
+	 *    this way.
+	 */
+	if (RADIX_TREE_PRELOAD_NR != 1) {
+		BUILD_BUG_ON(RADIX_TREE_PRELOAD_NR % RADIX_TREE_MAP_SIZE != 0);
+		BUILD_BUG_ON(RADIX_TREE_PRELOAD_NR >
+				1UL << (2 * RADIX_TREE_MAP_SHIFT));
+	}
+
 	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
 			sizeof(struct radix_tree_node), 0,
 			SLAB_PANIC | SLAB_RECLAIM_ACCOUNT,
-- 
1.8.3.2

-- 
 Kirill A. Shutemov

WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@linux.intel.com>,
	Dave Hansen <dave@sr71.net>
Cc: Jan Kara <jack@suse.cz>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Hugh Dickins <hughd@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Hillf Danton <dhillf@gmail.com>, Ning Qu <quning@google.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 01/23] radix-tree: implement preload for multiple contiguous elements
Date: Thu,  8 Aug 2013 11:45:05 +0300 (EEST)	[thread overview]
Message-ID: <20130808084505.31EACE0090@blue.fi.intel.com> (raw)
In-Reply-To: <20130807213736.AC732E0090@blue.fi.intel.com>

Kirill A. Shutemov wrote:
> In this case it should use 39 nodes, but it uses only 38. I can't understand why. :(

Okay, I've got it. We share one 2nd level node.

Patch is below. Please review.

>From 35ba5687ea7aea98645da34ddd0be01a9de8b32d Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Wed, 9 Jan 2013 13:25:47 +0200
Subject: [PATCH] radix-tree: implement preload for multiple contiguous
 elements

The radix tree is variable-height, so an insert operation not only has
to build the branch to its corresponding item, it also has to build the
branch to existing items if the size has to be increased (by
radix_tree_extend).

The worst case is a zero height tree with just a single item at index 0,
and then inserting an item at index ULONG_MAX. This requires 2 new branches
of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.

Radix tree is usually protected by spin lock. It means we want to
pre-allocate required memory before taking the lock.

Currently radix_tree_preload() only guarantees enough nodes to insert
one element. It's a hard limit. For transparent huge page cache we want
to insert HPAGE_PMD_NR (512 on x86-64) entries to address_space at once.

This patch introduces radix_tree_preload_count(). It allows to
preallocate nodes enough to insert a number of *contiguous* elements.
The feature costs about 9KiB per-CPU on x86_64, details below.

Preload uses per-CPU array to store nodes. The total cost of preload is
"array size" * sizeof(void*) * NR_CPUS. We want to increase array size
to be able to handle 512 entries at once.

Size of array depends on system bitness and on RADIX_TREE_MAP_SHIFT.

We have three possible RADIX_TREE_MAP_SHIFT:

 #ifdef __KERNEL__
 #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
 #else
 #define RADIX_TREE_MAP_SHIFT	3	/* For more stressful testing */
 #endif

We are not going to use transparent huge page cache on small machines or
in userspace, so we are interested in RADIX_TREE_MAP_SHIFT=6.

On 64-bit system old array size is 21, new is 37. Per-CPU feature
overhead is
 for preload array:
   (37 - 21) * sizeof(void*) = 128 bytes
 plus, if the preload array is full
   (37 - 21) * sizeof(struct radix_tree_node) = 16 * 560 = 8960 bytes
 total: 9088 bytes

On 32-bit system old array size is 11, new is 22. Per-CPU feature
overhead is
 for preload array:
   (22 - 11) * sizeof(void*) = 44 bytes
 plus, if the preload array is full
   (22 - 11) * sizeof(struct radix_tree_node) = 11 * 296 = 3256 bytes
 total: 3300 bytes

Since only THP uses batched preload at the moment, we disable (set max
preload to 1) it if !CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE. This can be
changed in the future.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 include/linux/radix-tree.h | 11 ++++++
 lib/radix-tree.c           | 89 +++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 4039407..3bf0b3e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -83,6 +83,16 @@ do {									\
 	(root)->rnode = NULL;						\
 } while (0)
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE
+/*
+ * At the moment only THP uses preload for more then on item for batched
+ * pagecache manipulations.
+ */
+#define RADIX_TREE_PRELOAD_NR	512
+#else
+#define RADIX_TREE_PRELOAD_NR	1
+#endif
+
 /**
  * Radix-tree synchronization
  *
@@ -232,6 +242,7 @@ unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
 				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 int radix_tree_maybe_preload(gfp_t gfp_mask);
+int radix_tree_maybe_preload_contig(unsigned size, gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
 			unsigned long index, unsigned int tag);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 7811ed3..980e4c4 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -84,14 +84,54 @@ static struct kmem_cache *radix_tree_node_cachep;
  * of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.
  * Hence:
  */
-#define RADIX_TREE_PRELOAD_SIZE (RADIX_TREE_MAX_PATH * 2 - 1)
+#define RADIX_TREE_PRELOAD_MIN (RADIX_TREE_MAX_PATH * 2 - 1)
+
+/*
+ * Inserting N contiguous items is more complex. To simplify calculation, let's
+ * limit N (validated in radix_tree_init()):
+ *  - N is multiplier of RADIX_TREE_MAP_SIZE (or 1);
+ *  - N <= number of items 2-level tree can contain:
+ *    1UL << (2 * RADIX_TREE_MAP_SHIFT).
+ *
+ * No limitation on insert index alignment.
+ *
+ * Then the worst case is tree with only one element at index 0 and we add N
+ * items where at least one index requires max tree high and we cross boundary
+ * between items in root node.
+ *
+ * Basically, at least one index is less then
+ *
+ * 1UL << ((RADIX_TREE_MAX_PATH - 1) * RADIX_TREE_MAP_SHIFT)
+ *
+ * and one is equal to.
+ *
+ * In this case we need:
+ *
+ * - RADIX_TREE_MAX_PATH nodes to build new path to item with index 0;
+ * - N / RADIX_TREE_MAP_SIZE + 1 nodes for last level nodes for new items:
+ *    - +1 is for misalinged case;
+ * - 2 * (RADIX_TREE_MAX_PATH - 2) - 1 nodes to build path to last level nodes:
+ *    - -2, because root node and last level nodes are already accounted);
+ *    - -1, because one second level node is shared between item with index 0
+ *      and item below the boundary;
+ *
+ * Hence:
+ */
+#if RADIX_TREE_PRELOAD_NR > 1
+#define RADIX_TREE_PRELOAD_MAX \
+	( RADIX_TREE_MAX_PATH + \
+	  RADIX_TREE_PRELOAD_NR / RADIX_TREE_MAP_SIZE + 1 + \
+	  2 * (RADIX_TREE_MAX_PATH - 2) - 1)
+#else
+#define RADIX_TREE_PRELOAD_MAX RADIX_TREE_PRELOAD_MIN
+#endif
 
 /*
  * Per-cpu pool of preloaded nodes
  */
 struct radix_tree_preload {
 	int nr;
-	struct radix_tree_node *nodes[RADIX_TREE_PRELOAD_SIZE];
+	struct radix_tree_node *nodes[RADIX_TREE_PRELOAD_MAX];
 };
 static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
 
@@ -263,29 +303,35 @@ radix_tree_node_free(struct radix_tree_node *node)
 
 /*
  * Load up this CPU's radix_tree_node buffer with sufficient objects to
- * ensure that the addition of a single element in the tree cannot fail.  On
- * success, return zero, with preemption disabled.  On error, return -ENOMEM
+ * ensure that the addition of *contiguous* items in the tree cannot fail.
+ * On success, return zero, with preemption disabled.  On error, return -ENOMEM
  * with preemption not disabled.
  *
  * To make use of this facility, the radix tree must be initialised without
  * __GFP_WAIT being passed to INIT_RADIX_TREE().
  */
-static int __radix_tree_preload(gfp_t gfp_mask)
+static int __radix_tree_preload_contig(unsigned size, gfp_t gfp_mask)
 {
 	struct radix_tree_preload *rtp;
 	struct radix_tree_node *node;
 	int ret = -ENOMEM;
+	int preload_target = RADIX_TREE_PRELOAD_MIN +
+		DIV_ROUND_UP(size - 1, RADIX_TREE_MAP_SIZE);
+
+	if (WARN_ONCE(size > RADIX_TREE_PRELOAD_NR,
+				"too large preload requested"))
+		return -ENOMEM;
 
 	preempt_disable();
 	rtp = &__get_cpu_var(radix_tree_preloads);
-	while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
+	while (rtp->nr < preload_target) {
 		preempt_enable();
 		node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
 		if (node == NULL)
 			goto out;
 		preempt_disable();
 		rtp = &__get_cpu_var(radix_tree_preloads);
-		if (rtp->nr < ARRAY_SIZE(rtp->nodes))
+		if (rtp->nr < preload_target)
 			rtp->nodes[rtp->nr++] = node;
 		else
 			kmem_cache_free(radix_tree_node_cachep, node);
@@ -308,7 +354,7 @@ int radix_tree_preload(gfp_t gfp_mask)
 {
 	/* Warn on non-sensical use... */
 	WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT));
-	return __radix_tree_preload(gfp_mask);
+	return __radix_tree_preload_contig(1, gfp_mask);
 }
 EXPORT_SYMBOL(radix_tree_preload);
 
@@ -320,13 +366,22 @@ EXPORT_SYMBOL(radix_tree_preload);
 int radix_tree_maybe_preload(gfp_t gfp_mask)
 {
 	if (gfp_mask & __GFP_WAIT)
-		return __radix_tree_preload(gfp_mask);
+		return __radix_tree_preload_contig(1, gfp_mask);
 	/* Preloading doesn't help anything with this gfp mask, skip it */
 	preempt_disable();
 	return 0;
 }
 EXPORT_SYMBOL(radix_tree_maybe_preload);
 
+int radix_tree_maybe_preload_contig(unsigned size, gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_WAIT)
+		return __radix_tree_preload_contig(size, gfp_mask);
+	/* Preloading doesn't help anything with this gfp mask, skip it */
+	preempt_disable();
+	return 0;
+}
+
 /*
  *	Return the maximum key which can be store into a
  *	radix tree with height HEIGHT.
@@ -1483,6 +1538,22 @@ static int radix_tree_callback(struct notifier_block *nfb,
 
 void __init radix_tree_init(void)
 {
+	/*
+	 * Restrictions on RADIX_TREE_PRELOAD_NR simplify RADIX_TREE_PRELOAD_MAX
+	 * calculation, it's already complex enough:
+	 *  - it must be multiplier of RADIX_TREE_MAP_SIZE, otherwise we will
+	 *    have to round it up to next RADIX_TREE_MAP_SIZE multiplier and we
+	 *    don't win anything;
+	 *  - must be less then number of items 2-level tree can contain.
+	 *    It's easier to calculate number of internal nodes required
+	 *    this way.
+	 */
+	if (RADIX_TREE_PRELOAD_NR != 1) {
+		BUILD_BUG_ON(RADIX_TREE_PRELOAD_NR % RADIX_TREE_MAP_SIZE != 0);
+		BUILD_BUG_ON(RADIX_TREE_PRELOAD_NR >
+				1UL << (2 * RADIX_TREE_MAP_SHIFT));
+	}
+
 	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
 			sizeof(struct radix_tree_node), 0,
 			SLAB_PANIC | SLAB_RECLAIM_ACCOUNT,
-- 
1.8.3.2

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@linux.intel.com>,
	Dave Hansen <dave@sr71.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Hugh Dickins <hughd@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Hillf Danton <dhillf@gmail.com>, Ning Qu <quning@google.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 01/23] radix-tree: implement preload for multiple contiguous elements
Date: Thu,  8 Aug 2013 11:45:05 +0300 (EEST)	[thread overview]
Message-ID: <20130808084505.31EACE0090@blue.fi.intel.com> (raw)
In-Reply-To: <20130807213736.AC732E0090@blue.fi.intel.com>

Kirill A. Shutemov wrote:
> In this case it should use 39 nodes, but it uses only 38. I can't understand why. :(

Okay, I've got it. We share one 2nd level node.

Patch is below. Please review.

  reply	other threads:[~2013-08-08  8:42 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-04  2:17 [PATCHv5 00/23] Transparent huge page cache: phase 1, everything but mmap() Kirill A. Shutemov
2013-08-04  2:17 ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 01/23] radix-tree: implement preload for multiple contiguous elements Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-05 11:17   ` Jan Kara
2013-08-05 11:17     ` Jan Kara
2013-08-06 16:34     ` Matthew Wilcox
2013-08-06 16:34       ` Matthew Wilcox
2013-08-06 20:17       ` Jan Kara
2013-08-06 20:17         ` Jan Kara
2013-08-07 16:32     ` Kirill A. Shutemov
2013-08-07 16:32       ` Kirill A. Shutemov
2013-08-07 16:32       ` Kirill A. Shutemov
2013-08-07 20:00       ` Jan Kara
2013-08-07 20:00         ` Jan Kara
2013-08-07 20:24         ` Kirill A. Shutemov
2013-08-07 20:24           ` Kirill A. Shutemov
2013-08-07 20:24           ` Kirill A. Shutemov
2013-08-07 20:36           ` Jan Kara
2013-08-07 20:36             ` Jan Kara
2013-08-07 21:37             ` Kirill A. Shutemov
2013-08-07 21:37               ` Kirill A. Shutemov
2013-08-07 21:37               ` Kirill A. Shutemov
2013-08-08  8:45               ` Kirill A. Shutemov [this message]
2013-08-08  8:45                 ` Kirill A. Shutemov
2013-08-08  8:45                 ` Kirill A. Shutemov
2013-08-08 10:04                 ` Jan Kara
2013-08-08 10:04                   ` Jan Kara
2013-08-09 11:13                   ` Kirill A. Shutemov
2013-08-09 11:13                     ` Kirill A. Shutemov
2013-08-09 11:13                     ` Kirill A. Shutemov
2013-08-09 11:36                     ` Jan Kara
2013-08-09 11:36                       ` Jan Kara
2013-08-04  2:17 ` [PATCH 02/23] memcg, thp: charge huge cache pages Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  8:25   ` Michal Hocko
2013-08-04  8:25     ` Michal Hocko
2013-08-04  2:17 ` [PATCH 03/23] thp: compile-time and sysfs knob for thp pagecache Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-09-05 21:53   ` Ning Qu
2013-09-06 11:33     ` Kirill A. Shutemov
2013-09-06 11:33       ` Kirill A. Shutemov
2013-09-06 11:33       ` Kirill A. Shutemov
2013-09-06 17:14       ` Ning Qu
2013-08-04  2:17 ` [PATCH 04/23] thp, mm: introduce mapping_can_have_hugepages() predicate Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 05/23] thp: represent file thp pages in meminfo and friends Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-30 22:16   ` Ning Qu
2013-09-02 11:36     ` Kirill A. Shutemov
2013-09-02 11:36       ` Kirill A. Shutemov
2013-09-02 20:05       ` Ning Qu
2013-08-04  2:17 ` [PATCH 06/23] thp, mm: rewrite add_to_page_cache_locked() to support huge pages Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 07/23] mm: trace filemap: dump page order Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 08/23] block: implement add_bdi_stat() Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-05 11:21   ` Jan Kara
2013-08-05 11:21     ` Jan Kara
2013-08-04  2:17 ` [PATCH 09/23] thp, mm: rewrite delete_from_page_cache() to support huge pages Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 10/23] thp, mm: warn if we try to use replace_page_cache_page() with THP Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 11/23] thp, mm: handle tail pages in page_cache_get_speculative() Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 12/23] thp, mm: add event counters for huge page alloc on file write or read Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 13/23] thp, mm: allocate huge pages in grab_cache_page_write_begin() Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 14/23] thp, mm: naive support of thp in generic_perform_write Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 15/23] mm, fs: avoid page allocation beyond i_size on read Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-05  0:29   ` NeilBrown
2013-08-05  0:29     ` NeilBrown
2013-08-04  2:17 ` [PATCH 16/23] thp, mm: handle transhuge pages in do_generic_file_read() Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 17/23] thp, libfs: initial thp support Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 18/23] thp: libfs: introduce simple_thp_release() Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 19/23] truncate: support huge pages Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-05 13:29   ` Jan Kara
2013-08-05 13:29     ` Jan Kara
2013-08-06 20:23   ` Dave Hansen
2013-08-06 20:23     ` Dave Hansen
2013-08-06 20:57     ` Kirill A. Shutemov
2013-08-06 20:57       ` Kirill A. Shutemov
2013-08-06 21:55   ` Dave Hansen
2013-08-06 21:55     ` Dave Hansen
2013-08-09 14:39     ` Kirill A. Shutemov
2013-08-09 14:39       ` Kirill A. Shutemov
2013-08-09 14:39       ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 20/23] thp: handle file pages in split_huge_page() Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-06 19:09   ` Ning Qu
2013-08-06 21:09     ` Ning Qu
2013-08-06 21:47       ` Ning Qu
2013-08-09 14:46         ` Kirill A. Shutemov
2013-08-09 14:46           ` Kirill A. Shutemov
2013-08-09 14:46           ` Kirill A. Shutemov
2013-08-09 14:49           ` Ning Qu
2013-08-09 21:24             ` Ning Qu
2013-08-09 21:24               ` Ning Qu
2013-08-04  2:17 ` [PATCH 21/23] thp: wait_split_huge_page(): serialize over i_mmap_mutex too Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 22/23] thp, mm: split huge page on mmap file page Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov
2013-08-08 20:49   ` Khalid Aziz
2013-08-08 20:49     ` Khalid Aziz
2013-08-09 14:50     ` Kirill A. Shutemov
2013-08-09 14:50       ` Kirill A. Shutemov
2013-08-04  2:17 ` [PATCH 23/23] ramfs: enable transparent huge page cache Kirill A. Shutemov
2013-08-04  2:17   ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130808084505.31EACE0090@blue.fi.intel.com \
    --to=kirill.shutemov@linux.intel.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@sr71.net \
    --cc=dhillf@gmail.com \
    --cc=fengguang.wu@intel.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=quning@google.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.