* [PATCH 0/7] slub: bulk alloc and free for slub allocator
@ 2015-06-15 15:51 ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:51 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

With this patchset the SLUB allocator now has both bulk alloc and free
implemented.

(This patchset is based on DaveM's net-next tree, on top of commit
c3eee1fb1d308.  The patchset was also tested applied on top of the
volatile linux-next commit aa036f86e1bf ("slub bulk alloc: extract
objects from the per cpu slab").)

This mostly optimizes the "fastpath", where objects are available on
the per-CPU slab page.  It amortizes the cost of the less-heavy,
non-locked cmpxchg_double used on the fastpath.

The "fallback bulking" (e.g __kmem_cache_free_bulk) provides a good
basis for comparison, but to avoid counting the overhead of the
function call in benchmarking[1] I've used an inlined versions of
these.
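
For illustration, a caller-side sketch of how the API is used (just a
sketch; the cache pointer and batch size below are arbitrary, the real
test code is in [1]):

	void *objs[16];

	/* Note: interrupts must be enabled when calling the bulk API */
	if (kmem_cache_alloc_bulk(my_cache, GFP_KERNEL, ARRAY_SIZE(objs), objs)) {
		/* ... use the objects ... */
		kmem_cache_free_bulk(my_cache, ARRAY_SIZE(objs), objs);
	}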

Tested on a (very fast) CPU i7-4790K @ 4.00GHz; thus, look at the
cycle counts, as the nanosec measurements are very low given the high
clock rate.

Baseline normal fastpath (alloc+free cost): 43 cycles(tsc) 10.814 ns

Bulk - Fallback bulking           - fastpath-bulking
   1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns   improved  4.3%
   2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns   improved 39.1%
   3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns   improved 52.2%
   4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns   improved 57.8%
   8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns   improved 62.2%
  16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns   improved 61.4%
  30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns   improved 24.0%
  32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns   improved 23.3%
  34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns   improved 22.7%
  48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns   improved 20.0%
  64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns   improved 14.9%
 128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns   improved  9.2%
 158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns   improved  7.9%
 250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns   improved  3.8%
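
(The "improved" column is relative to the fallback-bulking column,
 e.g. for bulk=4: (45-19)/45 ~= 57.8%.)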

Benchmarking shows impressive improvements in the "fastpath" with a
small number of objects in the working set.  Once the working set
increases, activating the "slowpath" (which contains the heavier,
locked cmpxchg_double), the improvement decreases.

I'm currently working on also optimizing the "slowpath" (as the
network stack use-case hits this), but this patchset should provide a
good foundation for further improvements.

The rest of my patch queue in this area needs some more work, but
preliminary results are good.  I'm attending the Netfilter Workshop[2]
next week, and I hope to resume work on further improvements in this
area afterwards.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
[2] http://workshop.netfilter.org/2015/
---

Christoph Lameter (2):
      slab: infrastructure for bulk object allocation and freeing
      slub bulk alloc: extract objects from the per cpu slab

Jesper Dangaard Brouer (5):
      slub: reduce indention level in kmem_cache_alloc_bulk()
      slub: fix error path bug in kmem_cache_alloc_bulk
      slub: kmem_cache_alloc_bulk() move clearing outside IRQ disabled section
      slub: improve bulk alloc strategy
      slub: initial bulk free implementation


 include/linux/slab.h |   10 +++++
 mm/slab.c            |   13 +++++++
 mm/slab.h            |    9 +++++
 mm/slab_common.c     |   23 ++++++++++++
 mm/slob.c            |   13 +++++++
 mm/slub.c            |   93 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 161 insertions(+)

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:51   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:51 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

From: Christoph Lameter <cl@linux.com>

[NOTICE: Already in AKPM's quilt-queue]

Add the basic infrastructure for alloc/free operations on pointer arrays.
It includes a generic function in the common slab code that is used in
this infrastructure patch to create the unoptimized functionality for slab
bulk operations.

Allocators can then provide optimized allocation functions for situations
in which large numbers of objects are needed.  These optimizations may
avoid taking locks repeatedly and bypass metadata creation if all objects
in slab pages can be used to provide the objects required.

Allocators can extend the skeletons provided and add their own code to the
bulk alloc and free functions.  They can keep the generic allocation and
freeing and just fall back to those if optimizations would not work (like
for example when debugging is on).

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/slab.h |   10 ++++++++++
 mm/slab.c            |   13 +++++++++++++
 mm/slab.h            |    9 +++++++++
 mm/slab_common.c     |   23 +++++++++++++++++++++++
 mm/slob.c            |   13 +++++++++++++
 mm/slub.c            |   14 ++++++++++++++
 6 files changed, 82 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ffd24c830151..5db59c950ef7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -290,6 +290,16 @@ void *__kmalloc(size_t size, gfp_t flags);
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags);
 void kmem_cache_free(struct kmem_cache *, void *);
 
+/*
+ * Bulk allocation and freeing operations. These are accelerated in an
+ * allocator specific way to avoid taking locks repeatedly or building
+ * metadata structures unnecessarily.
+ *
+ * Note that interrupts must be enabled when calling these functions.
+ */
+void kmem_cache_free_bulk(struct kmem_cache *, size_t, void **);
+bool kmem_cache_alloc_bulk(struct kmem_cache *, gfp_t, size_t, void **);
+
 #ifdef CONFIG_NUMA
 void *__kmalloc_node(size_t size, gfp_t flags, int node);
 void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
diff --git a/mm/slab.c b/mm/slab.c
index 7eb38dd1cefa..c799d7ed0e18 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3415,6 +3415,19 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 }
 EXPORT_SYMBOL(kmem_cache_alloc);
 
+void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+{
+	__kmem_cache_free_bulk(s, size, p);
+}
+EXPORT_SYMBOL(kmem_cache_free_bulk);
+
+bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
+								void **p)
+{
+	return __kmem_cache_alloc_bulk(s, flags, size, p);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+
 #ifdef CONFIG_TRACING
 void *
 kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
diff --git a/mm/slab.h b/mm/slab.h
index 4c3ac12dd644..6a427a74cca5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -162,6 +162,15 @@ void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s);
 ssize_t slabinfo_write(struct file *file, const char __user *buffer,
 		       size_t count, loff_t *ppos);
 
+/*
+ * Generic implementation of bulk operations
+ * These are useful for situations in which the allocator cannot
+ * perform optimizations. In that case segments of the object listed
+ * may be allocated or freed using these operations.
+ */
+void __kmem_cache_free_bulk(struct kmem_cache *, size_t, void **);
+bool __kmem_cache_alloc_bulk(struct kmem_cache *, gfp_t, size_t, void **);
+
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * Iterate over all memcg caches of the given root cache. The caller must hold
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 999bb3424d44..f8acc2bdb88b 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -105,6 +105,29 @@ static inline int kmem_cache_sanity_check(const char *name, size_t size)
 }
 #endif
 
+void __kmem_cache_free_bulk(struct kmem_cache *s, size_t nr, void **p)
+{
+	size_t i;
+
+	for (i = 0; i < nr; i++)
+		kmem_cache_free(s, p[i]);
+}
+
+bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr,
+								void **p)
+{
+	size_t i;
+
+	for (i = 0; i < nr; i++) {
+		void *x = p[i] = kmem_cache_alloc(s, flags);
+		if (!x) {
+			__kmem_cache_free_bulk(s, i, p);
+			return false;
+		}
+	}
+	return true;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 void slab_init_memcg_params(struct kmem_cache *s)
 {
diff --git a/mm/slob.c b/mm/slob.c
index 4765f65019c7..495df8e006ec 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -611,6 +611,19 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
+void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+{
+	__kmem_cache_free_bulk(s, size, p);
+}
+EXPORT_SYMBOL(kmem_cache_free_bulk);
+
+bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
+								void **p)
+{
+	return __kmem_cache_alloc_bulk(s, flags, size, p);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+
 int __kmem_cache_shutdown(struct kmem_cache *c)
 {
 	/* No way to check for remaining objects */
diff --git a/mm/slub.c b/mm/slub.c
index 54c0876b43d5..80f17403e503 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2750,6 +2750,20 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
+void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+{
+	__kmem_cache_free_bulk(s, size, p);
+}
+EXPORT_SYMBOL(kmem_cache_free_bulk);
+
+bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
+								void **p)
+{
+	return __kmem_cache_alloc_bulk(s, flags, size, p);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+
+
 /*
  * Object placement in a slab is made very easy because we always start at
  * offset 0. If we tune the size of the object to the alignment then we can

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:52   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:52 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

From: Christoph Lameter <cl@linux.com>

[NOTICE: Already in AKPM's quilt-queue]

First piece: acceleration of retrieval of per cpu objects

If we are allocating lots of objects then it is advantageous to disable
interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
faster.

Note that we cannot do the fast operation if debugging is enabled, because
we would have to add extra code to do all the debugging checks.  And it
would not be fast anyway.

Note also that the requirement of having interrupts disabled
avoids having to do processor flag operations.

Allocate as many objects as possible in the fast way and then fall back to
the generic implementation for the rest of the objects.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/slub.c |   27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 80f17403e503..d18f8e195ac4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 								void **p)
 {
-	return __kmem_cache_alloc_bulk(s, flags, size, p);
+	if (!kmem_cache_debug(s)) {
+		struct kmem_cache_cpu *c;
+
+		/* Drain objects in the per cpu slab */
+		local_irq_disable();
+		c = this_cpu_ptr(s->cpu_slab);
+
+		while (size) {
+			void *object = c->freelist;
+
+			if (!object)
+				break;
+
+			c->freelist = get_freepointer(s, object);
+			*p++ = object;
+			size--;
+
+			if (unlikely(flags & __GFP_ZERO))
+				memset(object, 0, s->object_size);
+		}
+		c->tid = next_tid(c->tid);
+
+		local_irq_enable();
+	}
+
+	return __kmem_cache_alloc_bulk(s, flags, size, p);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 3/7] slub: reduce indention level in kmem_cache_alloc_bulk()
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:52   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:52 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

Use kernel early-return style to reduce the indention level,
by testing for kmem_cache_debug() and falling back to
non-optimized bulking via __kmem_cache_alloc_bulk().

This also makes it easier to fix a bug in the current
implementation, in the next patch.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d18f8e195ac4..753f88bd8b40 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2757,32 +2757,33 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 EXPORT_SYMBOL(kmem_cache_free_bulk);
 
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
-								void **p)
+			   void **p)
 {
-	if (!kmem_cache_debug(s)) {
-		struct kmem_cache_cpu *c;
+	struct kmem_cache_cpu *c;
 
-		/* Drain objects in the per cpu slab */
-		local_irq_disable();
-		c = this_cpu_ptr(s->cpu_slab);
+	/* Debugging fallback to generic bulk */
+	if (kmem_cache_debug(s))
+		return __kmem_cache_alloc_bulk(s, flags, size, p);
 
-		while (size) {
-			void *object = c->freelist;
+	/* Drain objects in the per cpu slab */
+	local_irq_disable();
+	c = this_cpu_ptr(s->cpu_slab);
 
-			if (!object)
-				break;
+	while (size) {
+		void *object = c->freelist;
 
-			c->freelist = get_freepointer(s, object);
-			*p++ = object;
-			size--;
+		if (!object)
+			break;
 
-			if (unlikely(flags & __GFP_ZERO))
-				memset(object, 0, s->object_size);
-		}
-		c->tid = next_tid(c->tid);
+		c->freelist = get_freepointer(s, object);
+		*p++ = object;
+		size--;
 
-		local_irq_enable();
+		if (unlikely(flags & __GFP_ZERO))
+			memset(object, 0, s->object_size);
 	}
+	c->tid = next_tid(c->tid);
+	local_irq_enable();
 
 	return __kmem_cache_alloc_bulk(s, flags, size, p);
 }

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 4/7] slub: fix error path bug in kmem_cache_alloc_bulk
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:52   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:52 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

The current kmem_cache/SLAB bulking API needs to release all objects
in case the layer cannot satisfy the full request.

If __kmem_cache_alloc_bulk() fails, all allocated objects in the array
should be freed, but __kmem_cache_alloc_bulk() can't know about
objects already allocated by this SLUB-specific kmem_cache_alloc_bulk()
function.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 753f88bd8b40..d10de5a33c03 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2760,24 +2760,27 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			   void **p)
 {
 	struct kmem_cache_cpu *c;
+	int i;
 
 	/* Debugging fallback to generic bulk */
 	if (kmem_cache_debug(s))
 		return __kmem_cache_alloc_bulk(s, flags, size, p);
 
-	/* Drain objects in the per cpu slab */
+	/* Drain objects in the per cpu slab, while disabling local
+	 * IRQs, which protects against PREEMPT and interrupts
+	 * handlers invoking normal fastpath.
+	 */
 	local_irq_disable();
 	c = this_cpu_ptr(s->cpu_slab);
 
-	while (size) {
+	for (i = 0; i < size; i++) {
 		void *object = c->freelist;
 
 		if (!object)
 			break;
 
 		c->freelist = get_freepointer(s, object);
-		*p++ = object;
-		size--;
+		p[i] = object;
 
 		if (unlikely(flags & __GFP_ZERO))
 			memset(object, 0, s->object_size);
@@ -2785,7 +2788,15 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
 
-	return __kmem_cache_alloc_bulk(s, flags, size, p);
+	/* Fallback to single elem alloc */
+	for (; i < size; i++) {
+		void *x = p[i] = kmem_cache_alloc(s, flags);
+		if (unlikely(!x)) {
+			__kmem_cache_free_bulk(s, i, p);
+			return false;
+		}
+	}
+	return true;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 5/7] slub: kmem_cache_alloc_bulk() move clearing outside IRQ disabled section
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:52   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:52 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

Move the clearing of objects outside the IRQ disabled section,
to minimize the time spent with local IRQs off.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d10de5a33c03..26f64005a347 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2781,13 +2781,18 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 
 		c->freelist = get_freepointer(s, object);
 		p[i] = object;
-
-		if (unlikely(flags & __GFP_ZERO))
-			memset(object, 0, s->object_size);
 	}
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
 
+	/* Clear memory outside IRQ disabled fastpath loop */
+	if (unlikely(flags & __GFP_ZERO)) {
+		int j;
+
+		for (j = 0; j < i; j++)
+			memset(p[j], 0, s->object_size);
+	}
+
 	/* Fallback to single elem alloc */
 	for (; i < size; i++) {
 		void *x = p[i] = kmem_cache_alloc(s, flags);

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 6/7] slub: improve bulk alloc strategy
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:52   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:52 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

Call the slowpath __slab_alloc() from within the bulk loop, as the
side-effect of this call likely repopulates c->freelist.

Choose to reenable local IRQs while calling the slowpath.

Some optimizations are saved for later.  E.g. it is possible to
extract parts of __slab_alloc() and avoid the unnecessary and
expensive (37 cycles) local_irq_{save,restore}.  For now, I'm
happy calling __slab_alloc(); this lowers the icache impact of this
function and I don't have to worry about correctness.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 26f64005a347..98d0e6f73ec1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2776,8 +2776,23 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	for (i = 0; i < size; i++) {
 		void *object = c->freelist;
 
-		if (!object)
-			break;
+		if (unlikely(!object)) {
+			c->tid = next_tid(c->tid);
+			local_irq_enable();
+
+			/* Invoke slow path one time, then retry fastpath
+			 * as side-effect have updated c->freelist
+			 */
+			p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
+					    _RET_IP_, c);
+			if (unlikely(!p[i])) {
+				__kmem_cache_free_bulk(s, i, p);
+				return false;
+			}
+			local_irq_disable();
+			c = this_cpu_ptr(s->cpu_slab);
+			continue; /* goto for-loop */
+		}
 
 		c->freelist = get_freepointer(s, object);
 		p[i] = object;
@@ -2793,14 +2808,6 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			memset(p[j], 0, s->object_size);
 	}
 
-	/* Fallback to single elem alloc */
-	for (; i < size; i++) {
-		void *x = p[i] = kmem_cache_alloc(s, flags);
-		if (unlikely(!x)) {
-			__kmem_cache_free_bulk(s, i, p);
-			return false;
-		}
-	}
 	return true;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 7/7] slub: initial bulk free implementation
  2015-06-15 15:51 ` Jesper Dangaard Brouer
@ 2015-06-15 15:52   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-15 15:52 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, Andrew Morton
  Cc: netdev, Alexander Duyck, Jesper Dangaard Brouer

This implements a SLUB-specific kmem_cache_free_bulk().  The SLUB
allocator now has both bulk alloc and free implemented.

Play nice and reenable local IRQs while calling the slowpath.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 98d0e6f73ec1..cc4f870677bb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
 
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 {
-	__kmem_cache_free_bulk(s, size, p);
+	struct kmem_cache_cpu *c;
+	struct page *page;
+	int i;
+
+	local_irq_disable();
+	c = this_cpu_ptr(s->cpu_slab);
+
+	for (i = 0; i < size; i++) {
+		void *object = p[i];
+
+		if (unlikely(!object))
+			continue; // HOW ABOUT BUG_ON()???
+
+		page = virt_to_head_page(object);
+		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
+
+		if (c->page == page) {
+			/* Fastpath: local CPU free */
+			set_freepointer(s, object, c->freelist);
+			c->freelist = object;
+		} else {
+			c->tid = next_tid(c->tid);
+			local_irq_enable();
+			/* Slowpath: overhead locked cmpxchg_double_slab */
+			__slab_free(s, page, object, _RET_IP_);
+			local_irq_disable();
+			c = this_cpu_ptr(s->cpu_slab);
+		}
+	}
+	c->tid = next_tid(c->tid);
+	local_irq_enable();
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-15 15:52   ` Jesper Dangaard Brouer
  (?)
@ 2015-06-15 16:34   ` Christoph Lameter
  2015-06-16  8:04     ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 53+ messages in thread
From: Christoph Lameter @ 2015-06-15 16:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: linux-mm, Andrew Morton, netdev, Alexander Duyck

On Mon, 15 Jun 2015, Jesper Dangaard Brouer wrote:

> +	for (i = 0; i < size; i++) {
> +		void *object = p[i];
> +
> +		if (unlikely(!object))
> +			continue; // HOW ABOUT BUG_ON()???

Sure BUG_ON would be fitting here.

> +
> +		page = virt_to_head_page(object);
> +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */

This is the check that the slab page belongs to the slab cache we are
interested in.

> +
> +		if (c->page == page) {
> +			/* Fastpath: local CPU free */
> +			set_freepointer(s, object, c->freelist);
> +			c->freelist = object;
> +		} else {
> +			c->tid = next_tid(c->tid);

tids are only useful for the fastpath. No need to fiddle around with them
for the slowpath.

> +			local_irq_enable();
> +			/* Slowpath: overhead locked cmpxchg_double_slab */

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7] slub: improve bulk alloc strategy
  2015-06-15 15:52   ` Jesper Dangaard Brouer
  (?)
@ 2015-06-15 16:36   ` Christoph Lameter
  -1 siblings, 0 replies; 53+ messages in thread
From: Christoph Lameter @ 2015-06-15 16:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: linux-mm, Andrew Morton, netdev, Alexander Duyck

On Mon, 15 Jun 2015, Jesper Dangaard Brouer wrote:

> -			break;
> +		if (unlikely(!object)) {
> +			c->tid = next_tid(c->tid);

tid increment is not needed here since the per cpu information is not
modified.

> +			local_irq_enable();
> +
> +			/* Invoke slow path one time, then retry fastpath
> +			 * as side-effect have updated c->freelist
> +			 */
> +			p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
> +					    _RET_IP_, c);
> +			if (unlikely(!p[i])) {

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing
  2015-06-15 15:51   ` Jesper Dangaard Brouer
@ 2015-06-15 16:45     ` Alexander Duyck
  -1 siblings, 0 replies; 53+ messages in thread
From: Alexander Duyck @ 2015-06-15 16:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, linux-mm, Christoph Lameter, Andrew Morton; +Cc: netdev

On 06/15/2015 08:51 AM, Jesper Dangaard Brouer wrote:
> From: Christoph Lameter <cl@linux.com>
>
> [NOTICE: Already in AKPM's quilt-queue]
>
> Add the basic infrastructure for alloc/free operations on pointer arrays.
> It includes a generic function in the common slab code that is used in
> this infrastructure patch to create the unoptimized functionality for slab
> bulk operations.
>
> Allocators can then provide optimized allocation functions for situations
> in which large numbers of objects are needed.  These optimizations may
> avoid taking locks repeatedly and bypass metadata creation if all objects
> in slab pages can be used to provide the objects required.
>
> Allocators can extend the skeletons provided and add their own code to the
> bulk alloc and free functions.  They can keep the generic allocation and
> freeing and just fall back to those if optimizations would not work (like
> for example when debugging is on).
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>   include/linux/slab.h |   10 ++++++++++
>   mm/slab.c            |   13 +++++++++++++
>   mm/slab.h            |    9 +++++++++
>   mm/slab_common.c     |   23 +++++++++++++++++++++++
>   mm/slob.c            |   13 +++++++++++++
>   mm/slub.c            |   14 ++++++++++++++
>   6 files changed, 82 insertions(+)

So I can see the motivation behind bulk allocation, but I cannot see the 
motivation behind bulk freeing.  In the case of freeing the likelihood 
of the memory regions all belonging to the same page just isn't as high.

> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index ffd24c830151..5db59c950ef7 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -290,6 +290,16 @@ void *__kmalloc(size_t size, gfp_t flags);
>   void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags);
>   void kmem_cache_free(struct kmem_cache *, void *);
>   
> +/*
> + * Bulk allocation and freeing operations. These are accelerated in an
> + * allocator specific way to avoid taking locks repeatedly or building
> + * metadata structures unnecessarily.
> + *
> + * Note that interrupts must be enabled when calling these functions.
> + */
> +void kmem_cache_free_bulk(struct kmem_cache *, size_t, void **);
> +bool kmem_cache_alloc_bulk(struct kmem_cache *, gfp_t, size_t, void **);
> +
>   #ifdef CONFIG_NUMA
>   void *__kmalloc_node(size_t size, gfp_t flags, int node);
>   void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);

Instead of having the bulk allocation return true, why not just return 
the number of entries you were able to obtain beyond the first, and cut 
out most of the complexity?  For example if you want to allocate 4 
entries, and you succeeded in allocating 4 entries, you would return 3.  
If you wanted 1 entry and succeeded you would return zero; otherwise you 
just return a negative value indicating you failed.

The general idea is to do a best-effort allocation.  So if you only have 
3 entries available in the percpu freelist and you want 4, then maybe you 
should only grab 3 entries instead of trying to force more entries out 
than what is there by grabbing from multiple SLUB/SLAB caches.

Also I wouldn't use a size_t to track the number of entities requested.  
An int should be enough.  The size_t is confusing as that is normally 
what you would use to specify the size of the memory region, not the 
number of copies of it to request.
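
Concretely, something along these lines (just a sketch to make the
suggestion clear):

	/* Returns the number of objects obtained beyond the first
	 * (so a fully satisfied request for 4 returns 3, a request
	 * for 1 returns 0), or a negative value on failure.
	 */
	int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
				  int nr, void **p);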

> diff --git a/mm/slab.c b/mm/slab.c
> index 7eb38dd1cefa..c799d7ed0e18 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3415,6 +3415,19 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
>   }
>   EXPORT_SYMBOL(kmem_cache_alloc);
>   
> +void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> +{
> +	__kmem_cache_free_bulk(s, size, p);
> +}
> +EXPORT_SYMBOL(kmem_cache_free_bulk);
> +
> +bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> +								void **p)
> +{
> +	return __kmem_cache_alloc_bulk(s, flags, size, p);
> +}
> +EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> +
>   #ifdef CONFIG_TRACING
>   void *
>   kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
> diff --git a/mm/slab.h b/mm/slab.h
> index 4c3ac12dd644..6a427a74cca5 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -162,6 +162,15 @@ void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s);
>   ssize_t slabinfo_write(struct file *file, const char __user *buffer,
>   		       size_t count, loff_t *ppos);
>   
> +/*
> + * Generic implementation of bulk operations
> + * These are useful for situations in which the allocator cannot
> + * perform optimizations. In that case segments of the object listed
> + * may be allocated or freed using these operations.
> + */
> +void __kmem_cache_free_bulk(struct kmem_cache *, size_t, void **);
> +bool __kmem_cache_alloc_bulk(struct kmem_cache *, gfp_t, size_t, void **);
> +
>   #ifdef CONFIG_MEMCG_KMEM
>   /*
>    * Iterate over all memcg caches of the given root cache. The caller must hold
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 999bb3424d44..f8acc2bdb88b 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -105,6 +105,29 @@ static inline int kmem_cache_sanity_check(const char *name, size_t size)
>   }
>   #endif
>   
> +void __kmem_cache_free_bulk(struct kmem_cache *s, size_t nr, void **p)
> +{
> +	size_t i;
> +
> +	for (i = 0; i < nr; i++)
> +		kmem_cache_free(s, p[i]);
> +}
> +
> +bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr,
> +								void **p)
> +{
> +	size_t i;
> +
> +	for (i = 0; i < nr; i++) {
> +		void *x = p[i] = kmem_cache_alloc(s, flags);
> +		if (!x) {
> +			__kmem_cache_free_bulk(s, i, p);
> +			return false;
> +		}
> +	}
> +	return true;
> +}
> +
>   #ifdef CONFIG_MEMCG_KMEM
>   void slab_init_memcg_params(struct kmem_cache *s)
>   {

I don't really see the reason why you should be populating arrays.  SLUB 
uses a linked list, and I don't see this implemented for SLOB or SLAB, so 
maybe you should look at making this into a linked list.  Also, as I 
stated in the other comment, maybe you should not do bulk allocation if 
you don't support it in SLAB/SLOB, and instead change this so that you 
return a count indicating that only 1 value was allocated in this pass.

> diff --git a/mm/slob.c b/mm/slob.c
> index 4765f65019c7..495df8e006ec 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -611,6 +611,19 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
>   }
>   EXPORT_SYMBOL(kmem_cache_free);
>   
> +void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> +{
> +	__kmem_cache_free_bulk(s, size, p);
> +}
> +EXPORT_SYMBOL(kmem_cache_free_bulk);
> +
> +bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> +								void **p)
> +{
> +	return kmem_cache_alloc_bulk(s, flags, size, p);
> +}
> +EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> +
>   int __kmem_cache_shutdown(struct kmem_cache *c)
>   {
>   	/* No way to check for remaining objects */
> diff --git a/mm/slub.c b/mm/slub.c
> index 54c0876b43d5..80f17403e503 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2750,6 +2750,20 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
>   }
>   EXPORT_SYMBOL(kmem_cache_free);
>   
> +void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> +{
> +	__kmem_cache_free_bulk(s, size, p);
> +}
> +EXPORT_SYMBOL(kmem_cache_free_bulk);
> +
> +bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> +								void **p)
> +{
> +	return kmem_cache_alloc_bulk(s, flags, size, p);
> +}
> +EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> +
> +
>   /*
>    * Object placement in a slab is made very easy because we always start at
>    * offset 0. If we tune the size of the object to the alignment then we can
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing
  2015-06-15 16:45     ` Alexander Duyck
  (?)
@ 2015-06-15 16:50     ` Christoph Lameter
  -1 siblings, 0 replies; 53+ messages in thread
From: Christoph Lameter @ 2015-06-15 16:50 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Jesper Dangaard Brouer, linux-mm, Andrew Morton, netdev

On Mon, 15 Jun 2015, Alexander Duyck wrote:

> So I can see the motivation behind bulk allocation, but I cannot see the
> motivation behind bulk freeing.  In the case of freeing the likelihood of the
> memory regions all belonging to the same page just isn't as high.

The likelihood is high if the objects are allocated in batch as well.  In
that case SLUB ensures that all objects from the same page are allocated
first.

> I don't really see the reason why you should be populating arrays. SLUB uses a
> linked list and I don't see this implemented for SLOB or SLAB so maybe you
> should look at making this into a linked list. Also as I stated in the other
> comment maybe you should not do bulk allocation if you don't support it in
> SLAB/SLOB and instead change this so that you return a count indicating that
> only 1 value was allocated in this pass.

It is extremely easy to just take the linked list off a page or a per cpu
structure.  Basically we would have a constant cycle count for taking the N
objects available in a slab page.
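
As a sketch of that idea (hypothetical helper, not from the patches;
c->freelist, next_tid() and the per-object free pointers are the existing
SLUB mechanics, and interrupts must already be disabled as in the posted
code):

/*
 * Detach the entire per cpu freelist in one step.  The return value
 * is the head of a list chained through each object's free pointer,
 * so handing out the objects costs no per-object work here.
 */
static void *detach_cpu_freelist(struct kmem_cache *s)
{
        struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
        void *list = c->freelist;

        c->freelist = NULL;
        c->tid = next_tid(c->tid);
        return list;
}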


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-15 15:52   ` Jesper Dangaard Brouer
  (?)
  (?)
@ 2015-06-15 17:04   ` Alexander Duyck
  -1 siblings, 0 replies; 53+ messages in thread
From: Alexander Duyck @ 2015-06-15 17:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, linux-mm, Christoph Lameter, Andrew Morton; +Cc: netdev

On 06/15/2015 08:52 AM, Jesper Dangaard Brouer wrote:
> This implements SLUB specific kmem_cache_free_bulk().  SLUB allocator
> now both have bulk alloc and free implemented.
>
> Play nice and reenable local IRQs while calling slowpath.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   mm/slub.c |   32 +++++++++++++++++++++++++++++++-
>   1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 98d0e6f73ec1..cc4f870677bb 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
>   
>   void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
>   {
> -	__kmem_cache_free_bulk(s, size, p);
> +	struct kmem_cache_cpu *c;
> +	struct page *page;
> +	int i;
> +
> +	local_irq_disable();
> +	c = this_cpu_ptr(s->cpu_slab);
> +
> +	for (i = 0; i < size; i++) {
> +		void *object = p[i];
> +
> +		if (unlikely(!object))
> +			continue; // HOW ABOUT BUG_ON()???
> +
> +		page = virt_to_head_page(object);
> +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
> +
> +		if (c->page == page) {
> +			/* Fastpath: local CPU free */
> +			set_freepointer(s, object, c->freelist);
> +			c->freelist = object;
> +		} else {
> +			c->tid = next_tid(c->tid);
> +			local_irq_enable();
> +			/* Slowpath: overhead locked cmpxchg_double_slab */
> +			__slab_free(s, page, object, _RET_IP_);
> +			local_irq_disable();
> +			c = this_cpu_ptr(s->cpu_slab);
> +		}
> +	}
> +	c->tid = next_tid(c->tid);
> +	local_irq_enable();
>   }
>   EXPORT_SYMBOL(kmem_cache_free_bulk);

So if the idea is to batch the freeing, maybe you should look at doing 
the freeing in two passes.  The first would be to free all those buffers 
that share their page with the percpu slab.  Then you could just free 
everything else in the second pass, after you have re-enabled IRQs.
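
A rough sketch of that two-pass scheme, reusing the helpers visible in the
quoted patch (hypothetical code, not the posted implementation; clearing
p[i] as a "done" marker clobbers the caller's array and is only for
illustration):

static void kmem_cache_free_bulk_twopass(struct kmem_cache *s, size_t size,
                                         void **p)
{
        struct kmem_cache_cpu *c;
        size_t i;

        /* Pass 1: objects belonging to the current per cpu slab page */
        local_irq_disable();
        c = this_cpu_ptr(s->cpu_slab);
        for (i = 0; i < size; i++) {
                void *object = p[i];

                if (c->page == virt_to_head_page(object)) {
                        set_freepointer(s, object, c->freelist);
                        c->freelist = object;
                        p[i] = NULL;    /* mark as already freed */
                }
        }
        c->tid = next_tid(c->tid);
        local_irq_enable();

        /* Pass 2: everything else via the locked slowpath, IRQs enabled */
        for (i = 0; i < size; i++)
                if (p[i])
                        __slab_free(s, virt_to_head_page(p[i]), p[i], _RET_IP_);
}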

- Alex


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab
  2015-06-15 15:52   ` Jesper Dangaard Brouer
@ 2015-06-16  7:21     ` Joonsoo Kim
  -1 siblings, 0 replies; 53+ messages in thread
From: Joonsoo Kim @ 2015-06-16  7:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, Andrew Morton, netdev, Alexander Duyck

On Mon, Jun 15, 2015 at 05:52:07PM +0200, Jesper Dangaard Brouer wrote:
> From: Christoph Lameter <cl@linux.com>
> 
> [NOTICE: Already in AKPM's quilt-queue]
> 
> First piece: acceleration of retrieval of per cpu objects
> 
> If we are allocating lots of objects then it is advantageous to disable
> interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
> faster.
> 
> Note that we cannot do the fast operation if debugging is enabled, because
> we would have to add extra code to do all the debugging checks.  And it
> would not be fast anyway.
> 
> Note also that the requirement of having interrupts disabled
> avoids having to do processor flag operations.
> 
> Allocate as many objects as possible in the fast way and then fall back to
> the generic implementation for the rest of the objects.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/slub.c |   27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 80f17403e503..d18f8e195ac4 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
>  bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  								void **p)
>  {
> -	return kmem_cache_alloc_bulk(s, flags, size, p);
> +	if (!kmem_cache_debug(s)) {
> +		struct kmem_cache_cpu *c;
> +
> +		/* Drain objects in the per cpu slab */
> +		local_irq_disable();
> +		c = this_cpu_ptr(s->cpu_slab);
> +
> +		while (size) {
> +			void *object = c->freelist;
> +
> +			if (!object)
> +				break;
> +
> +			c->freelist = get_freepointer(s, object);
> +			*p++ = object;
> +			size--;
> +
> +			if (unlikely(flags & __GFP_ZERO))
> +				memset(object, 0, s->object_size);
> +		}
> +		c->tid = next_tid(c->tid);
> +
> +		local_irq_enable();
> +	}
> +
> +	return __kmem_cache_alloc_bulk(s, flags, size, p);
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_bulk);

Now I found that we need to call slab_pre_alloc_hook() before any operation
on the kmem_cache to support kmemcg accounting.  And we need to call
slab_post_alloc_hook() on every allocated object to support many
debugging features such as kasan and kmemleak.
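
A minimal sketch of where those hooks could sit in the bulk alloc path
(hypothetical placement, not the posted patch; the hook signatures are
assumed from the then-current mm/slub.c, and the __GFP_ZERO handling of
the quoted code is omitted for brevity):

bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
                           void **p)
{
        struct kmem_cache_cpu *c;
        size_t i = 0, j;

        /* kmemcg: may redirect to a per-memcg cache, or fail the alloc */
        s = slab_pre_alloc_hook(s, flags);
        if (!s)
                return false;

        /* drain the per cpu freelist, as in the quoted patch */
        local_irq_disable();
        c = this_cpu_ptr(s->cpu_slab);
        while (i < size && c->freelist) {
                p[i] = c->freelist;
                c->freelist = get_freepointer(s, p[i]);
                i++;
        }
        c->tid = next_tid(c->tid);
        local_irq_enable();

        /* kasan/kmemleak/etc. need to see every object handed out */
        for (j = 0; j < i; j++)
                slab_post_alloc_hook(s, flags, p[j]);

        /*
         * The generic fallback already runs the hooks via kmem_cache_alloc();
         * a real version would avoid re-running the pre hook there.
         */
        return __kmem_cache_alloc_bulk(s, flags, size - i, p + i);
}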

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-15 15:52   ` Jesper Dangaard Brouer
@ 2015-06-16  7:23     ` Joonsoo Kim
  -1 siblings, 0 replies; 53+ messages in thread
From: Joonsoo Kim @ 2015-06-16  7:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, Andrew Morton, netdev, Alexander Duyck

On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
> This implements SLUB specific kmem_cache_free_bulk().  SLUB allocator
> now both have bulk alloc and free implemented.
> 
> Play nice and reenable local IRQs while calling slowpath.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  mm/slub.c |   32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 98d0e6f73ec1..cc4f870677bb 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
>  
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
>  {
> -	__kmem_cache_free_bulk(s, size, p);
> +	struct kmem_cache_cpu *c;
> +	struct page *page;
> +	int i;
> +
> +	local_irq_disable();
> +	c = this_cpu_ptr(s->cpu_slab);
> +
> +	for (i = 0; i < size; i++) {
> +		void *object = p[i];
> +
> +		if (unlikely(!object))
> +			continue; // HOW ABOUT BUG_ON()???
> +
> +		page = virt_to_head_page(object);
> +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */

You need to use cache_from_obj() to support kmemcg accounting.
And slab_free_hook() should be called before the free.
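
For illustration, the per-object prologue being asked for might look
something like this inside the free loop (hypothetical helper;
cache_from_obj() and slab_free_hook() are the existing helpers being
referred to):

/*
 * Hypothetical per-object prologue for the bulk free loop: map back
 * from a possible per-memcg clone of the cache and run the debug
 * hooks (kasan, kmemleak, ...) before the object is actually freed.
 */
static struct kmem_cache *bulk_free_prologue(struct kmem_cache *s, void *object)
{
        struct kmem_cache *cachep = cache_from_obj(s, object);

        slab_free_hook(cachep, object);
        return cachep;  /* free the object back to this cache */
}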

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-15 15:52   ` Jesper Dangaard Brouer
                     ` (3 preceding siblings ...)
  (?)
@ 2015-06-16  7:28   ` Joonsoo Kim
  2015-06-16  8:21       ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 53+ messages in thread
From: Joonsoo Kim @ 2015-06-16  7:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, Andrew Morton, netdev, Alexander Duyck

On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
> This implements SLUB specific kmem_cache_free_bulk().  SLUB allocator
> now both have bulk alloc and free implemented.
> 
> Play nice and reenable local IRQs while calling slowpath.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  mm/slub.c |   32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 98d0e6f73ec1..cc4f870677bb 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
>  
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
>  {
> -	__kmem_cache_free_bulk(s, size, p);
> +	struct kmem_cache_cpu *c;
> +	struct page *page;
> +	int i;
> +
> +	local_irq_disable();
> +	c = this_cpu_ptr(s->cpu_slab);
> +
> +	for (i = 0; i < size; i++) {
> +		void *object = p[i];
> +
> +		if (unlikely(!object))
> +			continue; // HOW ABOUT BUG_ON()???
> +
> +		page = virt_to_head_page(object);
> +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
> +
> +		if (c->page == page) {
> +			/* Fastpath: local CPU free */
> +			set_freepointer(s, object, c->freelist);
> +			c->freelist = object;
> +		} else {
> +			c->tid = next_tid(c->tid);
> +			local_irq_enable();
> +			/* Slowpath: overhead locked cmpxchg_double_slab */
> +			__slab_free(s, page, object, _RET_IP_);
> +			local_irq_disable();
> +			c = this_cpu_ptr(s->cpu_slab);

The SLUB free path doesn't need IRQ management in many cases, although
it uses cmpxchg_double_slab.  Is this really better than just calling
__kmem_cache_free_bulk()?

Thanks.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-15 16:34   ` Christoph Lameter
@ 2015-06-16  8:04     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-16  8:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, Andrew Morton, netdev, Alexander Duyck, brouer

On Mon, 15 Jun 2015 11:34:44 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> On Mon, 15 Jun 2015, Jesper Dangaard Brouer wrote:
> 
> > +	for (i = 0; i < size; i++) {
> > +		void *object = p[i];
> > +
> > +		if (unlikely(!object))
> > +			continue; // HOW ABOUT BUG_ON()???
> 
> Sure BUG_ON would be fitting here.

Okay, will do in V2.

> > +
> > +		page = virt_to_head_page(object);
> > +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
> 
> This is the check if the slab page belongs to the slab cache we are
> interested in.

Is this appropriate to keep on this fastpath? (I copied the check from
one of your earlier patches)

> > +
> > +		if (c->page == page) {
> > +			/* Fastpath: local CPU free */
> > +			set_freepointer(s, object, c->freelist);
> > +			c->freelist = object;
> > +		} else {
> > +			c->tid = next_tid(c->tid);
> 
> tids are only useful for the fastpath. No need to fiddle around with them
> for the slowpath.

Okay, understood.

> > +			local_irq_enable();
> > +			/* Slowpath: overhead locked cmpxchg_double_slab */


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16  7:28   ` Joonsoo Kim
@ 2015-06-16  8:21       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-16  8:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Christoph Lameter, Andrew Morton, netdev,
	Alexander Duyck, brouer


On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> Is this really better than just calling __kmem_cache_free_bulk()?

Yes, as can be seen by cover-letter, but my cover-letter does not seem
to have reached mm-list.

Measurements for the entire patchset:

Bulk - Fallback bulking           - fastpath-bulking
   1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns   improved  4.3%
   2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns   improved 39.1%
   3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns   improved 52.2%
   4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns   improved 57.8%
   8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns   improved 62.2%
  16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns   improved 61.4%
  30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns   improved 24.0%
  32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns   improved 23.3%
  34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns   improved 22.7%
  48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns   improved 20.0%
  64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns   improved 14.9%
 128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns   improved  9.2%
 158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns   improved  7.9%
 250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns   improved  3.8%

I'll do a compare against the previous patch, and post the results.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16  8:21       ` Jesper Dangaard Brouer
  (?)
@ 2015-06-16  8:57       ` Jesper Dangaard Brouer
  2015-06-16 12:05         ` Joonsoo Kim
  -1 siblings, 1 reply; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-16  8:57 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Christoph Lameter, Andrew Morton, netdev,
	Alexander Duyck, brouer

On Tue, 16 Jun 2015 10:21:10 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> 
> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> 
> > Is this really better than just calling __kmem_cache_free_bulk()?
> 
> Yes, as can be seen by cover-letter, but my cover-letter does not seem
> to have reached mm-list.
> 
> Measurements for the entire patchset:
> 
> Bulk - Fallback bulking           - fastpath-bulking
>    1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns   improved  4.3%
>    2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns   improved 39.1%
>    3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns   improved 52.2%
>    4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns   improved 57.8%
>    8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns   improved 62.2%
>   16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns   improved 61.4%
>   30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns   improved 24.0%
>   32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns   improved 23.3%
>   34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns   improved 22.7%
>   48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns   improved 20.0%
>   64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns   improved 14.9%
>  128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns   improved  9.2%
>  158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns   improved  7.9%
>  250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns   improved  3.8%
> 
> I'll do a compare against the previous patch, and post the results.

Compare against previous patch:

Run:   previous-patch            - this patch
  1 -   49 cycles(tsc) 12.378 ns -  43 cycles(tsc) 10.775 ns  improved 12.2%
  2 -   37 cycles(tsc)  9.297 ns -  26 cycles(tsc)  6.652 ns  improved 29.7%
  3 -   33 cycles(tsc)  8.348 ns -  21 cycles(tsc)  5.347 ns  improved 36.4%
  4 -   31 cycles(tsc)  7.930 ns -  18 cycles(tsc)  4.669 ns  improved 41.9%
  8 -   30 cycles(tsc)  7.693 ns -  17 cycles(tsc)  4.404 ns  improved 43.3%
 16 -   32 cycles(tsc)  8.059 ns -  17 cycles(tsc)  4.493 ns  improved 46.9%
 30 -   65 cycles(tsc) 16.345 ns -  59 cycles(tsc) 14.858 ns  improved  9.2%
 32 -   64 cycles(tsc) 16.170 ns -  56 cycles(tsc) 14.074 ns  improved 12.5%
 34 -   66 cycles(tsc) 16.645 ns -  55 cycles(tsc) 13.882 ns  improved 16.7%
 48 -   78 cycles(tsc) 19.581 ns -  65 cycles(tsc) 16.266 ns  improved 16.7%
 64 -   81 cycles(tsc) 20.428 ns -  77 cycles(tsc) 19.432 ns  improved  4.9%
128 -   92 cycles(tsc) 23.030 ns -  78 cycles(tsc) 19.650 ns  improved 15.2%
158 -   94 cycles(tsc) 23.581 ns -  83 cycles(tsc) 20.953 ns  improved 11.7%
250 -   96 cycles(tsc) 24.175 ns -  93 cycles(tsc) 23.444 ns  improved  3.1%

This mostly amortizes the less-heavy non-locked cmpxchg_double used on the
fastpath.  As I demonstrated earlier, this fastpath cmpxchg is approx. 38%
of the fastpath cost.

Thus, yes, doing bulk free this way is faster ;-)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16  7:23     ` Joonsoo Kim
  (?)
@ 2015-06-16  9:20     ` Jesper Dangaard Brouer
  2015-06-16 12:00         ` Joonsoo Kim
  -1 siblings, 1 reply; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-16  9:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Christoph Lameter, Andrew Morton, netdev,
	Alexander Duyck, brouer

On Tue, 16 Jun 2015 16:23:28 +0900
Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
> > This implements SLUB specific kmem_cache_free_bulk().  SLUB allocator
> > now both have bulk alloc and free implemented.
> > 
> > Play nice and reenable local IRQs while calling slowpath.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  mm/slub.c |   32 +++++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 98d0e6f73ec1..cc4f870677bb 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
> >  
> >  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> >  {
> > -	__kmem_cache_free_bulk(s, size, p);
> > +	struct kmem_cache_cpu *c;
> > +	struct page *page;
> > +	int i;
> > +
> > +	local_irq_disable();
> > +	c = this_cpu_ptr(s->cpu_slab);
> > +
> > +	for (i = 0; i < size; i++) {
> > +		void *object = p[i];
> > +
> > +		if (unlikely(!object))
> > +			continue; // HOW ABOUT BUG_ON()???
> > +
> > +		page = virt_to_head_page(object);
> > +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
> 
> You need to use cache_from_obj() to support kmemcg accounting.
> And slab_free_hook() should be called before the free.

Okay, but Christoph chose not to support kmem_cache_debug() in patch 2/7.

Should we/I try to add kmem cache debugging support?

If adding these, then I would also need to add those on alloc path...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16  9:20     ` Jesper Dangaard Brouer
@ 2015-06-16 12:00         ` Joonsoo Kim
  0 siblings, 0 replies; 53+ messages in thread
From: Joonsoo Kim @ 2015-06-16 12:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Joonsoo Kim, Linux Memory Management List, Christoph Lameter,
	Andrew Morton, Linux-Netdev, Alexander Duyck

2015-06-16 18:20 GMT+09:00 Jesper Dangaard Brouer <brouer@redhat.com>:
> On Tue, 16 Jun 2015 16:23:28 +0900
> Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>
>> On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
>> > This implements SLUB specific kmem_cache_free_bulk().  SLUB allocator
>> > now both have bulk alloc and free implemented.
>> >
>> > Play nice and reenable local IRQs while calling slowpath.
>> >
>> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> > ---
>> >  mm/slub.c |   32 +++++++++++++++++++++++++++++++-
>> >  1 file changed, 31 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 98d0e6f73ec1..cc4f870677bb 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
>> >
>> >  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
>> >  {
>> > -   __kmem_cache_free_bulk(s, size, p);
>> > +   struct kmem_cache_cpu *c;
>> > +   struct page *page;
>> > +   int i;
>> > +
>> > +   local_irq_disable();
>> > +   c = this_cpu_ptr(s->cpu_slab);
>> > +
>> > +   for (i = 0; i < size; i++) {
>> > +           void *object = p[i];
>> > +
>> > +           if (unlikely(!object))
>> > +                   continue; // HOW ABOUT BUG_ON()???
>> > +
>> > +           page = virt_to_head_page(object);
>> > +           BUG_ON(s != page->slab_cache); /* Check if valid slab page */
>>
>> You need to use cache_from_obj() to support kmemcg accounting.
>> And slab_free_hook() should be called before the free.
>
> Okay, but Christoph chose not to support kmem_cache_debug() in patch 2/7.
>
> Should we/I try to add kmem cache debugging support?

kmem_cache_debug() is the check for the slab-internal debugging feature.
slab_free_hook() and the others I mentioned are related to external
debugging features such as kasan and kmemleak.  So even if a debugged
kmem_cache isn't supported by the bulk API, the external debugging
features should still be supported.

> If adding these, then I would also need to add those on alloc path...

Yes, please.

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16  8:57       ` Jesper Dangaard Brouer
@ 2015-06-16 12:05         ` Joonsoo Kim
  2015-06-16 15:10           ` Christoph Lameter
  0 siblings, 1 reply; 53+ messages in thread
From: Joonsoo Kim @ 2015-06-16 12:05 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Joonsoo Kim, Linux Memory Management List, Christoph Lameter,
	Andrew Morton, Linux-Netdev, Alexander Duyck

2015-06-16 17:57 GMT+09:00 Jesper Dangaard Brouer <brouer@redhat.com>:
> On Tue, 16 Jun 2015 10:21:10 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>>
>> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>>
>> > Is this really better than just calling __kmem_cache_free_bulk()?
>>
>> Yes, as can be seen by cover-letter, but my cover-letter does not seem
>> to have reached mm-list.
>>
>> Measurements for the entire patchset:
>>
>> Bulk - Fallback bulking           - fastpath-bulking
>>    1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns   improved  4.3%
>>    2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns   improved 39.1%
>>    3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns   improved 52.2%
>>    4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns   improved 57.8%
>>    8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns   improved 62.2%
>>   16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns   improved 61.4%
>>   30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns   improved 24.0%
>>   32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns   improved 23.3%
>>   34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns   improved 22.7%
>>   48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns   improved 20.0%
>>   64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns   improved 14.9%
>>  128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns   improved  9.2%
>>  158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns   improved  7.9%
>>  250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns   improved  3.8%
>>
>> I'll do a compare against the previous patch, and post the results.
>
> Compare against previous patch:
>
> Run:   previous-patch            - this patch
>   1 -   49 cycles(tsc) 12.378 ns -  43 cycles(tsc) 10.775 ns  improved 12.2%
>   2 -   37 cycles(tsc)  9.297 ns -  26 cycles(tsc)  6.652 ns  improved 29.7%
>   3 -   33 cycles(tsc)  8.348 ns -  21 cycles(tsc)  5.347 ns  improved 36.4%
>   4 -   31 cycles(tsc)  7.930 ns -  18 cycles(tsc)  4.669 ns  improved 41.9%
>   8 -   30 cycles(tsc)  7.693 ns -  17 cycles(tsc)  4.404 ns  improved 43.3%
>  16 -   32 cycles(tsc)  8.059 ns -  17 cycles(tsc)  4.493 ns  improved 46.9%

So, in your test, most of the objects may come from one or two slabs, and your
algorithm is well optimized for this case.  But is this workload the normal case?
If most of the objects come from many different slabs, the bulk free API
enables/disables interrupts very often, so I guess it would work worse than
just calling __kmem_cache_free_bulk().  Could you test this case?

Thanks.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16 12:00         ` Joonsoo Kim
  (?)
@ 2015-06-16 13:58         ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-16 13:58 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Joonsoo Kim, Linux Memory Management List, Christoph Lameter,
	Andrew Morton, Alexander Duyck, brouer


On Tue, 16 Jun 2015 21:00:39 +0900 Joonsoo Kim <js1304@gmail.com> wrote:

> > Okay, but Christoph chose not to support kmem_cache_debug() in patch 2/7.
> >
> > Should we/I try to add kmem cache debugging support?
> 
> kmem_cache_debug() is the check for the slab-internal debugging feature.
> slab_free_hook() and the others I mentioned are related to external
> debugging features such as kasan and kmemleak.  So even if a debugged
> kmem_cache isn't supported by the bulk API, the external debugging
> features should still be supported.
> 
> > If adding these, then I would also need to add those on alloc path...
> 
> Yes, please.

I've added a patch 8 to my queue that (tries to) implement this.
Currently I'm trying to figure out how to "use" these debugging features,
so that I actually exercise that code path.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab
  2015-06-16  7:21     ` Joonsoo Kim
  (?)
@ 2015-06-16 15:05     ` Christoph Lameter
  -1 siblings, 0 replies; 53+ messages in thread
From: Christoph Lameter @ 2015-06-16 15:05 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Jesper Dangaard Brouer, linux-mm, Andrew Morton, netdev, Alexander Duyck

On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> Now I found that we need to call slab_pre_alloc_hook() before any operation
> on the kmem_cache to support kmemcg accounting.  And we need to call
> slab_post_alloc_hook() on every allocated object to support many
> debugging features such as kasan and kmemleak.

Using the fallback function for any debugging avoids that.  This needs to be
fast.  If the performance is not wanted (debugging etc. active), then the
fallback should be fine.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16 12:00         ` Joonsoo Kim
  (?)
  (?)
@ 2015-06-16 15:06         ` Christoph Lameter
  -1 siblings, 0 replies; 53+ messages in thread
From: Christoph Lameter @ 2015-06-16 15:06 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Jesper Dangaard Brouer, Joonsoo Kim,
	Linux Memory Management List, Andrew Morton, Linux-Netdev,
	Alexander Duyck

On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> > If adding these, then I would also need to add those on alloc path...
>
> Yes, please.

Let's fall back to the generic implementation for any of these things.  We
need to focus on maximum performance in these functions.  The more special
cases we have to handle, the more all of this gets compromised.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16 12:05         ` Joonsoo Kim
@ 2015-06-16 15:10           ` Christoph Lameter
  2015-06-16 15:52               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Lameter @ 2015-06-16 15:10 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Jesper Dangaard Brouer, Joonsoo Kim,
	Linux Memory Management List, Andrew Morton, Linux-Netdev,
	Alexander Duyck

On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> So, in your test, most of the objects may come from one or two slabs, and your
> algorithm is well optimized for this case.  But is this workload the normal case?

It is normal if the objects were bulk allocated because SLUB ensures that
all objects are first allocated from one page before moving to another.

> If most of the objects come from many different slabs, the bulk free API
> enables/disables interrupts very often, so I guess it would work worse than
> just calling __kmem_cache_free_bulk().  Could you test this case?

In case of SLAB this would be an issue since the queueing mechanism
destroys spatial locality. This is much less an issue for SLUB.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16 15:10           ` Christoph Lameter
@ 2015-06-16 15:52               ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-16 15:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Joonsoo Kim, Linux Memory Management List,
	Andrew Morton, Linux-Netdev, Alexander Duyck, brouer

On Tue, 16 Jun 2015 10:10:25 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> On Tue, 16 Jun 2015, Joonsoo Kim wrote:
> 
> > So, in your test, most of the objects may come from one or two slabs, and your
> > algorithm is well optimized for this case.  But is this workload the normal case?
> 
> It is normal if the objects were bulk allocated because SLUB ensures that
> all objects are first allocated from one page before moving to another.

Yes, exactly.  Maybe SLAB is different?  If so, then we can handle that
in the SLAB-specific bulk implementation.


> > If most of the objects come from many different slabs, the bulk free API
> > enables/disables interrupts very often, so I guess it would work worse than
> > just calling __kmem_cache_free_bulk().  Could you test this case?
> 
> In case of SLAB this would be an issue since the queueing mechanism
> destroys spatial locality. This is much less an issue for SLUB.

I think Kim is worried about the cost of the enable/disable calls when
the slowpath gets called.  But it is not a problem, because the cost of
local_irq_{disable,enable} is very low (total cost 7 cycles).

It is very important that everybody realizes that the save+restore
variant is very expensive, this is key:

CPU: i7-4790K CPU @ 4.00GHz
 * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
 * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns

Even if EVERY object needs to call the slowpath/__slab_free() it will be
faster than calling the fallback, because I've demonstrated that the
this_cpu_cmpxchg_double() call costs 9 cycles.
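
The kind of micro-benchmark behind numbers like these might look roughly
as below (hypothetical x86-64 module code; the actual measurements come
from the time_bench framework linked in the postscript):

#include <linux/kernel.h>
#include <linux/irqflags.h>
#include <linux/timex.h>        /* get_cycles() */

/* Toggle interrupts in a tight loop and report TSC cycles per round trip */
static void bench_irq_disable_enable(void)
{
        const unsigned long loops = 10 * 1000 * 1000;
        cycles_t start, stop;
        unsigned long i;

        start = get_cycles();
        for (i = 0; i < loops; i++) {
                local_irq_disable();
                local_irq_enable();
        }
        stop = get_cycles();

        pr_info("local_irq_{disable,enable}: %llu cycles per iteration\n",
                (unsigned long long)((stop - start) / loops));
}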

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

p.s. for comparison[1] a function call cost is 5-6 cycles, and a function
pointer call cost is 6-10 cycles, depending on CPU.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] slub: initial bulk free implementation
  2015-06-16 15:52               ` Jesper Dangaard Brouer
@ 2015-06-16 16:04                 ` Christoph Lameter
  -1 siblings, 0 replies; 53+ messages in thread
From: Christoph Lameter @ 2015-06-16 16:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Joonsoo Kim, Joonsoo Kim, Linux Memory Management List,
	Andrew Morton, Linux-Netdev, Alexander Duyck

On Tue, 16 Jun 2015, Jesper Dangaard Brouer wrote:

> It is very important that everybody realizes that the save+restore
> variant is much more expensive; this is the key point:
>
> CPU: i7-4790K CPU @ 4.00GHz
>  * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
>  * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns
>
> Even if EVERY object needs to take the slowpath/__slab_free(), it will
> still be faster than calling the fallback, because I've demonstrated
> that the call to this_cpu_cmpxchg_double() costs 9 cycles.

But the cmpxchg also stores a value. You need to add the cost of the store
to the cycles.
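
To put this in terms of the numbers already quoted in the thread, a
back-of-envelope comparison (not a measurement; the cost of the plain
store is left as an unknown X) looks like:

  fallback path, per object:  ~9 cycles for this_cpu_cmpxchg_double(),
                              which already includes the freelist update
  bulk path, per object:      7/N cycles amortized for the
                              local_irq_{disable,enable} pair
                              + X cycles for the plain freelist update

The two sides only become comparable once X is added to the bulk path.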

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing
  2015-06-15 15:51   ` Jesper Dangaard Brouer
@ 2015-06-16 21:44     ` Andrew Morton
  -1 siblings, 0 replies; 53+ messages in thread
From: Andrew Morton @ 2015-06-16 21:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck

On Mon, 15 Jun 2015 17:51:56 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> +bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> +								void **p)
> +{
> +	return kmem_cache_alloc_bulk(s, flags, size, p);
> +}

hm, any call to this function is going to be nasty, brutal and short:
it just recurses into itself until the stack blows up.

--- a/mm/slab.c~slab-infrastructure-for-bulk-object-allocation-and-freeing-v3-fix
+++ a/mm/slab.c
@@ -3425,7 +3425,7 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 								void **p)
 {
-	return kmem_cache_alloc_bulk(s, flags, size, p);
+	return __kmem_cache_alloc_bulk(s, flags, size, p);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
 
_

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab
  2015-06-15 15:52   ` Jesper Dangaard Brouer
@ 2015-06-16 21:48     ` Andrew Morton
  -1 siblings, 0 replies; 53+ messages in thread
From: Andrew Morton @ 2015-06-16 21:48 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck

On Mon, 15 Jun 2015 17:52:07 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> From: Christoph Lameter <cl@linux.com>
> 
> [NOTICE: Already in AKPM's quilt-queue]
> 
> First piece: acceleration of retrieval of per cpu objects
> 
> If we are allocating lots of objects then it is advantageous to disable
> interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
> faster.
> 
> Note that we cannot do the fast operation if debugging is enabled, because
> we would have to add extra code to do all the debugging checks.  And it
> would not be fast anyway.
> 
> Note also that the requirement of having interrupts disabled
> avoids having to do processor flag operations.
> 
> Allocate as many objects as possible in the fast way and then fall back to
> the generic implementation for the rest of the objects.
> 
> ...
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
>  bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  								void **p)
>  {
> -	return kmem_cache_alloc_bulk(s, flags, size, p);
> +	if (!kmem_cache_debug(s)) {
> +		struct kmem_cache_cpu *c;
> +
> +		/* Drain objects in the per cpu slab */
> +		local_irq_disable();
> +		c = this_cpu_ptr(s->cpu_slab);
> +
> +		while (size) {
> +			void *object = c->freelist;
> +
> +			if (!object)
> +				break;
> +
> +			c->freelist = get_freepointer(s, object);
> +			*p++ = object;
> +			size--;
> +
> +			if (unlikely(flags & __GFP_ZERO))
> +				memset(object, 0, s->object_size);
> +		}
> +		c->tid = next_tid(c->tid);
> +
> +		local_irq_enable();

It might be worth adding

		if (!size)
			return true;

here, to avoid the pointless call to __kmem_cache_alloc_bulk().

It depends on the typical success rate of this allocation loop.  Do you
know what this is?

> +	}
> +
> +	return __kmem_cache_alloc_bulk(s, flags, size, p);
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_bulk);
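
For illustration, the tail of kmem_cache_alloc_bulk() with the
suggested early exit would look roughly like this (a sketch derived
from the quoted patch, not committed code):

		c->tid = next_tid(c->tid);
		local_irq_enable();

		/*
		 * Everything was served from the per cpu freelist,
		 * so skip the fallback entirely.
		 */
		if (!size)
			return true;
	}

	return __kmem_cache_alloc_bulk(s, flags, size, p);
}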

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/7] slub: fix error path bug in kmem_cache_alloc_bulk
  2015-06-15 15:52   ` Jesper Dangaard Brouer
  (?)
@ 2015-06-16 21:51   ` Andrew Morton
  2015-06-17  6:25     ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2015-06-16 21:51 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck

On Mon, 15 Jun 2015 17:52:26 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> The current kmem_cache/SLAB bulking API needs to release all objects
> in case the layer cannot satisfy the full request.
> 
> If __kmem_cache_alloc_bulk() fails, all allocated objects in array
> should be freed, but, __kmem_cache_alloc_bulk() can't know
> about objects allocated by this slub specific kmem_cache_alloc_bulk()
> function.

Can we fold patches 2, 3 and 4 into a single patch?

And maybe patch 5 as well.  I don't think we need all these
development-time increments in the permanent record.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7] slub: improve bulk alloc strategy
  2015-06-15 15:52   ` Jesper Dangaard Brouer
  (?)
  (?)
@ 2015-06-16 21:53   ` Andrew Morton
  2015-06-17  6:29       ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2015-06-16 21:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck

On Mon, 15 Jun 2015 17:52:46 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Call slowpath __slab_alloc() from within the bulk loop, as the
> side-effect of this call likely repopulates c->freelist.
> 
> Choose to reenable local IRQs while calling slowpath.
> 
> Saving some optimizations for later.  E.g. it is possible to
> extract parts of __slab_alloc() and avoid the unnecessary and
> expensive (37 cycles) local_irq_{save,restore}.  For now, be
> happy calling __slab_alloc(), given the lower icache impact of this
> func, and so I don't have to worry about correctness.
> 
> ...
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2776,8 +2776,23 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  	for (i = 0; i < size; i++) {
>  		void *object = c->freelist;
>  
> -		if (!object)
> -			break;
> +		if (unlikely(!object)) {
> +			c->tid = next_tid(c->tid);
> +			local_irq_enable();
> +
> +			/* Invoke slow path one time, then retry fastpath
> +			 * as side-effect have updated c->freelist
> +			 */

That isn't very grammatical.

Block comments are formatted

	/*
	 * like this
	 */

please.


> +			p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
> +					    _RET_IP_, c);
> +			if (unlikely(!p[i])) {
> +				__kmem_cache_free_bulk(s, i, p);
> +				return false;
> +			}
> +			local_irq_disable();
> +			c = this_cpu_ptr(s->cpu_slab);
> +			continue; /* goto for-loop */
> +		}
>  

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab
  2015-06-16 21:48     ` Andrew Morton
  (?)
@ 2015-06-17  6:24     ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-17  6:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck, brouer

On Tue, 16 Jun 2015 14:48:40 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 15 Jun 2015 17:52:07 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > From: Christoph Lameter <cl@linux.com>
> > 
> > [NOTICE: Already in AKPM's quilt-queue]
> > 
> > First piece: acceleration of retrieval of per cpu objects
> > 
> > If we are allocating lots of objects then it is advantageous to disable
> > interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
> > faster.
> > 
> > Note that we cannot do the fast operation if debugging is enabled, because
> > we would have to add extra code to do all the debugging checks.  And it
> > would not be fast anyway.
> > 
> > Note also that the requirement of having interrupts disabled
> > avoids having to do processor flag operations.
> > 
> > Allocate as many objects as possible in the fast way and then fall back to
> > the generic implementation for the rest of the objects.
> > 
> > ...
> >
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
> >  bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >  								void **p)
> >  {
> > -	return kmem_cache_alloc_bulk(s, flags, size, p);
> > +	if (!kmem_cache_debug(s)) {
> > +		struct kmem_cache_cpu *c;
> > +
> > +		/* Drain objects in the per cpu slab */
> > +		local_irq_disable();
> > +		c = this_cpu_ptr(s->cpu_slab);
> > +
> > +		while (size) {
> > +			void *object = c->freelist;
> > +
> > +			if (!object)
> > +				break;
> > +
> > +			c->freelist = get_freepointer(s, object);
> > +			*p++ = object;
> > +			size--;
> > +
> > +			if (unlikely(flags & __GFP_ZERO))
> > +				memset(object, 0, s->object_size);
> > +		}
> > +		c->tid = next_tid(c->tid);
> > +
> > +		local_irq_enable();
> 
> It might be worth adding
> 
> 		if (!size)
> 			return true;
> 
> here.  To avoid the pointless call to __kmem_cache_alloc_bulk().

The pointless call did show a measurable performance hit (2 ns), and
I've removed it in the later patches, which also fix the error/exit path.
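
As a side note, a hypothetical caller of the bulk API (the cache name
and batch size below are made-up examples, and I'm assuming
kmem_cache_free_bulk() mirrors the __kmem_cache_free_bulk(s, size, p)
signature used in the patches) would look roughly like:

	void *objs[16];

	if (kmem_cache_alloc_bulk(my_cache, GFP_ATOMIC, ARRAY_SIZE(objs), objs)) {
		/* ... use all 16 objects ... */
		kmem_cache_free_bulk(my_cache, ARRAY_SIZE(objs), objs);
	}

On failure the API releases any objects it managed to allocate, so the
caller only has to check the bool return value.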

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/7] slub: fix error path bug in kmem_cache_alloc_bulk
  2015-06-16 21:51   ` Andrew Morton
@ 2015-06-17  6:25     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-17  6:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck, brouer

On Tue, 16 Jun 2015 14:51:09 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 15 Jun 2015 17:52:26 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > The current kmem_cache/SLAB bulking API need to release all objects
> > in case the layer cannot satisfy the full request.
> > 
> > If __kmem_cache_alloc_bulk() fails, all allocated objects in array
> > should be freed, but, __kmem_cache_alloc_bulk() can't know
> > about objects allocated by this slub specific kmem_cache_alloc_bulk()
> > function.
> 
> Can we fold patches 2, 3 and 4 into a single patch?
> 
> And maybe patch 5 as well.  I don't think we need all these
> development-time increments in the permanent record.

I agree.  I'll fold them when submitting V2.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7] slub: improve bulk alloc strategy
  2015-06-16 21:53   ` Andrew Morton
@ 2015-06-17  6:29       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2015-06-17  6:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Christoph Lameter, netdev, Alexander Duyck, brouer

On Tue, 16 Jun 2015 14:53:36 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 15 Jun 2015 17:52:46 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
[...]
> > +			/* Invoke slow path one time, then retry fastpath
> > +			 * as side-effect have updated c->freelist
> > +			 */
> 
> That isn't very grammatical.
> 
> Block comments are formatted
> 
> 	/*
> 	 * like this
> 	 */
> 
> please.

Sure, old habit; this comment style is just the one we use in the net/ tree.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2015-06-17  6:29 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-15 15:51 [PATCH 0/7] slub: bulk alloc and free for slub allocator Jesper Dangaard Brouer
2015-06-15 15:51 ` Jesper Dangaard Brouer
2015-06-15 15:51 ` [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing Jesper Dangaard Brouer
2015-06-15 15:51   ` Jesper Dangaard Brouer
2015-06-15 16:45   ` Alexander Duyck
2015-06-15 16:45     ` Alexander Duyck
2015-06-15 16:50     ` Christoph Lameter
2015-06-16 21:44   ` Andrew Morton
2015-06-16 21:44     ` Andrew Morton
2015-06-15 15:52 ` [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab Jesper Dangaard Brouer
2015-06-15 15:52   ` Jesper Dangaard Brouer
2015-06-16  7:21   ` Joonsoo Kim
2015-06-16  7:21     ` Joonsoo Kim
2015-06-16 15:05     ` Christoph Lameter
2015-06-16 21:48   ` Andrew Morton
2015-06-16 21:48     ` Andrew Morton
2015-06-17  6:24     ` Jesper Dangaard Brouer
2015-06-15 15:52 ` [PATCH 3/7] slub: reduce indention level in kmem_cache_alloc_bulk() Jesper Dangaard Brouer
2015-06-15 15:52   ` Jesper Dangaard Brouer
2015-06-15 15:52 ` [PATCH 4/7] slub: fix error path bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
2015-06-15 15:52   ` Jesper Dangaard Brouer
2015-06-16 21:51   ` Andrew Morton
2015-06-17  6:25     ` Jesper Dangaard Brouer
2015-06-15 15:52 ` [PATCH 5/7] slub: kmem_cache_alloc_bulk() move clearing outside IRQ disabled section Jesper Dangaard Brouer
2015-06-15 15:52   ` Jesper Dangaard Brouer
2015-06-15 15:52 ` [PATCH 6/7] slub: improve bulk alloc strategy Jesper Dangaard Brouer
2015-06-15 15:52   ` Jesper Dangaard Brouer
2015-06-15 16:36   ` Christoph Lameter
2015-06-16 21:53   ` Andrew Morton
2015-06-17  6:29     ` Jesper Dangaard Brouer
2015-06-17  6:29       ` Jesper Dangaard Brouer
2015-06-15 15:52 ` [PATCH 7/7] slub: initial bulk free implementation Jesper Dangaard Brouer
2015-06-15 15:52   ` Jesper Dangaard Brouer
2015-06-15 16:34   ` Christoph Lameter
2015-06-16  8:04     ` Jesper Dangaard Brouer
2015-06-15 17:04   ` Alexander Duyck
2015-06-16  7:23   ` Joonsoo Kim
2015-06-16  7:23     ` Joonsoo Kim
2015-06-16  9:20     ` Jesper Dangaard Brouer
2015-06-16 12:00       ` Joonsoo Kim
2015-06-16 12:00         ` Joonsoo Kim
2015-06-16 13:58         ` Jesper Dangaard Brouer
2015-06-16 15:06         ` Christoph Lameter
2015-06-16  7:28   ` Joonsoo Kim
2015-06-16  8:21     ` Jesper Dangaard Brouer
2015-06-16  8:21       ` Jesper Dangaard Brouer
2015-06-16  8:57       ` Jesper Dangaard Brouer
2015-06-16 12:05         ` Joonsoo Kim
2015-06-16 15:10           ` Christoph Lameter
2015-06-16 15:52             ` Jesper Dangaard Brouer
2015-06-16 15:52               ` Jesper Dangaard Brouer
2015-06-16 16:04               ` Christoph Lameter
2015-06-16 16:04                 ` Christoph Lameter
