* [PATCH V2] kmalloc_index optimization(code size & runtime stable)
@ 2020-04-21  3:25 Bernard Zhao
  2020-04-21 11:18 ` Matthew Wilcox
  2020-04-21 13:13 ` Vlastimil Babka
  0 siblings, 2 replies; 8+ messages in thread
From: Bernard Zhao @ 2020-04-21  3:25 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel
  Cc: opensource.kernel, Bernard Zhao

Optimize the inline function kmalloc_index for code size and for runtime
stability. After this change, the execution time of kmalloc_index no
longer depends on the requested size.
The following test data shows that the new implementation is faster than
the original one for allocations of 512 bytes or more, and that its
runtime is more stable than before.
Test platform: Ubuntu 16.04 in a VMware VM, 2 GB RAM, 1 CPU, i5-8500 @ 3.00GHz
Compiler: gcc 5.4.0 with -O2
Only the changed code was benchmarked.
Detailed test data:
Time per 100 million calls:
            size    old function    new function
               8          203777          241934
              16          245611          409278
              32          236384          408419
              64          275499          447732
             128          354909          416439
             256          360472          406598
             512          431072          409168
            1024          463822          407401
        2 * 1024          548519          407710
        4 * 1024          623378          422326
        8 * 1024          655932          407457
       16 * 1024          744673          417574
       32 * 1024          824889          415316
       64 * 1024          854374          408577
      128 * 1024          968079          433582
      256 * 1024          985527          412080
      512 * 1024         1196877          448199
     1024 * 1024         1310315          448969
 2 * 1024 * 1024         1367441          513117
 4 * 1024 * 1024         1264623          415019
 8 * 1024 * 1024         1255727          417197
16 * 1024 * 1024         1401431          411087
32 * 1024 * 1024         1440415          416616
64 * 1024 * 1024         1428122          417459

Signed-off-by: Bernard Zhao <bernard@vivo.com>

Changes since V1:
* I am not sure why kbuild was triggered.
* Fix kbuild compiler error.

Link for V1:
*https://lore.kernel.org/patchwork/patch/1226159/
---
 include/linux/slab.h | 62 +++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6d454886bcaf..b09785a79465 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -301,6 +301,23 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
 #define SLAB_OBJ_MIN_SIZE      (KMALLOC_MIN_SIZE < 16 ? \
                                (KMALLOC_MIN_SIZE) : 16)
 
+#ifndef CONFIG_SLOB
+/*
+ * This shows the relation between the most-significant set bit of size
+ * and the index into kmalloc_info[].
+ * If size is a power of 2, index = fls(size) - 1; else fls(size) (round up).
+ * size  8(b 1000)-(b 1xxx)-16(b 10000)-(b 1xxxx)-32(b 100000)-(b 1xxxxx)
+ *       |            |          |           |            |           |
+ * index 3            4          4           5            5           6
+ *       64(b 1000000)-(b 1xxxxxx)-128(b 10000000)-(b 1xxxxxxx)-256....
+ *          |           |              |            |            |
+ *          6           7              7            8            8...
+ */
+#define KMALLOC_SIZE_POW_2_SHIFT_BIT (2)
+#define KMALLOC_SIZE_POW_2_INDEX_BIT (1)
+#endif
+
+
 /*
  * Whenever changing this, take care of that kmalloc_type() and
  * create_kmalloc_caches() still work as intended.
@@ -348,6 +365,7 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
  */
 static __always_inline unsigned int kmalloc_index(size_t size)
 {
+	unsigned char high_bit = 0;
 	if (!size)
 		return 0;
 
@@ -358,30 +376,26 @@ static __always_inline unsigned int kmalloc_index(size_t size)
 		return 1;
 	if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
 		return 2;
-	if (size <=          8) return 3;
-	if (size <=         16) return 4;
-	if (size <=         32) return 5;
-	if (size <=         64) return 6;
-	if (size <=        128) return 7;
-	if (size <=        256) return 8;
-	if (size <=        512) return 9;
-	if (size <=       1024) return 10;
-	if (size <=   2 * 1024) return 11;
-	if (size <=   4 * 1024) return 12;
-	if (size <=   8 * 1024) return 13;
-	if (size <=  16 * 1024) return 14;
-	if (size <=  32 * 1024) return 15;
-	if (size <=  64 * 1024) return 16;
-	if (size <= 128 * 1024) return 17;
-	if (size <= 256 * 1024) return 18;
-	if (size <= 512 * 1024) return 19;
-	if (size <= 1024 * 1024) return 20;
-	if (size <=  2 * 1024 * 1024) return 21;
-	if (size <=  4 * 1024 * 1024) return 22;
-	if (size <=  8 * 1024 * 1024) return 23;
-	if (size <=  16 * 1024 * 1024) return 24;
-	if (size <=  32 * 1024 * 1024) return 25;
-	if (size <=  64 * 1024 * 1024) return 26;
+	if (size <= 8)
+		return 3;
+
+	/* size over KMALLOC_MAX_SIZE should trigger BUG */
+	if (size <= KMALLOC_MAX_SIZE) {
+		/*
+		 * kmalloc_info[index]
+		 * size  8----16----32----64----128---256---512---1024---2048.
+		 *       |  |  |  |  |  |  |  |  |  |  |  |  |  |   |  |   |
+		 * index 3  4  4  5  5  6  6  7  7  8  8  9  9  10  10 11  11
+		 */
+
+		high_bit = fls((int)size);
+
+		if (size == (2 << (high_bit - KMALLOC_SIZE_POW_2_SHIFT_BIT)))
+			return (high_bit - KMALLOC_SIZE_POW_2_INDEX_BIT);
+
+		return high_bit;
+	}
+
 	BUG();
 
 	/* Will never be reached. Needed because the compiler may complain */
-- 
2.26.2



* Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
  2020-04-21  3:25 [PATCH V2] kmalloc_index optimization(code size & runtime stable) Bernard Zhao
@ 2020-04-21 11:18 ` Matthew Wilcox
  2020-04-21 11:55     ` 赵军奎
  2020-04-21 13:13 ` Vlastimil Babka
  1 sibling, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2020-04-21 11:18 UTC (permalink / raw)
  To: 1587089010-110083-1-git-send-email-bernard
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel, opensource.kernel,
	Bernard Zhao

On Mon, Apr 20, 2020 at 08:25:01PM -0700, Bernard Zhao wrote:
> kmalloc_index inline function code size optimization and runtime
> performance stability optimization. After optimization, the function
> kmalloc_index is more stable, the size will never affecte the function`s
> execution efficiency.

Please stop posting this patch until it's faster *for small sizes*.
As I explained last time you posted it, it's not an optimisation.

>             size        time/Per 100 million times
>                         old fun		new fun with optimise
> 		8	203777		241934
> 		16	245611		409278
> 		32	236384		408419
> 		64	275499		447732
> 		128	354909		416439

^^^^ these are the important cases that need to be fast.



* Re:Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
  2020-04-21 11:18 ` Matthew Wilcox
@ 2020-04-21 11:55     ` 赵军奎
  0 siblings, 0 replies; 8+ messages in thread
From: 赵军奎 @ 2020-04-21 11:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel, opensource.kernel



From: Matthew Wilcox <willy@infradead.org>
Date: 2020-04-21 19:18:49
To:  1587089010-110083-1-git-send-email-bernard@vivo.com
Cc:  Christoph Lameter <cl@linux.com>,Pekka Enberg <penberg@kernel.org>,David Rientjes <rientjes@google.com>,Joonsoo Kim <iamjoonsoo.kim@lge.com>,Andrew Morton <akpm@linux-foundation.org>,linux-mm@kvack.org,linux-kernel@vger.kernel.org,opensource.kernel@vivo.com,Bernard Zhao <bernard@vivo.com>
Subject: Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
>On Mon, Apr 20, 2020 at 08:25:01PM -0700, Bernard Zhao wrote:
>> kmalloc_index inline function code size optimization and runtime
>> performance stability optimization. After optimization, the function
>> kmalloc_index is more stable, the size will never affecte the function`s
>> execution efficiency.
>
>Please stop posting this patch until it's faster *for small sizes*.
>As I explained last time you posted it, it's not an optimisation.
>
>>             size        time/Per 100 million times
>>                         old fun		new fun with optimise
>> 		8	203777		241934
>> 		16	245611		409278
>> 		32	236384		408419
>> 		64	275499		447732
>> 		128	354909		416439
>
>^^^^ these are the important cases that need to be fast.
>

Sure. I just received some kbuild compiler-error emails prompting me to do something.
I don't know why this happened, so I updated the patch again.

Regards,
Bernard




* Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
  2020-04-21  3:25 [PATCH V2] kmalloc_index optimization(code size & runtime stable) Bernard Zhao
  2020-04-21 11:18 ` Matthew Wilcox
@ 2020-04-21 13:13 ` Vlastimil Babka
  1 sibling, 0 replies; 8+ messages in thread
From: Vlastimil Babka @ 2020-04-21 13:13 UTC (permalink / raw)
  To: 1587089010-110083-1-git-send-email-bernard, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	linux-mm, linux-kernel
  Cc: opensource.kernel, Bernard Zhao

On 4/21/20 5:25 AM, Bernard Zhao wrote:
> kmalloc_index inline function code size optimization and runtime
> performance stability optimization. After optimization, the function
> kmalloc_index is more stable, the size will never affecte the function`s
> execution efficiency.
> And follow test data shows that the performance of new optimization
> exceeds the original algorithm when applying for more than 512 Bytes
> (include 512B).And new optimization runtime is more stable than before.
> Test platform:install vmware ubuntu 16.04, ram 2G, cpu 1, i5-8500 3.00GHz
> Compiler: gcc -O2 optimization, gcc version 5.4.0.
> Just test diff code part.
> Follow is detailed test data:
>             size        time/Per 100 million times
>                         old fun		new fun with optimise
> 		8	203777		241934
> 		16	245611		409278
> 		32	236384		408419
> 		64	275499		447732
> 		128	354909		416439
> 		256	360472		406598
> 		512	431072		409168
> 		1024	463822		407401
>         2 * 1024	548519		407710
>         4 * 1024	623378		422326
>         8 * 1024	655932		407457
>        16 * 1024	744673		417574
>        32 * 1024	824889		415316
>        64 * 1024	854374		408577
>       128 * 1024	968079		433582
>       256 * 1024	985527		412080
>       512 * 1024	1196877		448199
>      1024 * 1024	1310315		448969
> 2  * 1024 * 1024	1367441		513117
> 4  * 1024 * 1024	1264623		415019
> 8  * 1024 * 1024	1255727		417197
> 16 * 1024 * 1024	1401431		411087
> 32 * 1024 * 1024	1440415		416616
> 64 * 1024 * 1024	1428122		417459

No, the kernel will never see these time improvements (or non-improvements for
small sizes). See how kmalloc() and kmalloc_node() both call kmalloc_index()
only under "if (__builtin_constant_p(size))"
which means kmalloc is called with a (compile-time) constant size, so this code
is only evaluated at compile time, not while kernel is running. Otherwise it
really wouldn't be implemented as a stream of if's :)
The cases that are not compile time constant size end up in kmalloc_slab(), so
you can see how that one is implemented and what its performance is.


* Re: Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
  2020-04-21 11:55     ` 赵军奎
  (?)
@ 2020-04-21 14:36     ` Matthew Wilcox
  2020-04-22  1:12         ` 赵军奎
  -1 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2020-04-21 14:36 UTC (permalink / raw)
  To: 赵军奎
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel, opensource.kernel

On Tue, Apr 21, 2020 at 07:55:03PM +0800, 赵军奎 wrote:
> Sure, i just received some kbuild compiler error mails and prompt me to do something? 
> I don`t know why this happened, so i update the patch again.

Don't.  The patch has been NACKed, so there's no need to post a v2.

If you want to do something useful, how about looking at the effect
of adding different slab sizes?  There's a fairly common pattern of
allocating things which are a power of two + a header.  So it may make
sense to have kmalloc caches of 320 (256 + 64), 576 (512 + 64) and 1088
(1024 + 64).  I use 64 here as that's the size of a cacheline, so we
won't get false sharing between users.

This could save a fair quantity of memory; today if you allocate 512 +
8 bytes, it will round up to 1024.  So we'll get 4 allocations per 4kB
page, but with a 576-byte slab, we'd get 7 allocations per 4kB page.
Of course, if there aren't a lot of users which allocate memory in this
range, then it'll be a waste of memory.  On my laptop, it seems like
there might be a decent amount of allocations in the right range:

kmalloc-2k          3881   4384   2048   16    8 : tunables    0    0    0 : slabdata    274    274      0
kmalloc-1k          6488   7056   1024   16    4 : tunables    0    0    0 : slabdata    441    441      0
kmalloc-512         7700   8256    512   16    2 : tunables    0    0    0 : slabdata    516    516      0

Now, maybe 576 isn't quite the right size.  Need to try it on a variety
of configurations and find out.  Want to investigate this?


* Re:Re: Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
  2020-04-21 14:36     ` Matthew Wilcox
@ 2020-04-22  1:12         ` 赵军奎
  0 siblings, 0 replies; 8+ messages in thread
From: 赵军奎 @ 2020-04-22  1:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel, opensource.kernel



From: Matthew Wilcox <willy@infradead.org>
Date: 2020-04-21 22:36:09
To: "赵军奎" <bernard@vivo.com>
Cc: Christoph Lameter <cl@linux.com>,Pekka Enberg <penberg@kernel.org>,David Rientjes <rientjes@google.com>,Joonsoo Kim <iamjoonsoo.kim@lge.com>,Andrew Morton <akpm@linux-foundation.org>,linux-mm@kvack.org,linux-kernel@vger.kernel.org,opensource.kernel@vivo.com
Subject: Re: Re: [PATCH V2] kmalloc_index optimization(code size & runtime stable)
>On Tue, Apr 21, 2020 at 07:55:03PM +0800, 赵军奎 wrote:
>> Sure, i just received some kbuild compiler error mails and prompt me to do something? 
>> I don`t know why this happened, so i update the patch again.
>
>Don't.  The patch has been NACKed, so there's no need to post a v2.
>
>If you want to do something useful, how about looking at the effect
>of adding different slab sizes?  There's a fairly common pattern of
>allocating things which are a power of two + a header.  So it may make
>sense to have kmalloc caches of 320 (256 + 64), 576 (512 + 64) and 1088
>(1024 + 64).  I use 64 here as that's the size of a cacheline, so we
>won't get false sharing between users.
>
>This could save a fair quantity of memory; today if you allocate 512 +
>8 bytes, it will round up to 1024.  So we'll get 4 allocations per 4kB
>page, but with a 576-byte slab, we'd get 7 allocations per 4kB page.
>Of course, if there aren't a lot of users which allocate memory in this
>range, then it'll be a waste of memory.  On my laptop, it seems like
>there might be a decent amount of allocations in the right range:
>
>kmalloc-2k          3881   4384   2048   16    8 : tunables    0    0    0 : slabdata    274    274      0
>kmalloc-1k          6488   7056   1024   16    4 : tunables    0    0    0 : slabdata    441    441      0
>kmalloc-512         7700   8256    512   16    2 : tunables    0    0    0 : slabdata    516    516      0
>
>Now, maybe 576 isn't quite the right size.  Need to try it on a variety
>of configurations and find out.  Want to investigate this?

This looks like a great idea!
Maybe I can do some research on our mobile phone products
and look at how the requested kmalloc sizes are distributed.
That could serve as a reference for a more flexible configuration method.
Thank you very much for sharing.

Regards,
Bernard





end of thread

Thread overview: 8+ messages
2020-04-21  3:25 [PATCH V2] kmalloc_index optimization(code size & runtime stable) Bernard Zhao
2020-04-21 11:18 ` Matthew Wilcox
2020-04-21 11:55   ` 赵军奎
2020-04-21 14:36     ` Matthew Wilcox
2020-04-22  1:12       ` 赵军奎
2020-04-21 13:13 ` Vlastimil Babka
