From: Nick Piggin <npiggin@suse.de>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [rfc][patch] SLQB slab allocator
Date: Fri, 12 Dec 2008 06:50:51 +0100 [thread overview]
Message-ID: <20081212055051.GE15804@wotan.suse.de> (raw)
In-Reply-To: <4941F8D2.4060807@cosmosbay.com>
On Fri, Dec 12, 2008 at 06:38:26AM +0100, Eric Dumazet wrote:
> Nick Piggin a écrit :
> > I'm going to continue working on this as I get time, and I plan to soon ask
> > to have it merged. It would be great if people could comment or test it.
> >
>
> It seems really good, but will need some hours to review :)
>
> Minor nit : You spelled Qeued instead of Queued in init/Kconfig
>
> +config SLQB
> + bool "SLQB (Qeued allocator)"
OK, thanks.
> One of the problem I see with SLAB & SLUB is the irq masking stuff.
> Some (many ???) kmem_cache are only used in process context, I see no point of
> disabling irqs for them.
That's a very good point actually, and something I want to look at...
I'm thinking it will make most sense to provide a
kmem_cache_alloc/free_irqsafe for callers who either don't do any
interrupt context allocations, or already have irqs off (another
slab flag will just add another branch in the fastpaths).
And then also a kmalloc/kfree_irqsoff for code which already has
irqs off.
That's something which benefit all slab allocators roughly equally,
so at the moment I'm concentrating on the core code. But it's a very
good idea.
> I tested your patch on my 8 ways HP BL460c G1, on top
> on my last patch serie. (linux-2.6, not net-next-2.6)
>
> # time ./socketallocbench
>
> real 0m1.300s
> user 0m0.078s
> sys 0m1.207s
> # time ./socketallocbench -n 8
>
> real 0m1.686s
> user 0m0.614s
> sys 0m12.737s
>
> So no bad effect (same than SLUB).
Cool, thanks.
> For the record, SLAB is really really bad for this workload
>
> PU: Core 2, speed 3000.1 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100
> 000
> samples cum. samples % cum. % symbol name
> 136537 136537 10.8300 10.8300 kmem_cache_alloc
> 129380 265917 10.2623 21.0924 tcp_close
> 79696 345613 6.3214 27.4138 tcp_v4_init_sock
> 73873 419486 5.8596 33.2733 tcp_v4_destroy_sock
> 63436 482922 5.0317 38.3050 sysenter_past_esp
> 62140 545062 4.9289 43.2339 inet_csk_destroy_sock
> 56565 601627 4.4867 47.7206 kmem_cache_free
> 40430 642057 3.2069 50.9275 __percpu_counter_add
> 35742 677799 2.8350 53.7626 init_timer
> 35611 713410 2.8246 56.5872 copy_from_user
> 21616 735026 1.7146 58.3018 d_alloc
> 20821 755847 1.6515 59.9533 alloc_inode
> 19645 775492 1.5582 61.5115 alloc_fd
> 18935 794427 1.5019 63.0134 __fput
> 18922 813349 1.5009 64.5143 inet_create
> 18919 832268 1.5006 66.0149 sys_close
> 16074 848342 1.2750 67.2899 release_sock
> 15337 863679 1.2165 68.5064 lock_sock_nested
> 15172 878851 1.2034 69.7099 sock_init_data
> 14196 893047 1.1260 70.8359 fd_install
> 13677 906724 1.0849 71.9207 drop_file_write_access
> 13195 919919 1.0466 72.9673 dput
> 12768 932687 1.0127 73.9801 inotify_d_instantiate
> 11404 944091 0.9046 74.8846 init_waitqueue_head
> 11228 955319 0.8906 75.7752 sysenter_do_call
> 11213 966532 0.8894 76.6647 local_bh_enable_ip
> 10948 977480 0.8684 77.5330 __sock_create
> 10912 988392 0.8655 78.3986 local_bh_enable
> 10665 999057 0.8459 79.2445 __new_inode
> 10579 1009636 0.8391 80.0836 inet_release
> 9665 1019301 0.7666 80.8503 iput_single
> 9545 1028846 0.7571 81.6074 fput
> 7950 1036796 0.6306 82.2379 sock_release
> 7236 1044032 0.5740 82.8119 local_bh_disable
>
>
> We can see most of the time is taken by the memset() to clear object,
> then irq masking stuff...
Yep, it's difficult to make the local alloc/free fastpath much more
optimal as-is.
Is SLAB still bad at the test with the slab-rcu patch in place?
SLAB has a pretty optimal fastpath as well, although if its queues
start overflowing, it can run into contention quite easily.
>
> c0281e10 <kmem_cache_alloc>: /* kmem_cache_alloc total: 140659 10.8277 */
I guess you're compiling with -Os? I find gcc can pack the fastpath
much better with -O2, and actually decrease the effective icache
footprint size even if the total text size increases...
> 2414 0.1858 :c0281e10: push %ebp
> 7 5.4e-04 :c0281e11: mov %esp,%ebp
> :c0281e13: push %edi
> 1454 0.1119 :c0281e14: push %esi
> 310 0.0239 :c0281e15: mov %eax,%esi
> :c0281e17: push %ebx
> 368 0.0283 :c0281e18: sub $0x10,%esp
> 949 0.0731 :c0281e1b: mov %edx,-0x18(%ebp)
> 383 0.0295 :c0281e1e: mov 0x4(%ebp),%eax
> 1189 0.0915 :c0281e21: mov %eax,-0x14(%ebp)
> 1240 0.0955 :c0281e24: jmp c0281e6e <kmem_cache_alloc+0x5e>
> :c0281e26: lea 0x0(%esi),%esi
> :c0281e29: lea 0x0(%edi,%eiz,1),%edi
> 1188 0.0915 :c0281e30: mov 0x10(%esi),%eax
> :c0281e33: mov (%edx,%eax,1),%eax
> 1483 0.1142 :c0281e36: decl (%ebx)
> 898 0.0691 :c0281e38: mov %eax,0x4(%ebx)
> 586 0.0451 :c0281e3b: mov %edx,-0x1c(%ebp)
> 1 7.7e-05 :c0281e3e: pushl -0x10(%ebp)
> 1226 0.0944 :c0281e41: popf
> 26385 2.0311 :c0281e42: testl $0x210d00,(%esi)
> 1188 0.0915 :c0281e48: je c0281ef8 <kmem_cache_alloc+0xe8>
> :c0281e4e: mov -0x1c(%ebp),%eax
> :c0281e51: test %eax,%eax
> :c0281e53: je c0281ef8 <kmem_cache_alloc+0xe8>
> :c0281e59: mov -0x14(%ebp),%ecx
> :c0281e5c: mov -0x1c(%ebp),%edx
> :c0281e5f: mov %esi,%eax
> :c0281e61: call c0280d60 <alloc_debug_processing>
> :c0281e66: test %eax,%eax
> :c0281e68: jne c0281ef8 <kmem_cache_alloc+0xe8>
> 1205 0.0928 :c0281e6e: pushf
> 4888 0.3763 :c0281e6f: popl -0x10(%ebp)
> 319 0.0246 :c0281e72: cli
> 5975 0.4599 :c0281e73: nop
> :c0281e74: lea 0x0(%esi,%eiz,1),%esi
> :c0281e78: mov %fs:0xc068d004,%eax
> 1166 0.0898 :c0281e7e: mov 0x38(%esi,%eax,4),%ebx
> 26 0.0020 :c0281e82: mov 0x4(%ebx),%edx
> 662 0.0510 :c0281e85: test %edx,%edx
> :c0281e87: jne c0281e30 <kmem_cache_alloc+0x20>
> :c0281e89: mov 0xc(%ebx),%eax
> :c0281e8c: test %eax,%eax
> :c0281e8e: jne c0281ec8 <kmem_cache_alloc+0xb8>
> :c0281e90: mov %ebx,%edx
> :c0281e92: mov %esi,%eax
> :c0281e94: call c0280010 <__cache_list_get_page>
> :c0281e99: mov %eax,%edx
> :c0281e9b: test %eax,%eax
> :c0281e9d: jne c0281f31 <kmem_cache_alloc+0x121>
> :c0281ea3: mov $0xffffffff,%ecx
> 1 7.7e-05 :c0281ea8: mov -0x18(%ebp),%edx
> :c0281eab: mov %esi,%eax
> :c0281ead: call c02815a0 <__slab_alloc_page>
> :c0281eb2: test %eax,%eax
> :c0281eb4: jne c0281e78 <kmem_cache_alloc+0x68>
> :c0281eb6: movl $0x0,-0x1c(%ebp)
> :c0281ebd: jmp c0281e3e <kmem_cache_alloc+0x2e>
> :c0281ec2: lea 0x0(%esi),%esi
> :c0281ec8: mov %esi,%eax
> :c0281eca: mov %ebx,%edx
> :c0281ecc: call c0280240 <claim_remote_free_list>
> :c0281ed1: mov 0x4(%esi),%eax
> :c0281ed4: shl $0x2,%eax
> :c0281ed7: cmp %eax,(%ebx)
> :c0281ed9: ja c0281f48 <kmem_cache_alloc+0x138>
> :c0281edb: mov 0x4(%ebx),%edx
> :c0281ede: mov %edx,-0x1c(%ebp)
> :c0281ee1: test %edx,%edx
> :c0281ee3: je c0281e90 <kmem_cache_alloc+0x80>
> :c0281ee5: mov 0x10(%esi),%eax
> :c0281ee8: mov (%edx,%eax,1),%eax
> :c0281eeb: decl (%ebx)
> :c0281eed: mov %eax,0x4(%ebx)
> :c0281ef0: jmp c0281e3e <kmem_cache_alloc+0x2e>
>
> :c0281ef5: lea 0x0(%esi),%esi
> 1261 0.0971 :c0281ef8: cmpw $0x0,-0x18(%ebp)
> 957 0.0737 :c0281efd: jns c0281f26 <kmem_cache_alloc+0x116>
> 627 0.0483 :c0281eff: mov -0x1c(%ebp),%eax
> :c0281f02: test %eax,%eax
> :c0281f04: je c0281f26 <kmem_cache_alloc+0x116>
> 82 0.0063 :c0281f06: mov 0xc(%esi),%esi
> 2 1.5e-04 :c0281f09: mov -0x1c(%ebp),%ebx
> 527 0.0406 :c0281f0c: mov %esi,%ecx
> :c0281f0e: mov %ebx,%edi
> 86 0.0066 :c0281f10: shr $0x2,%ecx
> 602 0.0463 :c0281f13: xor %eax,%eax
> 1 7.7e-05 :c0281f15: mov %esi,%edx
> 74845 5.7614 :c0281f17: rep stos %eax,%es:(%edi)
> 1170 0.0901 :c0281f19: test $0x2,%dl
> 2 1.5e-04 :c0281f1c: je c0281f20 <kmem_cache_alloc+0x110>
> :c0281f1e: stos %ax,%es:(%edi)
> 600 0.0462 :c0281f20: test $0x1,%dl
> :c0281f23: je c0281f26 <kmem_cache_alloc+0x116>
> :c0281f25: stos %al,%es:(%edi)
> 1171 0.0901 :c0281f26: mov -0x1c(%ebp),%eax
> 199 0.0153 :c0281f29: add $0x10,%esp
> :c0281f2c: pop %ebx
> 2 1.5e-04 :c0281f2d: pop %esi
> 1215 0.0935 :c0281f2e: pop %edi
> 548 0.0422 :c0281f2f: leave
> 1251 0.0963 :c0281f30: ret
> :c0281f31: mov 0x10(%edx),%ecx
> :c0281f34: mov %ecx,-0x1c(%ebp)
> :c0281f37: mov 0x10(%esi),%eax
> :c0281f3a: mov (%ecx,%eax,1),%eax
> :c0281f3d: mov %eax,0x10(%edx)
> :c0281f40: jmp c0281e3e <kmem_cache_alloc+0x2e>
> :c0281f45: lea 0x0(%esi),%esi
> :c0281f48: mov %ebx,%edx
> :c0281f4a: mov %esi,%eax
> :c0281f4c: call c02811d0 <flush_free_list>
> :c0281f51: jmp c0281edb <kmem_cache_alloc+0xcb>
> :c0281f53: lea 0x0(%esi),%esi
next prev parent reply other threads:[~2008-12-12 5:51 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-12 0:25 [rfc][patch] SLQB slab allocator Nick Piggin
2008-12-12 0:31 ` [rfc][patch] mm: kfree_size Nick Piggin
2008-12-13 2:36 ` Christoph Lameter
2008-12-12 5:38 ` [rfc][patch] SLQB slab allocator Eric Dumazet
2008-12-12 5:50 ` Nick Piggin [this message]
2008-12-12 7:07 ` Eric Dumazet
2008-12-12 7:23 ` Nick Piggin
2008-12-12 8:05 ` Eric Dumazet
2008-12-12 9:43 ` Nick Piggin
2008-12-13 2:34 ` Christoph Lameter
2008-12-13 9:03 ` Pekka Enberg
2008-12-15 1:51 ` Christoph Lameter
2008-12-15 1:51 ` Christoph Lameter
2008-12-14 23:04 ` Nick Piggin
2008-12-14 23:04 ` Nick Piggin
2008-12-15 14:02 ` Christoph Lameter
2008-12-15 14:02 ` Christoph Lameter
2008-12-15 14:16 ` Nick Piggin
2008-12-15 14:16 ` Nick Piggin
2008-12-15 15:03 ` Christoph Lameter
2008-12-15 15:03 ` Christoph Lameter
2008-12-15 23:42 ` MinChan Kim
2008-12-15 23:42 ` MinChan Kim
2008-12-17 6:42 ` Nick Piggin
2008-12-17 6:42 ` Nick Piggin
2008-12-17 7:01 ` MinChan Kim
2008-12-17 7:01 ` MinChan Kim
2008-12-17 7:09 ` Nick Piggin
2008-12-17 7:09 ` Nick Piggin
2008-12-19 7:48 ` Zhang, Yanmin
2008-12-19 7:48 ` Zhang, Yanmin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081212055051.GE15804@wotan.suse.de \
--to=npiggin@suse.de \
--cc=dada1@cosmosbay.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.