* client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
@ 2015-05-10 9:05 Alexandre DERUMIER
2015-05-11 5:53 ` [Cbt] " Alexandre DERUMIER
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-10 9:05 UTC (permalink / raw)
To: cbt, ceph-devel
Hi,
I have done some client benchmarks with fio-rbd, comparing ubuntu vivid and debian wheezy (kernel 3.16), and the difference is huge:
fio-rbd benchmark, with 10 jobs - 4k randread - 1 rbd volume of 10GB - 1 osd:
ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
debian wheezy : rbd_cache=false : iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
I have run perf record on both, and the profiles are really different.
(I'll also try debian jessie and centos7 to compare.)
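For context, the run above corresponds to a jobfile roughly like the following sketch (the pool, image, and client names, and the iodepth, are assumptions, not details from the original setup):

```ini
[global]
ioengine=rbd        ; fio's librbd engine
clientname=admin    ; assumed cephx user
pool=rbd            ; assumed pool name
rbdname=fio_test    ; assumed name of the 10GB test image
rw=randread
bs=4k
direct=1
numjobs=10
group_reporting

[rbd-4k-randread]
iodepth=32          ; assumed; the queue depth was not stated
```

rbd_cache=true/false would be toggled in the [client] section of ceph.conf rather than in the jobfile.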
debian wheezy:
---------------
+ 12.39% 0.19% fio libstdc++.so.6.0.17 [.] operator new(unsigned long)
+ 12.07% 5.28% fio libc-2.13.so [.] malloc
+ 9.41% 0.20% fio libc-2.13.so [.] __lll_unlock_wake_private
+ 8.76% 5.78% fio libc-2.13.so [.] free
+ 8.01% 5.96% fio libc-2.13.so [.] _int_malloc
+ 7.55% 1.08% fio libc-2.13.so [.] __lll_lock_wait_private
+ 4.67% 4.67% fio [kernel.kallsyms] [k] _raw_spin_lock
+ 4.51% 4.51% swapper [kernel.kallsyms] [k] intel_idle
+ 3.47% 0.02% fio libpthread-2.13.so [.] 0x000000000000e72d
+ 3.31% 0.02% fio librados.so.2.0.0 [.] ceph::buffer::create_aligned(unsigned int, unsigned int)
+ 3.29% 0.01% fio libc-2.13.so [.] __posix_memalign
+ 3.28% 0.27% fio libc-2.13.so [.] __libc_memalign
+ 2.95% 0.07% fio libc-2.13.so [.] _int_memalign
+ 2.87% 0.25% fio libpthread-2.13.so [.] pthread_cond_broadcast@@GLIBC_2.3.2
+ 2.81% 2.50% fio libc-2.13.so [.] _int_free
+ 2.42% 0.18% fio librbd.so.1.0.0 [.] std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()
+ 2.29% 0.25% fio libpthread-2.13.so [.] pthread_cond_wait@@GLIBC_2.3.2
+ 1.83% 1.71% fio libc-2.13.so [.] malloc_consolidate
+ 1.79% 0.05% fio libpthread-2.13.so [.] __libc_recv
+ 1.79% 1.79% fio [kernel.kallsyms] [k] futex_wake
+ 1.61% 0.06% fio libpthread-2.13.so [.] __lll_unlock_wake
+ 1.50% 1.50% fio [kernel.kallsyms] [k] futex_wait_setup
+ 1.47% 0.00% fio libc-2.13.so [.] __clone
+ 1.47% 0.00% fio libpthread-2.13.so [.] start_thread
+ 1.47% 0.09% fio fio [.] thread_main
+ 1.44% 1.29% fio libc-2.13.so [.] __memcpy_ssse3
+ 1.39% 1.39% swapper [kernel.kallsyms] [k] native_write_msr_safe
+ 1.38% 0.09% fio libpthread-2.13.so [.] __lll_lock_wait
+ 1.26% 0.07% fio libc-2.13.so [.] _L_lock_9676
+ 1.20% 1.20% fio [kernel.kallsyms] [k] try_to_wake_up
+ 1.17% 1.17% fio [kernel.kallsyms] [k] __schedule
+ 1.12% 1.03% fio libc-2.13.so [.] arena_get2
+ 1.08% 1.08% fio [kernel.kallsyms] [k] get_futex_key_refs
+ 0.96% 0.96% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 0.93% 0.93% swapper [kernel.kallsyms] [k] enqueue_task_fair
+ 0.87% 0.87% swapper [kernel.kallsyms] [k] __schedule
+ 0.87% 0.80% fio libpthread-2.13.so [.] pthread_mutex_trylock
+ 0.81% 0.01% fio librados.so.2.0.0 [.] 0x0000000000292030
+ 0.75% 0.75% swapper [kernel.kallsyms] [k] __switch_to
+ 0.74% 0.74% fio [kernel.kallsyms] [k] futex_wait
+ 0.72% 0.01% fio fio [.] wait_for_completions
+ 0.71% 0.71% fio [kernel.kallsyms] [k] __switch_to
+ 0.71% 0.01% fio fio [.] io_u_queued_complete
+ 0.70% 0.70% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 0.68% 0.68% fio [kernel.kallsyms] [k] finish_task_switch
+ 0.68% 0.00% fio librados.so.2.0.0 [.] 0x0000000000050ad0
+ 0.65% 0.61% fio libpthread-2.13.so [.] __pthread_mutex_unlock_usercnt
+ 0.64% 0.64% fio [kernel.kallsyms] [k] wake_futex
+ 0.64% 0.00% fio librbd.so.1.0.0 [.] 0x0000000000045f9b
+ 0.63% 0.10% fio librados.so.2.0.0 [.] ceph::buffer::ptr::append(char const*, unsigned int)
+ 0.63% 0.00% fio [unknown] [.] 0x0000000000000030
+ 0.63% 0.63% fio [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 0.60% 0.00% fio librados.so.2.0.0 [.] 0x0000000000103fec
+ 0.59% 0.00% fio librbd.so.1.0.0 [.] 0x000000000033a9ae
+ 0.58% 0.00% fio librbd.so.1.0.0 [.] 0x0000000000044962
+ 0.57% 0.00% fio fio [.] td_io_getevents
+ 0.56% 0.02% fio fio [.] fio_rbd_getevents
+ 0.55% 0.00% fio librbd.so.1.0.0 [.] 0x000000000033b23e
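Adding up the self% column (the second percentage) for the glibc allocator symbols quantifies the allocator overhead on wheezy; a throwaway awk over lines copied verbatim from the report above:

```shell
# Sum the self% (3rd field) of the glibc allocator symbols from the
# wheezy perf report; the sample lines are copied from the report above.
total=$(awk '/_int_malloc|_int_free|malloc_consolidate|\[\.\] malloc$|\[\.\] free$/ { s += $3 }
             END { printf "%.2f", s }' <<'EOF'
+  12.07%   5.28%  fio  libc-2.13.so  [.] malloc
+   8.76%   5.78%  fio  libc-2.13.so  [.] free
+   8.01%   5.96%  fio  libc-2.13.so  [.] _int_malloc
+   2.81%   2.50%  fio  libc-2.13.so  [.] _int_free
+   1.83%   1.71%  fio  libc-2.13.so  [.] malloc_consolidate
EOF
)
echo "allocator self time: ${total}%"
```

That is over 20% of self time in the allocator alone on wheezy; in the vivid report below, malloc shows only ~3.4% self and the other allocator entries don't even make the list.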
ubuntu vivid
------------
+ 28,37% 0,03% fio [kernel.kallsyms] [k] system_call_fastpath
+ 20,78% 0,67% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 20,60% 0,00% swapper [kernel.kallsyms] [k] start_secondary
+ 18,85% 0,09% fio [kernel.kallsyms] [k] sys_futex
+ 18,71% 0,08% fio [kernel.kallsyms] [k] do_futex
+ 8,39% 0,45% fio [kernel.kallsyms] [k] wake_futex
+ 8,31% 0,42% fio [kernel.kallsyms] [k] try_to_wake_up
+ 7,78% 0,00% fio [kernel.kallsyms] [k] wake_up_state
+ 7,67% 0,04% fio [kernel.kallsyms] [k] ret_from_intr
+ 7,62% 0,04% fio [kernel.kallsyms] [k] do_IRQ
+ 7,41% 0,34% fio [kernel.kallsyms] [k] futex_wait
+ 7,10% 0,00% swapper [kernel.kallsyms] [k] cpuidle_enter
+ 6,97% 0,05% fio [kernel.kallsyms] [k] __do_softirq
+ 6,88% 0,02% fio [kernel.kallsyms] [k] irq_exit
+ 6,71% 0,34% fio libpthread-2.21.so [.] pthread_cond_broadcast@@GLIBC_2.3.2
+ 5,93% 0,57% fio [kernel.kallsyms] [k] futex_requeue
+ 5,77% 0,04% fio [kernel.kallsyms] [k] net_rx_action
+ 5,57% 0,14% fio [kernel.kallsyms] [k] schedule
+ 5,49% 0,96% fio [kernel.kallsyms] [k] __sched_text_start
+ 5,38% 0,00% fio [unknown] [.] 0x0000000000000001
+ 5,20% 0,20% fio [kernel.kallsyms] [k] futex_wait_queue_me
+ 5,09% 0,00% fio [kernel.kallsyms] [k] mlx4_en_poll_rx_cq
+ 4,99% 0,00% fio [unknown] [.] 0000000000000000
+ 4,99% 0,06% fio libpthread-2.21.so [.] __libc_sendmsg
+ 4,87% 0,57% fio [kernel.kallsyms] [k] futex_wake
+ 4,78% 0,36% fio libpthread-2.21.so [.] pthread_cond_wait@@GLIBC_2.3.2
+ 4,75% 0,00% fio [kernel.kallsyms] [k] sys_sendmsg
+ 4,73% 0,13% swapper [kernel.kallsyms] [k] cpuidle_enter_state
+ 4,69% 0,02% fio [kernel.kallsyms] [k] __sys_sendmsg
+ 4,69% 0,00% fio [unknown] [.] 0x00401f0f00000000
+ 4,69% 0,00% fio librados.so.2.0.0 [.] 0xffff8099ead6f590
+ 4,54% 0,00% fio [unknown] [.] 0x0000000100000002
+ 4,45% 4,45% swapper [kernel.kallsyms] [k] intel_idle
+ 4,37% 0,11% fio [kernel.kallsyms] [k] ___sys_sendmsg
+ 4,06% 0,04% fio [kernel.kallsyms] [k] do_sock_sendmsg
+ 3,98% 0,08% fio libpthread-2.21.so [.] __libc_recv
+ 3,89% 0,14% fio [kernel.kallsyms] [k] inet_sendmsg
+ 3,82% 3,41% fio libc-2.21.so [.] malloc
+ 3,78% 0,01% fio [kernel.kallsyms] [k] sys_recvfrom
+ 3,77% 0,01% fio [kernel.kallsyms] [k] netif_receive_skb_internal
+ 3,75% 0,01% fio [kernel.kallsyms] [k] __netif_receive_skb
+ 3,74% 0,09% fio [kernel.kallsyms] [k] __netif_receive_skb_core
+ 3,73% 0,08% fio [kernel.kallsyms] [k] SYSC_recvfrom
+ 3,70% 0,18% fio [kernel.kallsyms] [k] mlx4_en_process_rx_cq
+ 3,65% 0,29% fio [kernel.kallsyms] [k] tcp_sendmsg
+ 3,52% 0,00% fio [kernel.kallsyms] [k] tcp_v4_do_rcv
+ 3,49% 0,10% fio [kernel.kallsyms] [k] tcp_rcv_established
+ 3,48% 0,05% fio [kernel.kallsyms] [k] ip_rcv
+ 3,44% 0,01% fio [kernel.kallsyms] [k] ip_rcv_finish
+ 3,41% 0,02% fio [kernel.kallsyms] [k] sock_recvmsg
+ 3,31% 0,01% fio [kernel.kallsyms] [k] ip_local_deliver
+ 3,30% 0,00% fio [kernel.kallsyms] [k] ip_local_deliver_finish
+ 3,30% 0,02% fio [kernel.kallsyms] [k] inet_recvmsg
+ 3,28% 0,07% fio [kernel.kallsyms] [k] dequeue_task
+ 3,27% 0,00% fio [kernel.kallsyms] [k] deactivate_task
+ 3,27% 0,09% fio [kernel.kallsyms] [k] tcp_v4_rcv
+ 3,17% 0,16% fio [kernel.kallsyms] [k] tcp_recvmsg
+ 3,08% 0,29% fio [kernel.kallsyms] [k] dequeue_task_fair
+ 3,07% 0,09% fio libpthread-2.21.so [.] __lll_unlock_wake
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-10 9:05 client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference Alexandre DERUMIER
@ 2015-05-11 5:53 ` Alexandre DERUMIER
2015-05-11 10:30 ` Stefan Priebe - Profihost AG
2015-05-11 13:45 ` Mark Nelson
0 siblings, 2 replies; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-11 5:53 UTC (permalink / raw)
To: cbt, ceph-devel; +Cc: Stefan Priebe
Seems it's OK on debian jessie too (with an extra boost when rbd_cache=true).
Maybe it is related to the old glibc on debian wheezy?
debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
jessie perf report
------------------
+ 9,18% 3,75% fio libc-2.19.so [.] malloc
+ 6,76% 5,70% fio libc-2.19.so [.] _int_malloc
+ 5,83% 5,64% fio libc-2.19.so [.] _int_free
+ 5,11% 0,15% fio libpthread-2.19.so [.] __libc_recv
+ 4,81% 4,81% swapper [kernel.kallsyms] [k] intel_idle
+ 3,72% 0,37% fio libpthread-2.19.so [.] pthread_cond_broadcast@@GLIBC_2.3.2
+ 3,41% 0,04% fio libpthread-2.19.so [.] 0x000000000000efad
+ 3,31% 0,54% fio libpthread-2.19.so [.] pthread_cond_wait@@GLIBC_2.3.2
+ 3,19% 0,09% fio libpthread-2.19.so [.] __lll_unlock_wake
+ 2,52% 0,00% fio librados.so.2.0.0 [.] ceph::buffer::create_aligned(unsigned int, unsigned int)
+ 2,09% 0,08% fio libc-2.19.so [.] __posix_memalign
+ 2,04% 0,26% fio libpthread-2.19.so [.] __lll_lock_wait
+ 2,02% 0,13% fio libc-2.19.so [.] _mid_memalign
+ 1,95% 1,91% fio libc-2.19.so [.] __memcpy_sse2_unaligned
+ 1,88% 0,08% fio libc-2.19.so [.] _int_memalign
+ 1,88% 0,00% fio libc-2.19.so [.] __clone
+ 1,88% 0,00% fio libpthread-2.19.so [.] start_thread
+ 1,88% 0,12% fio fio [.] thread_main
+ 1,37% 1,37% swapper [kernel.kallsyms] [k] native_write_msr_safe
+ 1,29% 0,05% fio libc-2.19.so [.] __lll_unlock_wake_private
+ 1,24% 1,24% fio libpthread-2.19.so [.] pthread_mutex_trylock
+ 1,24% 0,29% fio libc-2.19.so [.] __lll_lock_wait_private
+ 1,19% 0,21% fio librbd.so.1.0.0 [.] std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()
+ 1,19% 1,19% fio libc-2.19.so [.] free
+ 1,18% 1,18% fio libc-2.19.so [.] malloc_consolidate
+ 1,14% 1,14% fio [kernel.kallsyms] [k] get_futex_key_refs.isra.13
+ 1,10% 1,10% fio [kernel.kallsyms] [k] __schedule
+ 1,00% 0,28% fio librados.so.2.0.0 [.] ceph::buffer::list::append(char const*, unsigned int)
+ 0,96% 0,00% fio librbd.so.1.0.0 [.] 0x000000000005b2e7
+ 0,96% 0,96% fio [kernel.kallsyms] [k] _raw_spin_lock
+ 0,92% 0,21% fio librados.so.2.0.0 [.] ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)
+ 0,91% 0,00% fio librados.so.2.0.0 [.] 0x000000000006e6c0
+ 0,90% 0,90% swapper [kernel.kallsyms] [k] __switch_to
+ 0,89% 0,01% fio librbd.so.1.0.0 [.] 0x00000000000ce1f1
+ 0,89% 0,89% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 0,87% 0,01% fio librados.so.2.0.0 [.] 0x00000000002e3ff1
+ 0,86% 0,00% fio libc-2.19.so [.] 0x00000000000dd50d
+ 0,85% 0,85% fio [kernel.kallsyms] [k] try_to_wake_up
+ 0,83% 0,83% swapper [kernel.kallsyms] [k] __schedule
+ 0,82% 0,82% fio [kernel.kallsyms] [k] copy_user_enhanced_fast_string
+ 0,81% 0,00% fio librados.so.2.0.0 [.] 0x0000000000137abc
+ 0,80% 0,80% swapper [kernel.kallsyms] [k] menu_select
+ 0,75% 0,75% fio [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 0,75% 0,75% fio [kernel.kallsyms] [k] futex_wake
+ 0,75% 0,75% fio libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
+ 0,73% 0,73% fio [kernel.kallsyms] [k] __switch_to
+ 0,70% 0,70% fio libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
+ 0,70% 0,36% fio librados.so.2.0.0 [.] ceph::buffer::list::iterator::copy(unsigned int, char*)
+ 0,70% 0,23% fio fio [.] get_io_u
+ 0,67% 0,67% fio [kernel.kallsyms] [k] finish_task_switch
+ 0,67% 0,32% fio libpthread-2.19.so [.] pthread_rwlock_unlock
+ 0,67% 0,00% fio librados.so.2.0.0 [.] 0x00000000000cea98
+ 0,64% 0,00% fio librados.so.2.0.0 [.] 0x00000000002e3f87
+ 0,63% 0,63% fio [kernel.kallsyms] [k] futex_wait_setup
+ 0,62% 0,62% swapper [kernel.kallsyms] [k] enqueue_task_fair
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-11 5:53 ` [Cbt] " Alexandre DERUMIER
@ 2015-05-11 10:30 ` Stefan Priebe - Profihost AG
2015-05-11 14:20 ` Alexandre DERUMIER
2015-05-11 13:45 ` Mark Nelson
1 sibling, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2015-05-11 10:30 UTC (permalink / raw)
To: Alexandre DERUMIER, cbt, ceph-devel
Am 11.05.2015 um 07:53 schrieb Alexandre DERUMIER:
> Seem that's is ok too on debian jessie (with an extra boost with rbd_cache true)
>
> Maybe is it related to old glibc on debian wheezy ?
That's pretty interesting. I wasn't aware that there were performance
optimisations in glibc.
Since you have a test setup: is it possible to install the jessie libc on wheezy?
Stefan
> [...]
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-11 5:53 ` [Cbt] " Alexandre DERUMIER
2015-05-11 10:30 ` Stefan Priebe - Profihost AG
@ 2015-05-11 13:45 ` Mark Nelson
2015-05-11 14:15 ` Alexandre DERUMIER
1 sibling, 1 reply; 13+ messages in thread
From: Mark Nelson @ 2015-05-11 13:45 UTC (permalink / raw)
To: Alexandre DERUMIER, cbt, ceph-devel; +Cc: Stefan Priebe
On 05/11/2015 12:53 AM, Alexandre DERUMIER wrote:
> Seem that's is ok too on debian jessie (with an extra boost with rbd_cache true)
>
> Maybe is it related to old glibc on debian wheezy ?
>
> debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
> debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>
>
> ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
> ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
> debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
> debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
Isn't Wheezy a pretty old kernel too? (like 3.2?) There have been a ton of
changes since then. Originally I was thinking this might have been some
of the new network/inode optimizations in 3.18+ for vivid, but if jessie
is better, maybe it's some of the other kernel changes (or perhaps it's
glibc or something else). FWIW, we've noticed a pretty significant
performance improvement going from CentOS 6.5 to RHEL 7 on the same
hardware. We haven't looked very deeply into it, but there definitely
appears to be an advantage to running modern distributions.
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-11 13:45 ` Mark Nelson
@ 2015-05-11 14:15 ` Alexandre DERUMIER
0 siblings, 0 replies; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-11 14:15 UTC (permalink / raw)
To: Mark Nelson; +Cc: cbt, ceph-devel, Stefan Priebe
>>Isn't Wheezy a pretty old kernel too? (like 3.2?)
Yes, but for my test I used the 3.16 kernel from wheezy-backports (the same as jessie).
So there is no kernel difference; it must be something in the libraries.
----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "aderumier" <aderumier@odiso.com>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Cc: "Stefan Priebe" <s.priebe@profihost.ag>
Envoyé: Lundi 11 Mai 2015 15:45:45
Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
[...]
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-11 10:30 ` Stefan Priebe - Profihost AG
@ 2015-05-11 14:20 ` Alexandre DERUMIER
2015-05-11 21:38 ` Milosz Tanski
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-11 14:20 UTC (permalink / raw)
To: Stefan Priebe; +Cc: cbt, ceph-devel
>>That's pretty interesting. I wasn't aware that there were performance
>>optimisations in glibc.
>>
>>As you have a test setup. Is it possible to install jessie libc on wheezy?
Mmm, I can try that; not sure it'll work.
BTW, librbd CPU usage is always 3x-4x higher than krbd's,
and a lot of that CPU is spent in malloc/free. It would be great to optimise that.
I don't know if jemalloc or tcmalloc could be used, like for the OSD daemons?
Reducing CPU usage could improve QEMU performance a lot, as QEMU uses only one thread per disk.
----- Mail original -----
De: "Stefan Priebe" <s.priebe@profihost.ag>
À: "aderumier" <aderumier@odiso.com>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Lundi 11 Mai 2015 12:30:03
Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
[...]
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-11 14:20 ` Alexandre DERUMIER
@ 2015-05-11 21:38 ` Milosz Tanski
2015-05-12 0:34 ` Alexandre DERUMIER
0 siblings, 1 reply; 13+ messages in thread
From: Milosz Tanski @ 2015-05-11 21:38 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Stefan Priebe, cbt, ceph-devel
On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER
<aderumier@odiso.com> wrote:
>>>That's pretty interesting. I wasn't aware that there were performance
>>>optimisations in glibc.
>>>
>>>As you have a test setup. Is it possible to install jessie libc on wheezy?
>
> mmm, I can try that. Not sure it'll work.
>
>
> BTW, librbd cpu usage is always 3x-4x more than KRBD.
> a lot of cpu is used from malloc/free. It could be great to optimise that.
>
> I don't known if jemmaloc or tcmalloc could be used, like for osd daemons ?
You can try it and see if it makes a difference: set LD_PRELOAD to
include the .so of jemalloc / tcmalloc before starting fio, like this:
$ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
$ ./run_test.sh
As a matter of policy, libraries shouldn't force a particular malloc
implementation on their users. It might go against the user's wishes,
not to mention the conflicts that would arise if one library wanted /
needed jemalloc while another wanted / needed tcmalloc.
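One caveat (my addition, not something tested in this thread): after an LD_PRELOAD run it is worth verifying the preload actually took effect. Assuming Linux with /proc mounted, checking which allocator libraries the process mapped works:

```shell
# Check which allocator a running process actually mapped (Linux only).
# Pass the fio PID; here we demo on the current shell, which normally
# links plain glibc malloc.
check_alloc() {
    if grep -qE 'jemalloc|tcmalloc' "/proc/$1/maps"; then
        echo "preloaded allocator in use"
    else
        echo "glibc malloc in use"
    fi
}
check_alloc $$
```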
> [...]
>> + 0,81% 0,00% fio librados.so.2.0.0 [.] 0x0000000000137abc
>> + 0,80% 0,80% swapper [kernel.kallsyms] [k] menu_select
>> + 0,75% 0,75% fio [kernel.kallsyms] [k] _raw_spin_lock_bh
>> + 0,75% 0,75% fio [kernel.kallsyms] [k] futex_wake
>> + 0,75% 0,75% fio libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
>> + 0,73% 0,73% fio [kernel.kallsyms] [k] __switch_to
>> + 0,70% 0,70% fio libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
>> + 0,70% 0,36% fio librados.so.2.0.0 [.] ceph::buffer::list::iterator::copy(unsigned int, char*)
>> + 0,70% 0,23% fio fio [.] get_io_u
>> + 0,67% 0,67% fio [kernel.kallsyms] [k] finish_task_switch
>> + 0,67% 0,32% fio libpthread-2.19.so [.] pthread_rwlock_unlock
>> + 0,67% 0,00% fio librados.so.2.0.0 [.] 0x00000000000cea98
>> + 0,64% 0,00% fio librados.so.2.0.0 [.] 0x00000000002e3f87
>> + 0,63% 0,63% fio [kernel.kallsyms] [k] futex_wait_setup
>> + 0,62% 0,62% swapper [kernel.kallsyms] [k] enqueue_task_fair
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-11 21:38 ` Milosz Tanski
@ 2015-05-12 0:34 ` Alexandre DERUMIER
2015-05-12 6:12 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-12 0:34 UTC (permalink / raw)
To: Milosz Tanski; +Cc: Stefan Priebe, cbt, ceph-devel
>>You can try it and see if it'll make a difference. Set LD_PRELOAD to
>>include the .so of jemalloc / tcmalloc before starting fio. Like this:
>>
>>$ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>>$ ./run_test.sh
Thanks, it's working.
It seems that jemalloc with fio-rbd gives a 17% IOPS improvement and reduces latency and CPU usage!
Results with numjobs=1:
glibc       : iops=36668 usr=62.23%, sys=12.13%
libtcmalloc : iops=36105 usr=63.54%, sys=8.45%
jemalloc    : iops=43181 usr=60.91%, sys=10.51%
(With numjobs=10, I'm around 240k IOPS with jemalloc vs 220k IOPS with glibc/tcmalloc.)
I just found a patch in the QEMU git tree that enables tcmalloc:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833
I'll test it to see if it helps.
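As a sanity check that LD_PRELOAD actually took effect, the mapped allocator can be inspected via /proc; a minimal sketch (the jemalloc/tcmalloc library names and paths vary by distro and are assumptions here):

```shell
# LD_PRELOAD is inherited by child processes, so any process in the run
# can be checked for the allocator it actually mapped into memory.
alloc=$(grep -oE 'lib(jemalloc|tcmalloc)[^ ]*' /proc/self/maps | sort -u)
echo "${alloc:-glibc malloc}"   # prints "glibc malloc" when nothing is preloaded
```

Re-running the same check under `LD_PRELOAD=.../libjemalloc.so.1` should print the preloaded library name instead.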
fio results
------------
glibc
-----
Jobs: 1 (f=1): [r(1)] [100.0% done] [123.9MB/0KB/0KB /s] [31.8K/0/0 iops] [eta 00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7239: Tue May 12 02:05:46 2015
read : io=30000MB, bw=146675KB/s, iops=36668, runt=209443msec
slat (usec): min=8, max=1245, avg=26.07, stdev=13.99
clat (usec): min=107, max=4752, avg=525.40, stdev=207.46
lat (usec): min=126, max=4767, avg=551.47, stdev=208.27
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 215], 10.00th=[ 253], 20.00th=[ 322],
| 30.00th=[ 386], 40.00th=[ 450], 50.00th=[ 516], 60.00th=[ 588],
| 70.00th=[ 652], 80.00th=[ 716], 90.00th=[ 796], 95.00th=[ 868],
| 99.00th=[ 996], 99.50th=[ 1048], 99.90th=[ 1192], 99.95th=[ 1240],
| 99.99th=[ 1368]
bw (KB /s): min=112328, max=176848, per=100.00%, avg=146768.86, stdev=12974.09
lat (usec) : 250=9.61%, 500=37.58%, 750=37.25%, 1000=14.60%
lat (msec) : 2=0.96%, 4=0.01%, 10=0.01%
cpu : usr=62.23%, sys=12.13%, ctx=10008821, majf=0, minf=1348
IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.8%, 16=64.2%, 32=4.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.9%, 64=0.0%, >=64=0.0%
issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=30000MB, aggrb=146674KB/s, minb=146674KB/s, maxb=146674KB/s, mint=209443msec, maxt=209443msec
Disk stats (read/write):
sdb: ios=0/22, merge=0/13, ticks=0/0, in_queue=0, util=0.00%
jemalloc
--------
Jobs: 1 (f=1): [r(1)] [100.0% done] [165.4MB/0KB/0KB /s] [42.3K/0/0 iops] [eta 00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7137: Tue May 12 02:01:25 2015
read : io=30000MB, bw=172726KB/s, iops=43181, runt=177854msec
slat (usec): min=6, max=563, avg=22.28, stdev=14.68
clat (usec): min=95, max=3559, avg=456.29, stdev=168.37
lat (usec): min=110, max=3579, avg=478.56, stdev=169.06
clat percentiles (usec):
| 1.00th=[ 161], 5.00th=[ 201], 10.00th=[ 233], 20.00th=[ 290],
| 30.00th=[ 346], 40.00th=[ 402], 50.00th=[ 454], 60.00th=[ 506],
| 70.00th=[ 556], 80.00th=[ 612], 90.00th=[ 676], 95.00th=[ 732],
| 99.00th=[ 844], 99.50th=[ 900], 99.90th=[ 1020], 99.95th=[ 1064],
| 99.99th=[ 1192]
bw (KB /s): min=129936, max=199712, per=100.00%, avg=172822.83, stdev=11812.99
lat (usec) : 100=0.01%, 250=12.77%, 500=45.87%, 750=37.60%, 1000=3.62%
lat (msec) : 2=0.13%, 4=0.01%
cpu : usr=60.91%, sys=10.51%, ctx=9329053, majf=0, minf=1687
IO depths : 1=0.1%, 2=0.1%, 4=1.8%, 8=26.4%, 16=67.5%, 32=4.2%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.9%, 8=0.1%, 16=0.1%, 32=4.0%, 64=0.0%, >=64=0.0%
issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=30000MB, aggrb=172725KB/s, minb=172725KB/s, maxb=172725KB/s, mint=177854msec, maxt=177854msec
Disk stats (read/write):
sdb: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
libtcmalloc
------------
rbd engine: RBD version: 0.1.10
Jobs: 1 (f=1): [r(1)] [100.0% done] [140.1MB/0KB/0KB /s] [35.9K/0/0 iops] [eta 00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7039: Tue May 12 01:57:41 2015
read : io=30000MB, bw=144423KB/s, iops=36105, runt=212708msec
slat (usec): min=10, max=803, avg=26.65, stdev=17.68
clat (usec): min=54, max=5052, avg=530.82, stdev=216.05
lat (usec): min=114, max=5531, avg=557.46, stdev=217.22
clat percentiles (usec):
| 1.00th=[ 169], 5.00th=[ 213], 10.00th=[ 251], 20.00th=[ 322],
| 30.00th=[ 386], 40.00th=[ 454], 50.00th=[ 524], 60.00th=[ 596],
| 70.00th=[ 660], 80.00th=[ 724], 90.00th=[ 804], 95.00th=[ 876],
| 99.00th=[ 1048], 99.50th=[ 1128], 99.90th=[ 1336], 99.95th=[ 1464],
| 99.99th=[ 2256]
bw (KB /s): min=60416, max=161496, per=100.00%, avg=144529.50, stdev=10827.54
lat (usec) : 100=0.01%, 250=9.88%, 500=36.69%, 750=36.97%, 1000=14.88%
lat (msec) : 2=1.57%, 4=0.01%, 10=0.01%
cpu : usr=63.54%, sys=8.45%, ctx=9209514, majf=0, minf=2120
IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.9%, 16=64.0%, 32=4.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.8%, 64=0.0%, >=64=0.0%
issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
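A small filter makes it easier to compare runs like the ones above at a glance (`fio_iops` is just an illustrative helper name, not part of fio):

```shell
# Pull the iops figure out of a fio "read :" summary line, in the same
# format as the results above; handy when diffing runs across allocators.
fio_iops() { sed -n 's/.*iops=\([0-9]*\).*/\1/p'; }
echo "  read : io=30000MB, bw=172726KB/s, iops=43181, runt=177854msec" | fio_iops
# -> 43181
```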
----- Original message -----
From: "Milosz Tanski" <milosz@adfin.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Monday, May 11, 2015 23:38:51
Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER
<aderumier@odiso.com> wrote:
>>>That's pretty interesting. I wasn't aware that there were performance
>>>optimisations in glibc.
>>>
>>>As you have a test setup. Is it possible to install jessie libc on wheezy?
>
> mmm, I can try that. Not sure it'll work.
>
>
> BTW, librbd CPU usage is always 3x-4x higher than with krbd;
> a lot of CPU is spent in malloc/free. It would be great to optimise that.
>
> I don't know if jemalloc or tcmalloc could be used, as they are for the OSD daemons?
You can try it and see if it'll make a difference. Set LD_PRELOAD to
include the so of jemalloc / tcmalloc before starting FIO. Like this:
$ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
$ ./run_test.sh
As a matter of policy, libraries shouldn't force a particular malloc
implementation on the users of a particular library. It might go
against the user's wishes, not to mention the conflicts that would happen
if one library wanted / needed jemalloc while another one wanted /
needed tcmalloc.
>
>
> Reducing CPU usage could improve QEMU performance a lot, as QEMU uses only one thread per disk.
>
>
>
> ----- Original message -----
> From: "Stefan Priebe" <s.priebe@profihost.ag>
> To: "aderumier" <aderumier@odiso.com>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Monday, May 11, 2015 12:30:03
> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>
> On 11.05.2015 at 07:53, Alexandre DERUMIER wrote:
>> It seems to be OK on debian jessie too (with an extra boost with rbd_cache=true).
>>
>> Maybe it is related to the old glibc on debian wheezy?
>
> That's pretty interesting. I wasn't aware that there were performance
> optimisations in glibc.
>
> As you have a test setup. Is it possible to install jessie libc on wheezy?
>
> Stefan
>
>
>>
>> debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>> debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>
>>
>> ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
>> ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
>> debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
>> debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
>>
>>
>>
>> jessie perf report
>> ------------------
>> [...]
>>
>
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-12 0:34 ` Alexandre DERUMIER
@ 2015-05-12 6:12 ` Stefan Priebe - Profihost AG
2015-05-12 8:17 ` Alexandre DERUMIER
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2015-05-12 6:12 UTC (permalink / raw)
To: Alexandre DERUMIER, Milosz Tanski; +Cc: cbt, ceph-devel
On 12.05.2015 at 02:34, Alexandre DERUMIER wrote:
>>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>>> include the .so of jemalloc / tcmalloc before starting fio. Like this:
>>>
>>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>>> $ ./run_test.sh
>
> Thanks it's working.
>
> It seems that jemalloc with fio-rbd gives a 17% IOPS improvement and reduces latency and CPU usage!
>
> results with 1 numjob:
>
> glibc : iops=36668 usr=62.23%, sys=12.13%
> libtcmalloc : iops=36105 usr=63.54%, sys=8.45%
> jemalloc: iops=43181 usr=60.91%, sys=10.51%
>
>
> (with 10numjobs, i'm around 240k iops with jemalloc vs 220k iops with glibc/tcmalloc)
>
>
> I just found a patch in the QEMU git tree that enables tcmalloc:
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833
> I'll test it to see if it helps.
Sounds good. Any reason for not switching to tcmalloc by default in PVE?
Stefan
>
>
>
>
>
>
> fio results
> ------------
>
> [...]
>
>
> ----- Original message -----
> From: "Milosz Tanski" <milosz@adfin.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Monday, May 11, 2015 23:38:51
> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>
> On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER
> <aderumier@odiso.com> wrote:
>>>> That's pretty interesting. I wasn't aware that there were performance
>>>> optimisations in glibc.
>>>>
>>>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>
>> mmm, I can try that. Not sure it'll work.
>>
>>
>> BTW, librbd CPU usage is always 3x-4x higher than with krbd;
>> a lot of CPU is spent in malloc/free. It would be great to optimise that.
>>
>> I don't know if jemalloc or tcmalloc could be used, as they are for the OSD daemons?
>
> You can try it and see if it'll make a difference. Set LD_PRELOAD to
> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>
> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
> $ ./run_test.sh
>
> As a matter of policy, libraries shouldn't force a particular malloc
> implementation on the users of a particular library. It might go
> against the user's wishes, not to mention the conflicts that would happen
> if one library wanted / needed jemalloc while another one wanted /
> needed tcmalloc.
>
>>
>>
>> Reducing CPU usage could improve QEMU performance a lot, as QEMU uses only one thread per disk.
>>
>>
>>
>> [...]
>
>
>
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-12 6:12 ` Stefan Priebe - Profihost AG
@ 2015-05-12 8:17 ` Alexandre DERUMIER
2015-05-12 14:37 ` Milosz Tanski
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-12 8:17 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Milosz Tanski, cbt, ceph-devel
>>Sounds good. Any reason for not switching to tcmalloc by default in PVE?
I'm currently benchmarking it inside QEMU, but I don't see much improvement.
I'm around 30000 IOPS per virtio disk with either glibc or tcmalloc. (I don't know if jemalloc works fine with QEMU.)
I don't know if all these memory allocation calls could be reduced in librbd/librados?
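One way to put a number on that allocator overhead: sum the allocator-related samples out of a perf report (a rough sketch; `alloc_pct` and its symbol pattern are illustrative only, not exhaustive):

```shell
# Sum the "children" percentages of allocator-related symbols in a perf
# report of the same text format as the ones quoted in this thread
# (comma decimal separators are normalised to dots first).
alloc_pct() {
  awk '/malloc|free|memalign/ { gsub(/,/, "."); sum += $2 }
       END { printf "%.2f\n", sum }'
}
alloc_pct <<'EOF'
+ 9,18% 3,75% fio libc-2.19.so [.] malloc
+ 5,83% 5,64% fio libc-2.19.so [.] _int_free
EOF
# -> 15.01
```

Running the full reports through such a filter would make the glibc-vs-jemalloc comparison a single figure per run.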
----- Original message -----
From: "Stefan Priebe" <s.priebe@profihost.ag>
To: "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
Cc: "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Tuesday, May 12, 2015 08:12:08
Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
On 12.05.2015 at 02:34, Alexandre DERUMIER wrote:
>>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>>> include the .so of jemalloc / tcmalloc before starting fio. Like this:
>>>
>>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>>> $ ./run_test.sh
>
> Thanks it's working.
>
> It seems that jemalloc with fio-rbd gives a 17% IOPS improvement and reduces latency and CPU usage!
>
> results with 1 numjob:
>
> glibc : iops=36668 usr=62.23%, sys=12.13%
> libtcmalloc : iops=36105 usr=63.54%, sys=8.45%
> jemalloc: iops=43181 usr=60.91%, sys=10.51%
>
>
> (with 10numjobs, i'm around 240k iops with jemalloc vs 220k iops with glibc/tcmalloc)
>
>
> I just found a patch in the QEMU git tree that enables tcmalloc:
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833
> I'll test it to see if it helps.
Sounds good. Any reason for not switching to tcmalloc by default in PVE?
Stefan
>
>
>
>
>
>
> fio results
> ------------
>
> glibc
> -----
> Jobs: 1 (f=1): [r(1)] [100.0% done] [123.9MB/0KB/0KB /s] [31.8K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7239: Tue May 12 02:05:46 2015
> read : io=30000MB, bw=146675KB/s, iops=36668, runt=209443msec
> slat (usec): min=8, max=1245, avg=26.07, stdev=13.99
> clat (usec): min=107, max=4752, avg=525.40, stdev=207.46
> lat (usec): min=126, max=4767, avg=551.47, stdev=208.27
> clat percentiles (usec):
> | 1.00th=[ 171], 5.00th=[ 215], 10.00th=[ 253], 20.00th=[ 322],
> | 30.00th=[ 386], 40.00th=[ 450], 50.00th=[ 516], 60.00th=[ 588],
> | 70.00th=[ 652], 80.00th=[ 716], 90.00th=[ 796], 95.00th=[ 868],
> | 99.00th=[ 996], 99.50th=[ 1048], 99.90th=[ 1192], 99.95th=[ 1240],
> | 99.99th=[ 1368]
> bw (KB /s): min=112328, max=176848, per=100.00%, avg=146768.86, stdev=12974.09
> lat (usec) : 250=9.61%, 500=37.58%, 750=37.25%, 1000=14.60%
> lat (msec) : 2=0.96%, 4=0.01%, 10=0.01%
> cpu : usr=62.23%, sys=12.13%, ctx=10008821, majf=0, minf=1348
> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.8%, 16=64.2%, 32=4.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.9%, 64=0.0%, >=64=0.0%
> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=30000MB, aggrb=146674KB/s, minb=146674KB/s, maxb=146674KB/s, mint=209443msec, maxt=209443msec
>
> Disk stats (read/write):
> sdb: ios=0/22, merge=0/13, ticks=0/0, in_queue=0, util=0.00%
>
>
> jemalloc
> --------
> Jobs: 1 (f=1): [r(1)] [100.0% done] [165.4MB/0KB/0KB /s] [42.3K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7137: Tue May 12 02:01:25 2015
> read : io=30000MB, bw=172726KB/s, iops=43181, runt=177854msec
> slat (usec): min=6, max=563, avg=22.28, stdev=14.68
> clat (usec): min=95, max=3559, avg=456.29, stdev=168.37
> lat (usec): min=110, max=3579, avg=478.56, stdev=169.06
> clat percentiles (usec):
> | 1.00th=[ 161], 5.00th=[ 201], 10.00th=[ 233], 20.00th=[ 290],
> | 30.00th=[ 346], 40.00th=[ 402], 50.00th=[ 454], 60.00th=[ 506],
> | 70.00th=[ 556], 80.00th=[ 612], 90.00th=[ 676], 95.00th=[ 732],
> | 99.00th=[ 844], 99.50th=[ 900], 99.90th=[ 1020], 99.95th=[ 1064],
> | 99.99th=[ 1192]
> bw (KB /s): min=129936, max=199712, per=100.00%, avg=172822.83, stdev=11812.99
> lat (usec) : 100=0.01%, 250=12.77%, 500=45.87%, 750=37.60%, 1000=3.62%
> lat (msec) : 2=0.13%, 4=0.01%
> cpu : usr=60.91%, sys=10.51%, ctx=9329053, majf=0, minf=1687
> IO depths : 1=0.1%, 2=0.1%, 4=1.8%, 8=26.4%, 16=67.5%, 32=4.2%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=95.9%, 8=0.1%, 16=0.1%, 32=4.0%, 64=0.0%, >=64=0.0%
> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=30000MB, aggrb=172725KB/s, minb=172725KB/s, maxb=172725KB/s, mint=177854msec, maxt=177854msec
>
> Disk stats (read/write):
> sdb: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
>
> libtcmalloc
> ------------
> rbd engine: RBD version: 0.1.10
> Jobs: 1 (f=1): [r(1)] [100.0% done] [140.1MB/0KB/0KB /s] [35.9K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7039: Tue May 12 01:57:41 2015
> read : io=30000MB, bw=144423KB/s, iops=36105, runt=212708msec
> slat (usec): min=10, max=803, avg=26.65, stdev=17.68
> clat (usec): min=54, max=5052, avg=530.82, stdev=216.05
> lat (usec): min=114, max=5531, avg=557.46, stdev=217.22
> clat percentiles (usec):
> | 1.00th=[ 169], 5.00th=[ 213], 10.00th=[ 251], 20.00th=[ 322],
> | 30.00th=[ 386], 40.00th=[ 454], 50.00th=[ 524], 60.00th=[ 596],
> | 70.00th=[ 660], 80.00th=[ 724], 90.00th=[ 804], 95.00th=[ 876],
> | 99.00th=[ 1048], 99.50th=[ 1128], 99.90th=[ 1336], 99.95th=[ 1464],
> | 99.99th=[ 2256]
> bw (KB /s): min=60416, max=161496, per=100.00%, avg=144529.50, stdev=10827.54
> lat (usec) : 100=0.01%, 250=9.88%, 500=36.69%, 750=36.97%, 1000=14.88%
> lat (msec) : 2=1.57%, 4=0.01%, 10=0.01%
> cpu : usr=63.54%, sys=8.45%, ctx=9209514, majf=0, minf=2120
> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.9%, 16=64.0%, 32=4.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.8%, 64=0.0%, >=64=0.0%
> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
>
>
>
>
> ----- Mail original -----
> De: "Milosz Tanski" <milosz@adfin.com>
> À: "aderumier" <aderumier@odiso.com>
> Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Lundi 11 Mai 2015 23:38:51
> Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>
> On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER
> <aderumier@odiso.com> wrote:
>>>> That's pretty interesting. I wasn't aware that there were performance
>>>> optimisations in glibc.
>>>>
>>>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>
>> mmm, I can try that. Not sure it'll work.
>>
>>
>> BTW, librbd cpu usage is always 3x-4x more than KRBD;
>> a lot of cpu is used by malloc/free. It would be great to optimise that.
>>
>> I don't know if jemalloc or tcmalloc could be used, like for the osd daemons?
>
> You can try it and see if it'll make a difference. Set LD_PRELOAD to
> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>
> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
> $ ./run_test.sh
>
> As a matter of policy, libraries shouldn't force a particular malloc
> implementation on the users of that library. It might go against the
> user's wishes, not to mention the conflicts that would happen if one
> library wanted / needed jemalloc while another one wanted / needed
> tcmalloc.
>
>>
>>
>> Reducing cpu usage could improve qemu performance a lot, as qemu uses only 1 thread per disk.
>>
>>
>>
>> ----- Mail original -----
>> De: "Stefan Priebe" <s.priebe@profihost.ag>
>> À: "aderumier" <aderumier@odiso.com>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Lundi 11 Mai 2015 12:30:03
>> Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>>
>> Am 11.05.2015 um 07:53 schrieb Alexandre DERUMIER:
>>> Seems it is ok too on debian jessie (with an extra boost with rbd_cache true)
>>>
>>> Maybe it is related to the old glibc on debian wheezy?
>>
>> That's pretty interesting. I wasn't aware that there were performance
>> optimisations in glibc.
>>
>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>
>> Stefan
>>
>>
>>>
>>> debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>> debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>>
>>>
>>> ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
>>> ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
>>> debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
>>> debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
>>>
>>>
>>>
>>> jessie perf report
>>> ------------------
>>> + 9,18% 3,75% fio libc-2.19.so [.] malloc
>>> + 6,76% 5,70% fio libc-2.19.so [.] _int_malloc
>>> + 5,83% 5,64% fio libc-2.19.so [.] _int_free
>>> + 5,11% 0,15% fio libpthread-2.19.so [.] __libc_recv
>>> + 4,81% 4,81% swapper [kernel.kallsyms] [k] intel_idle
>>> + 3,72% 0,37% fio libpthread-2.19.so [.] pthread_cond_broadcast@@GLIBC_2.3.2
>>> + 3,41% 0,04% fio libpthread-2.19.so [.] 0x000000000000efad
>>> + 3,31% 0,54% fio libpthread-2.19.so [.] pthread_cond_wait@@GLIBC_2.3.2
>>> + 3,19% 0,09% fio libpthread-2.19.so [.] __lll_unlock_wake
>>> + 2,52% 0,00% fio librados.so.2.0.0 [.] ceph::buffer::create_aligned(unsigned int, unsigned int)
>>> + 2,09% 0,08% fio libc-2.19.so [.] __posix_memalign
>>> + 2,04% 0,26% fio libpthread-2.19.so [.] __lll_lock_wait
>>> + 2,02% 0,13% fio libc-2.19.so [.] _mid_memalign
>>> + 1,95% 1,91% fio libc-2.19.so [.] __memcpy_sse2_unaligned
>>> + 1,88% 0,08% fio libc-2.19.so [.] _int_memalign
>>> + 1,88% 0,00% fio libc-2.19.so [.] __clone
>>> + 1,88% 0,00% fio libpthread-2.19.so [.] start_thread
>>> + 1,88% 0,12% fio fio [.] thread_main
>>> + 1,37% 1,37% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>> + 1,29% 0,05% fio libc-2.19.so [.] __lll_unlock_wake_private
>>> + 1,24% 1,24% fio libpthread-2.19.so [.] pthread_mutex_trylock
>>> + 1,24% 0,29% fio libc-2.19.so [.] __lll_lock_wait_private
>>> + 1,19% 0,21% fio librbd.so.1.0.0 [.] std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()
>>> + 1,19% 1,19% fio libc-2.19.so [.] free
>>> + 1,18% 1,18% fio libc-2.19.so [.] malloc_consolidate
>>> + 1,14% 1,14% fio [kernel.kallsyms] [k] get_futex_key_refs.isra.13
>>> + 1,10% 1,10% fio [kernel.kallsyms] [k] __schedule
>>> + 1,00% 0,28% fio librados.so.2.0.0 [.] ceph::buffer::list::append(char const*, unsigned int)
>>> + 0,96% 0,00% fio librbd.so.1.0.0 [.] 0x000000000005b2e7
>>> + 0,96% 0,96% fio [kernel.kallsyms] [k] _raw_spin_lock
>>> + 0,92% 0,21% fio librados.so.2.0.0 [.] ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)
>>> + 0,91% 0,00% fio librados.so.2.0.0 [.] 0x000000000006e6c0
>>> + 0,90% 0,90% swapper [kernel.kallsyms] [k] __switch_to
>>> + 0,89% 0,01% fio librbd.so.1.0.0 [.] 0x00000000000ce1f1
>>> + 0,89% 0,89% swapper [kernel.kallsyms] [k] cpu_startup_entry
>>> + 0,87% 0,01% fio librados.so.2.0.0 [.] 0x00000000002e3ff1
>>> + 0,86% 0,00% fio libc-2.19.so [.] 0x00000000000dd50d
>>> + 0,85% 0,85% fio [kernel.kallsyms] [k] try_to_wake_up
>>> + 0,83% 0,83% swapper [kernel.kallsyms] [k] __schedule
>>> + 0,82% 0,82% fio [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>> + 0,81% 0,00% fio librados.so.2.0.0 [.] 0x0000000000137abc
>>> + 0,80% 0,80% swapper [kernel.kallsyms] [k] menu_select
>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] _raw_spin_lock_bh
>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] futex_wake
>>> + 0,75% 0,75% fio libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
>>> + 0,73% 0,73% fio [kernel.kallsyms] [k] __switch_to
>>> + 0,70% 0,70% fio libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
>>> + 0,70% 0,36% fio librados.so.2.0.0 [.] ceph::buffer::list::iterator::copy(unsigned int, char*)
>>> + 0,70% 0,23% fio fio [.] get_io_u
>>> + 0,67% 0,67% fio [kernel.kallsyms] [k] finish_task_switch
>>> + 0,67% 0,32% fio libpthread-2.19.so [.] pthread_rwlock_unlock
>>> + 0,67% 0,00% fio librados.so.2.0.0 [.] 0x00000000000cea98
>>> + 0,64% 0,00% fio librados.so.2.0.0 [.] 0x00000000002e3f87
>>> + 0,63% 0,63% fio [kernel.kallsyms] [k] futex_wait_setup
>>> + 0,62% 0,62% swapper [kernel.kallsyms] [k] enqueue_task_fair
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-12 8:17 ` Alexandre DERUMIER
@ 2015-05-12 14:37 ` Milosz Tanski
2015-05-12 15:21 ` Alexandre DERUMIER
0 siblings, 1 reply; 13+ messages in thread
From: Milosz Tanski @ 2015-05-12 14:37 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Stefan Priebe, cbt, ceph-devel
On Tue, May 12, 2015 at 4:17 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>Sounds good. Any reason for not switching to tcmalloc by default in PVE?
>
> I'm currently benching it inside qemu, but I don't see much improvement
>
> I'm around 30000 iops per virtio disk, with glibc or tcmalloc. (I don't know if jemalloc works fine with qemu.)
I'm going to guess that there's a whole slew of stuff that happens
between qemu and the guest that results in a lower bound for iops.
>
>
> I don't know if all these memory allocation calls could be reduced in librbd/librados?
>
Maybe you can use perf to find the worst offending hotspots as a place to start?
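A minimal sketch of such a perf session (the pid is a placeholder; it assumes perf and debug symbols for librbd/librados are installed). The commands are printed rather than executed so they can be reviewed first:

```shell
#!/bin/sh
# Placeholder pid of the running fio (or qemu) process to profile;
# substitute the real one, e.g. from pgrep.
FIO_PID=${FIO_PID:-1234}

# Sample call graphs for 30 seconds, then rank symbols; if the allocator
# is the bottleneck, malloc/free/_int_malloc and the __lll_*lock helpers
# should dominate the report, as in the traces earlier in the thread.
RECORD="perf record -g -p $FIO_PID -- sleep 30"
REPORT="perf report --stdio --sort symbol,dso"
echo "$RECORD && $REPORT | head -n 40"
```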
>
>
> ----- Mail original -----
> De: "Stefan Priebe" <s.priebe@profihost.ag>
> À: "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
> Cc: "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mardi 12 Mai 2015 08:12:08
> Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>
> Am 12.05.2015 um 02:34 schrieb Alexandre DERUMIER:
>>>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>>>> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>>>>
>>>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>>>> $ ./run_test.sh
>>
>> Thanks it's working.
>>
>> Seems that jemalloc with fio-rbd gives a 17% iops improvement and reduces latencies and cpu usage!
>>
>> results with 1 numjob:
>>
>> glibc : iops=36668 usr=62.23%, sys=12.13%
>> libtcmalloc : iops=36105 usr=63.54%, sys=8.45%
>> jemalloc: iops=43181 usr=60.91%, sys=10.51%
>>
>>
>> (with 10 numjobs, I'm around 240k iops with jemalloc vs 220k iops with glibc/tcmalloc)
>>
>>
>> I just found in qemu git a patch to enable tcmalloc
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833
>> I'll try to test it to see if it helps
>
> Sounds good. Any reason for not switching to tcmalloc by default in PVE?
>
> Stefan
>
>>
>>
>>
>>
>>
>>
>> fio results
>> ------------
>>
>> glibc
>> -----
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [123.9MB/0KB/0KB /s] [31.8K/0/0 iops] [eta 00m:00s]
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7239: Tue May 12 02:05:46 2015
>> read : io=30000MB, bw=146675KB/s, iops=36668, runt=209443msec
>> slat (usec): min=8, max=1245, avg=26.07, stdev=13.99
>> clat (usec): min=107, max=4752, avg=525.40, stdev=207.46
>> lat (usec): min=126, max=4767, avg=551.47, stdev=208.27
>> clat percentiles (usec):
>> | 1.00th=[ 171], 5.00th=[ 215], 10.00th=[ 253], 20.00th=[ 322],
>> | 30.00th=[ 386], 40.00th=[ 450], 50.00th=[ 516], 60.00th=[ 588],
>> | 70.00th=[ 652], 80.00th=[ 716], 90.00th=[ 796], 95.00th=[ 868],
>> | 99.00th=[ 996], 99.50th=[ 1048], 99.90th=[ 1192], 99.95th=[ 1240],
>> | 99.99th=[ 1368]
>> bw (KB /s): min=112328, max=176848, per=100.00%, avg=146768.86, stdev=12974.09
>> lat (usec) : 250=9.61%, 500=37.58%, 750=37.25%, 1000=14.60%
>> lat (msec) : 2=0.96%, 4=0.01%, 10=0.01%
>> cpu : usr=62.23%, sys=12.13%, ctx=10008821, majf=0, minf=1348
>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.8%, 16=64.2%, 32=4.0%, >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.9%, 64=0.0%, >=64=0.0%
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=32
>>
>> Run status group 0 (all jobs):
>> READ: io=30000MB, aggrb=146674KB/s, minb=146674KB/s, maxb=146674KB/s, mint=209443msec, maxt=209443msec
>>
>> Disk stats (read/write):
>> sdb: ios=0/22, merge=0/13, ticks=0/0, in_queue=0, util=0.00%
>>
>>
>> jemalloc
>> --------
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [165.4MB/0KB/0KB /s] [42.3K/0/0 iops] [eta 00m:00s]
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7137: Tue May 12 02:01:25 2015
>> read : io=30000MB, bw=172726KB/s, iops=43181, runt=177854msec
>> slat (usec): min=6, max=563, avg=22.28, stdev=14.68
>> clat (usec): min=95, max=3559, avg=456.29, stdev=168.37
>> lat (usec): min=110, max=3579, avg=478.56, stdev=169.06
>> clat percentiles (usec):
>> | 1.00th=[ 161], 5.00th=[ 201], 10.00th=[ 233], 20.00th=[ 290],
>> | 30.00th=[ 346], 40.00th=[ 402], 50.00th=[ 454], 60.00th=[ 506],
>> | 70.00th=[ 556], 80.00th=[ 612], 90.00th=[ 676], 95.00th=[ 732],
>> | 99.00th=[ 844], 99.50th=[ 900], 99.90th=[ 1020], 99.95th=[ 1064],
>> | 99.99th=[ 1192]
>> bw (KB /s): min=129936, max=199712, per=100.00%, avg=172822.83, stdev=11812.99
>> lat (usec) : 100=0.01%, 250=12.77%, 500=45.87%, 750=37.60%, 1000=3.62%
>> lat (msec) : 2=0.13%, 4=0.01%
>> cpu : usr=60.91%, sys=10.51%, ctx=9329053, majf=0, minf=1687
>> IO depths : 1=0.1%, 2=0.1%, 4=1.8%, 8=26.4%, 16=67.5%, 32=4.2%, >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> complete : 0=0.0%, 4=95.9%, 8=0.1%, 16=0.1%, 32=4.0%, 64=0.0%, >=64=0.0%
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=32
>>
>> Run status group 0 (all jobs):
>> READ: io=30000MB, aggrb=172725KB/s, minb=172725KB/s, maxb=172725KB/s, mint=177854msec, maxt=177854msec
>>
>> Disk stats (read/write):
>> sdb: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>>
>>
>> libtcmalloc
>> ------------
>> rbd engine: RBD version: 0.1.10
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [140.1MB/0KB/0KB /s] [35.9K/0/0 iops] [eta 00m:00s]
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7039: Tue May 12 01:57:41 2015
>> read : io=30000MB, bw=144423KB/s, iops=36105, runt=212708msec
>> slat (usec): min=10, max=803, avg=26.65, stdev=17.68
>> clat (usec): min=54, max=5052, avg=530.82, stdev=216.05
>> lat (usec): min=114, max=5531, avg=557.46, stdev=217.22
>> clat percentiles (usec):
>> | 1.00th=[ 169], 5.00th=[ 213], 10.00th=[ 251], 20.00th=[ 322],
>> | 30.00th=[ 386], 40.00th=[ 454], 50.00th=[ 524], 60.00th=[ 596],
>> | 70.00th=[ 660], 80.00th=[ 724], 90.00th=[ 804], 95.00th=[ 876],
>> | 99.00th=[ 1048], 99.50th=[ 1128], 99.90th=[ 1336], 99.95th=[ 1464],
>> | 99.99th=[ 2256]
>> bw (KB /s): min=60416, max=161496, per=100.00%, avg=144529.50, stdev=10827.54
>> lat (usec) : 100=0.01%, 250=9.88%, 500=36.69%, 750=36.97%, 1000=14.88%
>> lat (msec) : 2=1.57%, 4=0.01%, 10=0.01%
>> cpu : usr=63.54%, sys=8.45%, ctx=9209514, majf=0, minf=2120
>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.9%, 16=64.0%, 32=4.0%, >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.8%, 64=0.0%, >=64=0.0%
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=32
>>
>>
>>
>>
>>
>> ----- Mail original -----
>> De: "Milosz Tanski" <milosz@adfin.com>
>> À: "aderumier" <aderumier@odiso.com>
>> Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Lundi 11 Mai 2015 23:38:51
>> Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>>
>> On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER
>> <aderumier@odiso.com> wrote:
>>>>> That's pretty interesting. I wasn't aware that there were performance
>>>>> optimisations in glibc.
>>>>>
>>>>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>>
>>> mmm, I can try that. Not sure it'll work.
>>>
>>>
>>> BTW, librbd cpu usage is always 3x-4x more than KRBD;
>>> a lot of cpu is used by malloc/free. It would be great to optimise that.
>>>
>>> I don't know if jemalloc or tcmalloc could be used, like for the osd daemons?
>>
>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>>
>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>> $ ./run_test.sh
>>
>> As a matter of policy, libraries shouldn't force a particular malloc
>> implementation on the users of that library. It might go against the
>> user's wishes, not to mention the conflicts that would happen if one
>> library wanted / needed jemalloc while another one wanted / needed
>> tcmalloc.
>>
>>>
>>>
>>> Reducing cpu usage could improve qemu performance a lot, as qemu uses only 1 thread per disk.
>>>
>>>
>>>
>>> ----- Mail original -----
>>> De: "Stefan Priebe" <s.priebe@profihost.ag>
>>> À: "aderumier" <aderumier@odiso.com>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>>> Envoyé: Lundi 11 Mai 2015 12:30:03
>>> Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>>>
>>> Am 11.05.2015 um 07:53 schrieb Alexandre DERUMIER:
>>>> Seems it is ok too on debian jessie (with an extra boost with rbd_cache true)
>>>>
>>>> Maybe it is related to the old glibc on debian wheezy?
>>>
>>> That's pretty interesting. I wasn't aware that there were performance
>>> optimisations in glibc.
>>>
>>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>>
>>> Stefan
>>>
>>>
>>>>
>>>> debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>>> debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>>>
>>>>
>>>> ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
>>>> ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
>>>> debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
>>>> debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
>>>>
>>>>
>>>>
>>>> jessie perf report
>>>> ------------------
>>>> + 9,18% 3,75% fio libc-2.19.so [.] malloc
>>>> + 6,76% 5,70% fio libc-2.19.so [.] _int_malloc
>>>> + 5,83% 5,64% fio libc-2.19.so [.] _int_free
>>>> + 5,11% 0,15% fio libpthread-2.19.so [.] __libc_recv
>>>> + 4,81% 4,81% swapper [kernel.kallsyms] [k] intel_idle
>>>> + 3,72% 0,37% fio libpthread-2.19.so [.] pthread_cond_broadcast@@GLIBC_2.3.2
>>>> + 3,41% 0,04% fio libpthread-2.19.so [.] 0x000000000000efad
>>>> + 3,31% 0,54% fio libpthread-2.19.so [.] pthread_cond_wait@@GLIBC_2.3.2
>>>> + 3,19% 0,09% fio libpthread-2.19.so [.] __lll_unlock_wake
>>>> + 2,52% 0,00% fio librados.so.2.0.0 [.] ceph::buffer::create_aligned(unsigned int, unsigned int)
>>>> + 2,09% 0,08% fio libc-2.19.so [.] __posix_memalign
>>>> + 2,04% 0,26% fio libpthread-2.19.so [.] __lll_lock_wait
>>>> + 2,02% 0,13% fio libc-2.19.so [.] _mid_memalign
>>>> + 1,95% 1,91% fio libc-2.19.so [.] __memcpy_sse2_unaligned
>>>> + 1,88% 0,08% fio libc-2.19.so [.] _int_memalign
>>>> + 1,88% 0,00% fio libc-2.19.so [.] __clone
>>>> + 1,88% 0,00% fio libpthread-2.19.so [.] start_thread
>>>> + 1,88% 0,12% fio fio [.] thread_main
>>>> + 1,37% 1,37% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>>> + 1,29% 0,05% fio libc-2.19.so [.] __lll_unlock_wake_private
>>>> + 1,24% 1,24% fio libpthread-2.19.so [.] pthread_mutex_trylock
>>>> + 1,24% 0,29% fio libc-2.19.so [.] __lll_lock_wait_private
>>>> + 1,19% 0,21% fio librbd.so.1.0.0 [.] std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()
>>>> + 1,19% 1,19% fio libc-2.19.so [.] free
>>>> + 1,18% 1,18% fio libc-2.19.so [.] malloc_consolidate
>>>> + 1,14% 1,14% fio [kernel.kallsyms] [k] get_futex_key_refs.isra.13
>>>> + 1,10% 1,10% fio [kernel.kallsyms] [k] __schedule
>>>> + 1,00% 0,28% fio librados.so.2.0.0 [.] ceph::buffer::list::append(char const*, unsigned int)
>>>> + 0,96% 0,00% fio librbd.so.1.0.0 [.] 0x000000000005b2e7
>>>> + 0,96% 0,96% fio [kernel.kallsyms] [k] _raw_spin_lock
>>>> + 0,92% 0,21% fio librados.so.2.0.0 [.] ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)
>>>> + 0,91% 0,00% fio librados.so.2.0.0 [.] 0x000000000006e6c0
>>>> + 0,90% 0,90% swapper [kernel.kallsyms] [k] __switch_to
>>>> + 0,89% 0,01% fio librbd.so.1.0.0 [.] 0x00000000000ce1f1
>>>> + 0,89% 0,89% swapper [kernel.kallsyms] [k] cpu_startup_entry
>>>> + 0,87% 0,01% fio librados.so.2.0.0 [.] 0x00000000002e3ff1
>>>> + 0,86% 0,00% fio libc-2.19.so [.] 0x00000000000dd50d
>>>> + 0,85% 0,85% fio [kernel.kallsyms] [k] try_to_wake_up
>>>> + 0,83% 0,83% swapper [kernel.kallsyms] [k] __schedule
>>>> + 0,82% 0,82% fio [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>> + 0,81% 0,00% fio librados.so.2.0.0 [.] 0x0000000000137abc
>>>> + 0,80% 0,80% swapper [kernel.kallsyms] [k] menu_select
>>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] _raw_spin_lock_bh
>>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] futex_wake
>>>> + 0,75% 0,75% fio libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
>>>> + 0,73% 0,73% fio [kernel.kallsyms] [k] __switch_to
>>>> + 0,70% 0,70% fio libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
>>>> + 0,70% 0,36% fio librados.so.2.0.0 [.] ceph::buffer::list::iterator::copy(unsigned int, char*)
>>>> + 0,70% 0,23% fio fio [.] get_io_u
>>>> + 0,67% 0,67% fio [kernel.kallsyms] [k] finish_task_switch
>>>> + 0,67% 0,32% fio libpthread-2.19.so [.] pthread_rwlock_unlock
>>>> + 0,67% 0,00% fio librados.so.2.0.0 [.] 0x00000000000cea98
>>>> + 0,64% 0,00% fio librados.so.2.0.0 [.] 0x00000000002e3f87
>>>> + 0,63% 0,63% fio [kernel.kallsyms] [k] futex_wait_setup
>>>> + 0,62% 0,62% swapper [kernel.kallsyms] [k] enqueue_task_fair
>>>>
>>>
>>
>>
>>
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-12 14:37 ` Milosz Tanski
@ 2015-05-12 15:21 ` Alexandre DERUMIER
2015-05-12 16:55 ` Milosz Tanski
0 siblings, 1 reply; 13+ messages in thread
From: Alexandre DERUMIER @ 2015-05-12 15:21 UTC (permalink / raw)
To: Milosz Tanski; +Cc: Stefan Priebe, cbt, ceph-devel
>>Maybe you can use perf to find the worst offending hotspots as a place to start?
I already did that some months ago (fio-rbd on debian wheezy):
http://tracker.ceph.com/issues/10139
But I'll try to update it with my new results on jessie.
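Comparing the two allocators head-to-head could also be done with perf diff; a sketch under the assumption that each benchmark run can be wrapped in its own recording (file and job-file names are placeholders):

```shell
#!/bin/sh
# Record one run per allocator, then let perf diff rank the symbols whose
# overhead changed most between glibc and jemalloc. Printed rather than
# executed so the sketch is safe to paste and adapt.
GLIBC_RUN="perf record -g -o perf.glibc.data -- fio rbd-randread.fio"
JEMALLOC_RUN="LD_PRELOAD=/usr/lib/libjemalloc.so.1 perf record -g -o perf.jemalloc.data -- fio rbd-randread.fio"
DIFF="perf diff perf.glibc.data perf.jemalloc.data"
printf '%s\n' "$GLIBC_RUN" "$JEMALLOC_RUN" "$DIFF"
```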
----- Mail original -----
De: "Milosz Tanski" <milosz@adfin.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mardi 12 Mai 2015 16:37:38
Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
On Tue, May 12, 2015 at 4:17 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>Sounds good. Any reason for not switching to tcmalloc by default in PVE?
>
> I'm currently benching it inside qemu, but I don't see much improvement
>
> I'm around 30000 iops per virtio disk, with glibc or tcmalloc. (I don't know if jemalloc works fine with qemu.)
I'm going to guess that there's a whole slew of stuff that happens
between qemu and the guest that results in a lower bound for iops.
>
>
> I don't know if all these memory allocation calls could be reduced in librbd/librados?
>
Maybe you can use perf to find the worst offending hotspots as a place to start?
>
>
> ----- Mail original -----
> De: "Stefan Priebe" <s.priebe@profihost.ag>
> À: "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
> Cc: "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mardi 12 Mai 2015 08:12:08
> Objet: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>
> Am 12.05.2015 um 02:34 schrieb Alexandre DERUMIER:
>>>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>>>> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>>>>
>>>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>>>> $ ./run_test.sh
>>
>> Thanks it's working.
>>
>> Seems that jemalloc with fio-rbd gives a 17% iops improvement and reduces latencies and cpu usage!
>>
>> results with 1 numjob:
>>
>> glibc : iops=36668 usr=62.23%, sys=12.13%
>> libtcmalloc : iops=36105 usr=63.54%, sys=8.45%
>> jemalloc: iops=43181 usr=60.91%, sys=10.51%
>>
>>
>> (with 10 numjobs, I'm around 240k iops with jemalloc vs 220k iops with glibc/tcmalloc)
>>
>>
>> I just found in qemu git a patch to enable tcmalloc
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833
>> I'll try to test it to see if it helps
>
> Sounds good. Any reason for not switching to tcmalloc by default in PVE?
>
> Stefan
>
>>
>>
>>
>>
>>
>>
>> fio results
>> ------------
>>
>> glibc
>> -----
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [123.9MB/0KB/0KB /s] [31.8K/0/0 iops] [eta 00m:00s]
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7239: Tue May 12 02:05:46 2015
>> read : io=30000MB, bw=146675KB/s, iops=36668, runt=209443msec
>> slat (usec): min=8, max=1245, avg=26.07, stdev=13.99
>> clat (usec): min=107, max=4752, avg=525.40, stdev=207.46
>> lat (usec): min=126, max=4767, avg=551.47, stdev=208.27
>> clat percentiles (usec):
>> | 1.00th=[ 171], 5.00th=[ 215], 10.00th=[ 253], 20.00th=[ 322],
>> | 30.00th=[ 386], 40.00th=[ 450], 50.00th=[ 516], 60.00th=[ 588],
>> | 70.00th=[ 652], 80.00th=[ 716], 90.00th=[ 796], 95.00th=[ 868],
>> | 99.00th=[ 996], 99.50th=[ 1048], 99.90th=[ 1192], 99.95th=[ 1240],
>> | 99.99th=[ 1368]
>> bw (KB /s): min=112328, max=176848, per=100.00%, avg=146768.86, stdev=12974.09
>> lat (usec) : 250=9.61%, 500=37.58%, 750=37.25%, 1000=14.60%
>> lat (msec) : 2=0.96%, 4=0.01%, 10=0.01%
>> cpu : usr=62.23%, sys=12.13%, ctx=10008821, majf=0, minf=1348
>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.8%, 16=64.2%, 32=4.0%, >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.9%, 64=0.0%, >=64=0.0%
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=32
>>
>> Run status group 0 (all jobs):
>> READ: io=30000MB, aggrb=146674KB/s, minb=146674KB/s, maxb=146674KB/s, mint=209443msec, maxt=209443msec
>>
>> Disk stats (read/write):
>> sdb: ios=0/22, merge=0/13, ticks=0/0, in_queue=0, util=0.00%
>>
>>
>> jemalloc
>> --------
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [165.4MB/0KB/0KB /s] [42.3K/0/0 iops] [eta 00m:00s]
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7137: Tue May 12 02:01:25 2015
>> read : io=30000MB, bw=172726KB/s, iops=43181, runt=177854msec
>> slat (usec): min=6, max=563, avg=22.28, stdev=14.68
>> clat (usec): min=95, max=3559, avg=456.29, stdev=168.37
>> lat (usec): min=110, max=3579, avg=478.56, stdev=169.06
>> clat percentiles (usec):
>> | 1.00th=[ 161], 5.00th=[ 201], 10.00th=[ 233], 20.00th=[ 290],
>> | 30.00th=[ 346], 40.00th=[ 402], 50.00th=[ 454], 60.00th=[ 506],
>> | 70.00th=[ 556], 80.00th=[ 612], 90.00th=[ 676], 95.00th=[ 732],
>> | 99.00th=[ 844], 99.50th=[ 900], 99.90th=[ 1020], 99.95th=[ 1064],
>> | 99.99th=[ 1192]
>> bw (KB /s): min=129936, max=199712, per=100.00%, avg=172822.83, stdev=11812.99
>> lat (usec) : 100=0.01%, 250=12.77%, 500=45.87%, 750=37.60%, 1000=3.62%
>> lat (msec) : 2=0.13%, 4=0.01%
>> cpu : usr=60.91%, sys=10.51%, ctx=9329053, majf=0, minf=1687
>> IO depths : 1=0.1%, 2=0.1%, 4=1.8%, 8=26.4%, 16=67.5%, 32=4.2%, >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> complete : 0=0.0%, 4=95.9%, 8=0.1%, 16=0.1%, 32=4.0%, 64=0.0%, >=64=0.0%
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=32
>>
>> Run status group 0 (all jobs):
>> READ: io=30000MB, aggrb=172725KB/s, minb=172725KB/s, maxb=172725KB/s, mint=177854msec, maxt=177854msec
>>
>> Disk stats (read/write):
>> sdb: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>>
>>
>> libtcmalloc
>> -----------
>> rbd engine: RBD version: 0.1.10
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [140.1MB/0KB/0KB /s] [35.9K/0/0 iops] [eta 00m:00s]
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7039: Tue May 12 01:57:41 2015
>> read : io=30000MB, bw=144423KB/s, iops=36105, runt=212708msec
>> slat (usec): min=10, max=803, avg=26.65, stdev=17.68
>> clat (usec): min=54, max=5052, avg=530.82, stdev=216.05
>> lat (usec): min=114, max=5531, avg=557.46, stdev=217.22
>> clat percentiles (usec):
>> | 1.00th=[ 169], 5.00th=[ 213], 10.00th=[ 251], 20.00th=[ 322],
>> | 30.00th=[ 386], 40.00th=[ 454], 50.00th=[ 524], 60.00th=[ 596],
>> | 70.00th=[ 660], 80.00th=[ 724], 90.00th=[ 804], 95.00th=[ 876],
>> | 99.00th=[ 1048], 99.50th=[ 1128], 99.90th=[ 1336], 99.95th=[ 1464],
>> | 99.99th=[ 2256]
>> bw (KB /s): min=60416, max=161496, per=100.00%, avg=144529.50, stdev=10827.54
>> lat (usec) : 100=0.01%, 250=9.88%, 500=36.69%, 750=36.97%, 1000=14.88%
>> lat (msec) : 2=1.57%, 4=0.01%, 10=0.01%
>> cpu : usr=63.54%, sys=8.45%, ctx=9209514, majf=0, minf=2120
>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.9%, 16=64.0%, 32=4.0%, >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.8%, 64=0.0%, >=64=0.0%
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=32
>>
>>
>> ----- Original Message -----
>> From: "Milosz Tanski" <milosz@adfin.com>
>> To: "aderumier" <aderumier@odiso.com>
>> Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>> Sent: Monday, May 11, 2015 23:38:51
>> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>>
>> On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER
>> <aderumier@odiso.com> wrote:
>>>>> That's pretty interesting. I wasn't aware that there were performance
>>>>> optimisations in glibc.
>>>>>
>>>>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>>
>>> mmm, I can try that. Not sure it'll work.
>>>
>>>
>>> BTW, librbd cpu usage is always 3x-4x higher than krbd;
>>> a lot of cpu goes to malloc/free. It would be great to optimise that.
>>>
>>> I don't know if jemalloc or tcmalloc could be used, as for the osd daemons?
>>
>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>>
>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>> $ ./run_test.sh
>>
>> As a matter of policy, libraries shouldn't force a particular malloc
>> implementation on their users. It might go against the user's wishes,
>> not to mention the conflicts that would arise if one library wanted or
>> needed jemalloc while another wanted or needed tcmalloc.
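One caveat worth adding: LD_PRELOAD silently falls back to the default allocator if the .so path is wrong, so it pays to confirm the preload actually took effect. A minimal sketch; the library path and job file name below are placeholders, not taken from this thread:

```shell
# Verify the preloaded allocator is really mapped before trusting the numbers.
# The .so path and job file name are examples; adjust them for your system.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
fio rbd-randread.fio &
FIO_PID=$!
sleep 1                                    # give fio time to map its libraries
grep -c jemalloc "/proc/${FIO_PID}/maps"   # non-zero only if jemalloc is mapped
wait "${FIO_PID}"
```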
>>
>>>
>>>
>>> Reducing cpu usage could improve qemu performance a lot, as qemu uses only one thread per disk.
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: "Stefan Priebe" <s.priebe@profihost.ag>
>>> To: "aderumier" <aderumier@odiso.com>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>>> Sent: Monday, May 11, 2015 12:30:03
>>> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>>>
>>> On 11.05.2015 at 07:53, Alexandre DERUMIER wrote:
>>>> Seems it's ok on debian jessie too (with an extra boost with rbd_cache=true)
>>>>
>>>> Maybe it's related to the old glibc on debian wheezy?
>>>
>>> That's pretty interesting. I wasn't aware that there were performance
>>> optimisations in glibc.
>>>
>>> As you have a test setup. Is it possible to install jessie libc on wheezy?
>>>
>>> Stefan
>>>
>>>
>>>>
>>>> debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>>> debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st
>>>>
>>>>
>>>> ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st
>>>> ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st
>>>> debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
>>>> debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
>>>>
>>>>
>>>>
>>>> jessie perf report
>>>> ------------------
>>>> + 9,18% 3,75% fio libc-2.19.so [.] malloc
>>>> + 6,76% 5,70% fio libc-2.19.so [.] _int_malloc
>>>> + 5,83% 5,64% fio libc-2.19.so [.] _int_free
>>>> + 5,11% 0,15% fio libpthread-2.19.so [.] __libc_recv
>>>> + 4,81% 4,81% swapper [kernel.kallsyms] [k] intel_idle
>>>> + 3,72% 0,37% fio libpthread-2.19.so [.] pthread_cond_broadcast@@GLIBC_2.3.2
>>>> + 3,41% 0,04% fio libpthread-2.19.so [.] 0x000000000000efad
>>>> + 3,31% 0,54% fio libpthread-2.19.so [.] pthread_cond_wait@@GLIBC_2.3.2
>>>> + 3,19% 0,09% fio libpthread-2.19.so [.] __lll_unlock_wake
>>>> + 2,52% 0,00% fio librados.so.2.0.0 [.] ceph::buffer::create_aligned(unsigned int, unsigned int)
>>>> + 2,09% 0,08% fio libc-2.19.so [.] __posix_memalign
>>>> + 2,04% 0,26% fio libpthread-2.19.so [.] __lll_lock_wait
>>>> + 2,02% 0,13% fio libc-2.19.so [.] _mid_memalign
>>>> + 1,95% 1,91% fio libc-2.19.so [.] __memcpy_sse2_unaligned
>>>> + 1,88% 0,08% fio libc-2.19.so [.] _int_memalign
>>>> + 1,88% 0,00% fio libc-2.19.so [.] __clone
>>>> + 1,88% 0,00% fio libpthread-2.19.so [.] start_thread
>>>> + 1,88% 0,12% fio fio [.] thread_main
>>>> + 1,37% 1,37% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>>> + 1,29% 0,05% fio libc-2.19.so [.] __lll_unlock_wake_private
>>>> + 1,24% 1,24% fio libpthread-2.19.so [.] pthread_mutex_trylock
>>>> + 1,24% 0,29% fio libc-2.19.so [.] __lll_lock_wait_private
>>>> + 1,19% 0,21% fio librbd.so.1.0.0 [.] std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()
>>>> + 1,19% 1,19% fio libc-2.19.so [.] free
>>>> + 1,18% 1,18% fio libc-2.19.so [.] malloc_consolidate
>>>> + 1,14% 1,14% fio [kernel.kallsyms] [k] get_futex_key_refs.isra.13
>>>> + 1,10% 1,10% fio [kernel.kallsyms] [k] __schedule
>>>> + 1,00% 0,28% fio librados.so.2.0.0 [.] ceph::buffer::list::append(char const*, unsigned int)
>>>> + 0,96% 0,00% fio librbd.so.1.0.0 [.] 0x000000000005b2e7
>>>> + 0,96% 0,96% fio [kernel.kallsyms] [k] _raw_spin_lock
>>>> + 0,92% 0,21% fio librados.so.2.0.0 [.] ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)
>>>> + 0,91% 0,00% fio librados.so.2.0.0 [.] 0x000000000006e6c0
>>>> + 0,90% 0,90% swapper [kernel.kallsyms] [k] __switch_to
>>>> + 0,89% 0,01% fio librbd.so.1.0.0 [.] 0x00000000000ce1f1
>>>> + 0,89% 0,89% swapper [kernel.kallsyms] [k] cpu_startup_entry
>>>> + 0,87% 0,01% fio librados.so.2.0.0 [.] 0x00000000002e3ff1
>>>> + 0,86% 0,00% fio libc-2.19.so [.] 0x00000000000dd50d
>>>> + 0,85% 0,85% fio [kernel.kallsyms] [k] try_to_wake_up
>>>> + 0,83% 0,83% swapper [kernel.kallsyms] [k] __schedule
>>>> + 0,82% 0,82% fio [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>> + 0,81% 0,00% fio librados.so.2.0.0 [.] 0x0000000000137abc
>>>> + 0,80% 0,80% swapper [kernel.kallsyms] [k] menu_select
>>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] _raw_spin_lock_bh
>>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] futex_wake
>>>> + 0,75% 0,75% fio libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
>>>> + 0,73% 0,73% fio [kernel.kallsyms] [k] __switch_to
>>>> + 0,70% 0,70% fio libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
>>>> + 0,70% 0,36% fio librados.so.2.0.0 [.] ceph::buffer::list::iterator::copy(unsigned int, char*)
>>>> + 0,70% 0,23% fio fio [.] get_io_u
>>>> + 0,67% 0,67% fio [kernel.kallsyms] [k] finish_task_switch
>>>> + 0,67% 0,32% fio libpthread-2.19.so [.] pthread_rwlock_unlock
>>>> + 0,67% 0,00% fio librados.so.2.0.0 [.] 0x00000000000cea98
>>>> + 0,64% 0,00% fio librados.so.2.0.0 [.] 0x00000000002e3f87
>>>> + 0,63% 0,63% fio [kernel.kallsyms] [k] futex_wait_setup
>>>> + 0,62% 0,62% swapper [kernel.kallsyms] [k] enqueue_task_fair
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
* Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
2015-05-12 15:21 ` Alexandre DERUMIER
@ 2015-05-12 16:55 ` Milosz Tanski
0 siblings, 0 replies; 13+ messages in thread
From: Milosz Tanski @ 2015-05-12 16:55 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Stefan Priebe, cbt, ceph-devel
On Tue, May 12, 2015 at 11:21 AM, Alexandre DERUMIER
<aderumier@odiso.com> wrote:
>>>Maybe you can use perf to find the worst offending hotspots and place to start?
>
> Already done that some months ago (fio-rbd, debian wheezy):
> http://tracker.ceph.com/issues/10139
Looking at just aio_read in librbd master, I can tell you that a lot of
objects are created and allocated in the course of one short-lived
request. A bunch of C++ std containers are created, all of which
allocate from the heap (like std::map).
The good news is that there's a lot of room for improvement; the bad
news is that it will take a fair amount of effort.
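A rough way to size that opportunity from the profile itself: summing the self-time column of the glibc allocator symbols in the jessie perf report quoted earlier in the thread (the awk field positions and comma decimal separators assume that exact report layout):

```shell
# Sum the self-time column (field 3, comma decimal separator) for glibc
# allocator symbols, using the jessie perf report lines quoted in this thread.
awk '$5 ~ /^libc/ && $7 ~ /malloc|free|memalign/ {
         v = $3; sub(/%/, "", v); sub(/,/, ".", v); total += v
     }
     END { printf "%.2f%% self time in glibc allocator symbols\n", total }' <<'EOF'
+    9,18%     3,75%  fio      libc-2.19.so        [.] malloc
+    6,76%     5,70%  fio      libc-2.19.so        [.] _int_malloc
+    5,83%     5,64%  fio      libc-2.19.so        [.] _int_free
+    2,09%     0,08%  fio      libc-2.19.so        [.] __posix_memalign
+    2,02%     0,13%  fio      libc-2.19.so        [.] _mid_memalign
+    1,88%     0,08%  fio      libc-2.19.so        [.] _int_memalign
+    1,19%     1,19%  fio      libc-2.19.so        [.] free
+    1,18%     1,18%  fio      libc-2.19.so        [.] malloc_consolidate
EOF
```

This prints 17.75% self time in glibc allocator symbols; inclusive (children) percentages are deliberately not summed, since they would double-count callees.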
>
> But I'll try to update it with my new results on jessie.
>
>
> ----- Original Message -----
> From: "Milosz Tanski" <milosz@adfin.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "Stefan Priebe" <s.priebe@profihost.ag>, "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Tuesday, May 12, 2015 16:37:38
> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>
> On Tue, May 12, 2015 at 4:17 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>>Sounds good. Any reason for not switching to tcmalloc by default in PVE?
>>
>> I'm currently benchmarking it inside qemu, but I don't see much improvement.
>>
>> I'm around 30000 iops per virtio disk, glibc or tcmalloc. (don't know if jemalloc works fine with qemu)
>
> I'm going to guess that there's a whole slew of stuff that happens
> between qemu and the guest that results in a lower bound for iops.
>
>>
>>
>> I don't know if all these memory allocation calls could be reduced in librbd/librados?
>>
>
> Maybe you can use perf to find the worst offending hotspots and place to start?
>
>>
>>
>> ----- Original Message -----
>> From: "Stefan Priebe" <s.priebe@profihost.ag>
>> To: "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
>> Cc: "cbt" <cbt@ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
>> Sent: Tuesday, May 12, 2015 08:12:08
>> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference
>>
>> On 12.05.2015 at 02:34, Alexandre DERUMIER wrote:
>>>>> You can try it and see if it'll make a difference. Set LD_PRELOAD to
>>>>> include the so of jemalloc / tcmalloc before starting FIO. Like this:
>>>>>
>>>>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1
>>>>> $ ./run_test.sh
>>>
>>> Thanks, it's working.
>>>
>>> Seems that jemalloc with fio-rbd gives a 17% iops improvement and reduces latency and cpu usage!
>>>
>>> results with 1 numjob:
>>>
>>> glibc : iops=36668 usr=62.23%, sys=12.13%
>>> libtcmalloc : iops=36105 usr=63.54%, sys=8.45%
>>> jemalloc: iops=43181 usr=60.91%, sys=10.51%
>>>
>>>
>>> (with 10 numjobs, I'm around 240k iops with jemalloc vs 220k iops with glibc/tcmalloc)
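For reference, the 17% figure follows directly from the single-job iops numbers above:

```shell
# The jemalloc gain over glibc and tcmalloc from the single-job runs above.
awk 'BEGIN {
    glibc = 36668; tcmalloc = 36105; jemalloc = 43181
    printf "jemalloc vs glibc:    %+.1f%%\n", (jemalloc / glibc    - 1) * 100
    printf "jemalloc vs tcmalloc: %+.1f%%\n", (jemalloc / tcmalloc - 1) * 100
}'
# prints:
# jemalloc vs glibc:    +17.8%
# jemalloc vs tcmalloc: +19.6%
```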
>>>
>>>
>>> I just found a patch in qemu git to enable tcmalloc:
>>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833
>>> I'll try to test it to see if it helps
>>
>> Sounds good. Any reason for not switching to tcmalloc by default in PVE?
>>
>> Stefan
>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> fio results
>>> ------------
>>>
>>> glibc
>>> -----
>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [123.9MB/0KB/0KB /s] [31.8K/0/0 iops] [eta 00m:00s]
>>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7239: Tue May 12 02:05:46 2015
>>> read : io=30000MB, bw=146675KB/s, iops=36668, runt=209443msec
>>> slat (usec): min=8, max=1245, avg=26.07, stdev=13.99
>>> clat (usec): min=107, max=4752, avg=525.40, stdev=207.46
>>> lat (usec): min=126, max=4767, avg=551.47, stdev=208.27
>>> clat percentiles (usec):
>>> | 1.00th=[ 171], 5.00th=[ 215], 10.00th=[ 253], 20.00th=[ 322],
>>> | 30.00th=[ 386], 40.00th=[ 450], 50.00th=[ 516], 60.00th=[ 588],
>>> | 70.00th=[ 652], 80.00th=[ 716], 90.00th=[ 796], 95.00th=[ 868],
>>> | 99.00th=[ 996], 99.50th=[ 1048], 99.90th=[ 1192], 99.95th=[ 1240],
>>> | 99.99th=[ 1368]
>>> bw (KB /s): min=112328, max=176848, per=100.00%, avg=146768.86, stdev=12974.09
>>> lat (usec) : 250=9.61%, 500=37.58%, 750=37.25%, 1000=14.60%
>>> lat (msec) : 2=0.96%, 4=0.01%, 10=0.01%
>>> cpu : usr=62.23%, sys=12.13%, ctx=10008821, majf=0, minf=1348
>>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.8%, 16=64.2%, 32=4.0%, >=64=0.0%
>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.9%, 64=0.0%, >=64=0.0%
>>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>> latency : target=0, window=0, percentile=100.00%, depth=32
>>>
>>> Run status group 0 (all jobs):
>>> READ: io=30000MB, aggrb=146674KB/s, minb=146674KB/s, maxb=146674KB/s, mint=209443msec, maxt=209443msec
>>>
>>> Disk stats (read/write):
>>> sdb: ios=0/22, merge=0/13, ticks=0/0, in_queue=0, util=0.00%
>>>
>>>
> [quoted fio results (jemalloc, tcmalloc), earlier replies, and the jessie perf report trimmed; identical copies appear in full earlier in this thread]
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
Thread overview: 13+ messages
2015-05-10 9:05 client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference Alexandre DERUMIER
2015-05-11 5:53 ` [Cbt] " Alexandre DERUMIER
2015-05-11 10:30 ` Stefan Priebe - Profihost AG
2015-05-11 14:20 ` Alexandre DERUMIER
2015-05-11 21:38 ` Milosz Tanski
2015-05-12 0:34 ` Alexandre DERUMIER
2015-05-12 6:12 ` Stefan Priebe - Profihost AG
2015-05-12 8:17 ` Alexandre DERUMIER
2015-05-12 14:37 ` Milosz Tanski
2015-05-12 15:21 ` Alexandre DERUMIER
2015-05-12 16:55 ` Milosz Tanski
2015-05-11 13:45 ` Mark Nelson
2015-05-11 14:15 ` Alexandre DERUMIER