* [RFC] add rocksdb support
@ 2014-03-03  2:07 Shu, Xinxin
  2014-03-03 13:37 ` Mark Nelson
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Shu, Xinxin @ 2014-03-03  2:07 UTC (permalink / raw)
  To: ceph-devel

Hi all, 
  
This patch adds rocksdb support to ceph and enables rocksdb for the omap directory. The rocksdb source code can be obtained from the link. To use rocksdb, the C++11 standard must be enabled; gcc >= 4.7 is required for C++11 support. Rocksdb can be installed by following the instructions in its INSTALL.md file, after which the rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to the corresponding directories.
To enable rocksdb, pass the "--with-librocksdb" option to configure. The rocksdb branch is here: https://github.com/xinxinsh/ceph/tree/rocksdb
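
As a reference, a minimal sketch of the steps described above; the rocksdb make target and the /usr/local install prefix are assumptions based on the current rocksdb INSTALL.md, not part of the patch itself:

  # build rocksdb (gcc >= 4.7 for C++11), then install headers and library
  cd rocksdb
  make shared_lib
  sudo cp -r include/rocksdb /usr/local/include/
  sudo cp librocksdb.so* /usr/local/lib/
  sudo ldconfig

  # build ceph from the branch above with rocksdb enabled
  cd ../ceph
  ./autogen.sh
  ./configure --with-librocksdb
  make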


Performance Test
The attached file is a performance comparison of rocksdb and leveldb on four nodes with 40 OSDs, using 'rados bench' as the test tool. The performance results are quite promising.

Any comments or suggestions are greatly appreciated.

Rados bench          Bandwidth (MB/s)        Average latency (s)
                     leveldb     rocksdb     leveldb     rocksdb
write  4 threads     263.762     272.549     0.061       0.059
write  8 threads     449.834     457.811     0.071       0.070
write 16 threads     642.100     638.972     0.100       0.100
write 32 threads     705.897     717.598     0.181       0.178
write 64 threads     705.011     717.204     0.370       0.362
read   4 threads     873.588     841.704     0.073       0.076
read   8 threads     816.699     818.451     0.078       0.078
read  16 threads     808.810     798.053     0.079       0.080
read  32 threads     798.394     802.796     0.080       0.080
read  64 threads     792.848     790.593     0.081       0.081
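
For reference, the rows above correspond roughly to rados bench invocations like the following; the pool name, run length and --no-cleanup flag are illustrative assumptions, not the exact options used in this test:

  # write test with 16 concurrent ops ("write 16 threads"), keeping the objects
  rados bench -p testpool 60 write -t 16 --no-cleanup

  # sequential read test against the objects written above, same concurrency
  rados bench -p testpool 60 seq -t 16

Bandwidth (MB/sec) and average latency come from the summary rados bench prints at the end of each run.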

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-03-03  2:07 [RFC] add rocksdb support Shu, Xinxin
@ 2014-03-03 13:37 ` Mark Nelson
  2014-03-04  4:48 ` Alexandre DERUMIER
  2014-05-21  1:19 ` Sage Weil
  2 siblings, 0 replies; 37+ messages in thread
From: Mark Nelson @ 2014-03-03 13:37 UTC (permalink / raw)
  To: Shu, Xinxin; +Cc: ceph-devel

On 03/02/2014 08:07 PM, Shu, Xinxin wrote:
> Hi all,
>
> This patch adds rocksdb support to ceph and enables rocksdb for the omap directory. The rocksdb source code can be obtained from the link. To use rocksdb, the C++11 standard must be enabled; gcc >= 4.7 is required for C++11 support. Rocksdb can be installed by following the instructions in its INSTALL.md file, after which the rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to the corresponding directories.
> To enable rocksdb, pass the "--with-librocksdb" option to configure. The rocksdb branch is here: https://github.com/xinxinsh/ceph/tree/rocksdb
>
>
> Performance Test
> The attached file is a performance comparison of rocksdb and leveldb on four nodes with 40 OSDs, using 'rados bench' as the test tool. The performance results are quite promising.
>
> Any comments or suggestions are greatly appreciated.

Awesome job!  Excited to look at this!

>
> Rados bench          Bandwidth (MB/s)        Average latency (s)
>                      leveldb     rocksdb     leveldb     rocksdb
> write  4 threads     263.762     272.549     0.061       0.059
> write  8 threads     449.834     457.811     0.071       0.070
> write 16 threads     642.100     638.972     0.100       0.100
> write 32 threads     705.897     717.598     0.181       0.178
> write 64 threads     705.011     717.204     0.370       0.362
> read   4 threads     873.588     841.704     0.073       0.076
> read   8 threads     816.699     818.451     0.078       0.078
> read  16 threads     808.810     798.053     0.079       0.080
> read  32 threads     798.394     802.796     0.080       0.080
> read  64 threads     792.848     790.593     0.081       0.081

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-03-03  2:07 [RFC] add rocksdb support Shu, Xinxin
  2014-03-03 13:37 ` Mark Nelson
@ 2014-03-04  4:48 ` Alexandre DERUMIER
  2014-03-04  8:41   ` Shu, Xinxin
  2014-05-21  1:19 ` Sage Weil
  2 siblings, 1 reply; 37+ messages in thread
From: Alexandre DERUMIER @ 2014-03-04  4:48 UTC (permalink / raw)
  To: Xinxin Shu; +Cc: ceph-devel

>>Performance Test 
>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Thanks for your work; indeed, the performance seems promising!

>>Any comments or suggestions are greatly appreciated. 

Could you run a random-write I/O test with the latest fio (which has rbd support)?

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html
> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100
>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image
>> -clientname=fio -invalidate=0


----- Mail original ----- 

De: "Xinxin Shu" <xinxin.shu@intel.com> 
À: ceph-devel@vger.kernel.org 
Envoyé: Lundi 3 Mars 2014 03:07:18 
Objet: [RFC] add rocksdb support 

Hi all, 

This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories. 
To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb). 


Performance Test 
Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Any comments or suggestions are greatly appreciated. 

Rados bench BandWidth(MB/s) Average latency 
Leveldb rocksdb Leveldb rocksdb 
write 4 threads 263.762 272.549 0.061 0.059 
write 8 threads 449.834 457.811 0.071 0.070 
write 16 threads 642.100 638.972 0.100 0.100 
write 32 threads 705.897 717.598 0.181 0.178 
write 64 threads 705.011 717.204 0.370 0.362 
read 4 threads 873.588 841.704 0.073 0.076 
read 8 threads 816.699 818.451 0.078 0.078 
read 16 threads 808.810 798.053 0.079 0.080 
read 32 threads 798.394 802.796 0.080 0.080 
read 64 threads 792.848 790.593 0.081 0.081 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-03-04  4:48 ` Alexandre DERUMIER
@ 2014-03-04  8:41   ` Shu, Xinxin
  2014-03-05  8:23     ` Alexandre DERUMIER
  0 siblings, 1 reply; 37+ messages in thread
From: Shu, Xinxin @ 2014-03-04  8:41 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel

Hi Alexandre, below are the random I/O test results; the IOPS are almost the same.

Rocksdb results

ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.4
Starting 1 thread
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] [eta 00m:00s]
ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar  4 13:48:22 2014
  write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec
    slat (usec): min=19, max=8855, avg=134.33, stdev=259.00
    clat (usec): min=73, max=4397.6K, avg=12756.12, stdev=79341.35
     lat (msec): min=1, max=4397, avg=12.89, stdev=79.34
    clat percentiles (usec):
     |  1.00th=[ 1432],  5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408],
     | 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904],
     | 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016],
     | 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 99.95th=[1433600],
     | 99.99th=[2834432]
    bw (KB  /s): min=  403, max=24392, per=100.00%, avg=17358.47, stdev=7446.69
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 50=0.51%
    lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 1000=0.15%
    lat (msec) : 2000=0.16%, >=2000=0.01%
  cpu          : usr=18.04%, sys=4.15%, ctx=1875119, majf=0, minf=838
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0%
     issued    : total=r=0/w=859165/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, mint=200004msec, maxt=200004msec

Disk stats (read/write):
  sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29%

leveldb results:

ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.4
Starting 1 thread
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] [eta 00m:00s]
ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar  4 14:54:00 2014
  write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec
    slat (usec): min=20, max=7698, avg=114.01, stdev=201.06
    clat (usec): min=220, max=3278.3K, avg=13340.59, stdev=76874.35
     lat (msec): min=1, max=3278, avg=13.45, stdev=76.87
    clat percentiles (usec):
     |  1.00th=[ 1400],  5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192],
     | 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240],
     | 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120],
     | 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 99.95th=[1286144],
     | 99.99th=[1630208]
    bw (KB  /s): min=   24, max=25548, per=100.00%, avg=17606.69, stdev=6779.23
    lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44%
    lat (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19%
    lat (msec) : 2000=0.12%, >=2000=0.01%
  cpu          : usr=18.25%, sys=4.14%, ctx=1887389, majf=0, minf=742
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0%
     issued    : total=r=0/w=871635/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, mint=200016msec, maxt=200016msec

Disk stats (read/write):
  sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23%

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Tuesday, March 04, 2014 12:49 PM
To: Shu, Xinxin
Cc: ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

>>Performance Test
>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Thanks for your work, indeed performance seem to be promising !

>>Any comments or suggestions are greatly appreciated. 

Could you do test with random io write with last fio (with rbd support) ?

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html
> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100 
>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image 
>> -clientname=fio -invalidate=0


----- Mail original ----- 

De: "Xinxin Shu" <xinxin.shu@intel.com>
À: ceph-devel@vger.kernel.org
Envoyé: Lundi 3 Mars 2014 03:07:18
Objet: [RFC] add rocksdb support 

Hi all, 

This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories. 
To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb). 


Performance Test
Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Any comments or suggestions are greatly appreciated. 

Rados bench BandWidth(MB/s) Average latency Leveldb rocksdb Leveldb rocksdb write 4 threads 263.762 272.549 0.061 0.059 write 8 threads 449.834 457.811 0.071 0.070 write 16 threads 642.100 638.972 0.100 0.100 write 32 threads 705.897 717.598 0.181 0.178 write 64 threads 705.011 717.204 0.370 0.362 read 4 threads 873.588 841.704 0.073 0.076 read 8 threads 816.699 818.451 0.078 0.078 read 16 threads 808.810 798.053 0.079 0.080 read 32 threads 798.394 802.796 0.080 0.080 read 64 threads 792.848 790.593 0.081 0.081

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-03-04  8:41   ` Shu, Xinxin
@ 2014-03-05  8:23     ` Alexandre DERUMIER
  2014-03-05  8:30       ` Shu, Xinxin
  2014-03-05  8:31       ` Haomai Wang
  0 siblings, 2 replies; 37+ messages in thread
From: Alexandre DERUMIER @ 2014-03-05  8:23 UTC (permalink / raw)
  To: Xinxin Shu; +Cc: ceph-devel

>>Hi Alexandre, below is random io test results, almost the same iops. 

Thanks Xinxin, that seems not too bad indeed, and latencies seem to be a little lower than with leveldb.

(Was this with 7.2k RPM disks? Replication 2x or 3x?)



----- Mail original ----- 

De: "Xinxin Shu" <xinxin.shu@intel.com> 
À: "Alexandre DERUMIER" <aderumier@odiso.com> 
Cc: ceph-devel@vger.kernel.org 
Envoyé: Mardi 4 Mars 2014 09:41:05 
Objet: RE: [RFC] add rocksdb support 

Hi Alexandre, below is random io test results, almost the same iops. 

Rocksdb results 

ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64 
fio-2.1.4 
Starting 1 thread 
rbd engine: RBD version: 0.1.8 
Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] [eta 00m:00s] 
ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar 4 13:48:22 2014 
write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec 
slat (usec): min=19, max=8855, avg=134.33, stdev=259.00 
clat (usec): min=73, max=4397.6K, avg=12756.12, stdev=79341.35 
lat (msec): min=1, max=4397, avg=12.89, stdev=79.34 
clat percentiles (usec): 
| 1.00th=[ 1432], 5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408], 
| 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904], 
| 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016], 
| 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 99.95th=[1433600], 
| 99.99th=[2834432] 
bw (KB /s): min= 403, max=24392, per=100.00%, avg=17358.47, stdev=7446.69 
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% 
lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 50=0.51% 
lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 1000=0.15% 
lat (msec) : 2000=0.16%, >=2000=0.01% 
cpu : usr=18.04%, sys=4.15%, ctx=1875119, majf=0, minf=838 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0% 
issued : total=r=0/w=859165/d=0, short=r=0/w=0/d=0 

Run status group 0 (all jobs): 
WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, mint=200004msec, maxt=200004msec 

Disk stats (read/write): 
sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29% 

leveldb results: 

ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64 
fio-2.1.4 
Starting 1 thread 
rbd engine: RBD version: 0.1.8 
Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] [eta 00m:00s] 
ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar 4 14:54:00 2014 
write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec 
slat (usec): min=20, max=7698, avg=114.01, stdev=201.06 
clat (usec): min=220, max=3278.3K, avg=13340.59, stdev=76874.35 
lat (msec): min=1, max=3278, avg=13.45, stdev=76.87 
clat percentiles (usec): 
| 1.00th=[ 1400], 5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192], 
| 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240], 
| 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120], 
| 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 99.95th=[1286144], 
| 99.99th=[1630208] 
bw (KB /s): min= 24, max=25548, per=100.00%, avg=17606.69, stdev=6779.23 
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% 
lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44% 
lat (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19% 
lat (msec) : 2000=0.12%, >=2000=0.01% 
cpu : usr=18.25%, sys=4.14%, ctx=1887389, majf=0, minf=742 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0% 
issued : total=r=0/w=871635/d=0, short=r=0/w=0/d=0 

Run status group 0 (all jobs): 
WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, mint=200016msec, maxt=200016msec 

Disk stats (read/write): 
sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23% 

-----Original Message----- 
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Tuesday, March 04, 2014 12:49 PM 
To: Shu, Xinxin 
Cc: ceph-devel@vger.kernel.org 
Subject: Re: [RFC] add rocksdb support 

>>Performance Test 
>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Thanks for your work, indeed performance seem to be promising ! 

>>Any comments or suggestions are greatly appreciated. 

Could you do test with random io write with last fio (with rbd support) ? 

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html 
> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite 
>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100 
>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image 
>> -clientname=fio -invalidate=0 


----- Mail original ----- 

De: "Xinxin Shu" <xinxin.shu@intel.com> 
À: ceph-devel@vger.kernel.org 
Envoyé: Lundi 3 Mars 2014 03:07:18 
Objet: [RFC] add rocksdb support 

Hi all, 

This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories. 
To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb). 


Performance Test 
Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Any comments or suggestions are greatly appreciated. 

Rados bench BandWidth(MB/s) Average latency Leveldb rocksdb Leveldb rocksdb write 4 threads 263.762 272.549 0.061 0.059 write 8 threads 449.834 457.811 0.071 0.070 write 16 threads 642.100 638.972 0.100 0.100 write 32 threads 705.897 717.598 0.181 0.178 write 64 threads 705.011 717.204 0.370 0.362 read 4 threads 873.588 841.704 0.073 0.076 read 8 threads 816.699 818.451 0.078 0.078 read 16 threads 808.810 798.053 0.079 0.080 read 32 threads 798.394 802.796 0.080 0.080 read 64 threads 792.848 790.593 0.081 0.081 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-03-05  8:23     ` Alexandre DERUMIER
@ 2014-03-05  8:30       ` Shu, Xinxin
  2014-03-05  8:31       ` Haomai Wang
  1 sibling, 0 replies; 37+ messages in thread
From: Shu, Xinxin @ 2014-03-05  8:30 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel

With 7.2k RPM disks and 2x replication.

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Wednesday, March 05, 2014 4:23 PM
To: Shu, Xinxin
Cc: ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

>>Hi Alexandre, below is random io test results, almost the same iops. 

Thanks Xinxin, seem not too bad indeed.  and latencies seem to be a little lower than leveldb

(this was with 7,2k disks ? replication 2x or 3x ?)



----- Mail original ----- 

De: "Xinxin Shu" <xinxin.shu@intel.com>
À: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: ceph-devel@vger.kernel.org
Envoyé: Mardi 4 Mars 2014 09:41:05
Objet: RE: [RFC] add rocksdb support 

Hi Alexandre, below is random io test results, almost the same iops. 

Rocksdb results 

ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.4
Starting 1 thread
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] [eta 00m:00s]
ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar 4 13:48:22 2014
write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec slat (usec): min=19, max=8855, avg=134.33, stdev=259.00 clat (usec): min=73, max=4397.6K, avg=12756.12, stdev=79341.35 lat (msec): min=1, max=4397, avg=12.89, stdev=79.34 clat percentiles (usec): 
| 1.00th=[ 1432], 5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408], 
| 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904], 
| 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016], 
| 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 
| 99.95th=[1433600], 99.99th=[2834432]
bw (KB /s): min= 403, max=24392, per=100.00%, avg=17358.47, stdev=7446.69 lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 50=0.51% lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 1000=0.15% lat (msec) : 2000=0.16%, >=2000=0.01% cpu : usr=18.04%, sys=4.15%, ctx=1875119, majf=0, minf=838 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0% issued : total=r=0/w=859165/d=0, short=r=0/w=0/d=0 

Run status group 0 (all jobs): 
WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, mint=200004msec, maxt=200004msec 

Disk stats (read/write): 
sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29% 

leveldb results: 

ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.4
Starting 1 thread
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] [eta 00m:00s]
ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar 4 14:54:00 2014
write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec slat (usec): min=20, max=7698, avg=114.01, stdev=201.06 clat (usec): min=220, max=3278.3K, avg=13340.59, stdev=76874.35 lat (msec): min=1, max=3278, avg=13.45, stdev=76.87 clat percentiles (usec): 
| 1.00th=[ 1400], 5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192], 
| 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240], 
| 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120], 
| 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 
| 99.95th=[1286144], 99.99th=[1630208]
bw (KB /s): min= 24, max=25548, per=100.00%, avg=17606.69, stdev=6779.23 lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44% lat (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19% lat (msec) : 2000=0.12%, >=2000=0.01% cpu : usr=18.25%, sys=4.14%, ctx=1887389, majf=0, minf=742 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0% issued : total=r=0/w=871635/d=0, short=r=0/w=0/d=0 

Run status group 0 (all jobs): 
WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, mint=200016msec, maxt=200016msec 

Disk stats (read/write): 
sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23% 

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
Sent: Tuesday, March 04, 2014 12:49 PM
To: Shu, Xinxin
Cc: ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support 

>>Performance Test
>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Thanks for your work, indeed performance seem to be promising ! 

>>Any comments or suggestions are greatly appreciated. 

Could you do test with random io write with last fio (with rbd support) ? 

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html 
> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100 
>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image 
>> -clientname=fio -invalidate=0


----- Mail original ----- 

De: "Xinxin Shu" <xinxin.shu@intel.com> 
À: ceph-devel@vger.kernel.org 
Envoyé: Lundi 3 Mars 2014 03:07:18 
Objet: [RFC] add rocksdb support 

Hi all, 

This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories. 
To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb). 


Performance Test 
Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising. 

Any comments or suggestions are greatly appreciated. 

Rados bench BandWidth(MB/s) Average latency Leveldb rocksdb Leveldb rocksdb write 4 threads 263.762 272.549 0.061 0.059 write 8 threads 449.834 457.811 0.071 0.070 write 16 threads 642.100 638.972 0.100 0.100 write 32 threads 705.897 717.598 0.181 0.178 write 64 threads 705.011 717.204 0.370 0.362 read 4 threads 873.588 841.704 0.073 0.076 read 8 threads 816.699 818.451 0.078 0.078 read 16 threads 808.810 798.053 0.079 0.080 read 32 threads 798.394 802.796 0.080 0.080 read 64 threads 792.848 790.593 0.081 0.081 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-03-05  8:23     ` Alexandre DERUMIER
  2014-03-05  8:30       ` Shu, Xinxin
@ 2014-03-05  8:31       ` Haomai Wang
  2014-03-05  9:19         ` Andreas Joachim Peters
  1 sibling, 1 reply; 37+ messages in thread
From: Haomai Wang @ 2014-03-05  8:31 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Xinxin Shu, ceph-devel

I think the reason for the small difference between leveldb and rocksdb
under FileStore is that the main source of latency isn't the KeyValueDB
backend.

So we may not get much benefit from using rocksdb instead of leveldb with FileStore.

On Wed, Mar 5, 2014 at 4:23 PM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>Hi Alexandre, below is random io test results, almost the same iops.
>
> Thanks Xinxin, seem not too bad indeed.  and latencies seem to be a little lower than leveldb
>
> (this was with 7,2k disks ? replication 2x or 3x ?)
>
>
>
> ----- Mail original -----
>
> De: "Xinxin Shu" <xinxin.shu@intel.com>
> À: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: ceph-devel@vger.kernel.org
> Envoyé: Mardi 4 Mars 2014 09:41:05
> Objet: RE: [RFC] add rocksdb support
>
> Hi Alexandre, below is random io test results, almost the same iops.
>
> Rocksdb results
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar 4 13:48:22 2014
> write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec
> slat (usec): min=19, max=8855, avg=134.33, stdev=259.00
> clat (usec): min=73, max=4397.6K, avg=12756.12, stdev=79341.35
> lat (msec): min=1, max=4397, avg=12.89, stdev=79.34
> clat percentiles (usec):
> | 1.00th=[ 1432], 5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408],
> | 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904],
> | 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016],
> | 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 99.95th=[1433600],
> | 99.99th=[2834432]
> bw (KB /s): min= 403, max=24392, per=100.00%, avg=17358.47, stdev=7446.69
> lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 50=0.51%
> lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 1000=0.15%
> lat (msec) : 2000=0.16%, >=2000=0.01%
> cpu : usr=18.04%, sys=4.15%, ctx=1875119, majf=0, minf=838
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0%
> issued : total=r=0/w=859165/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, mint=200004msec, maxt=200004msec
>
> Disk stats (read/write):
> sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29%
>
> leveldb results:
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar 4 14:54:00 2014
> write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec
> slat (usec): min=20, max=7698, avg=114.01, stdev=201.06
> clat (usec): min=220, max=3278.3K, avg=13340.59, stdev=76874.35
> lat (msec): min=1, max=3278, avg=13.45, stdev=76.87
> clat percentiles (usec):
> | 1.00th=[ 1400], 5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192],
> | 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240],
> | 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120],
> | 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 99.95th=[1286144],
> | 99.99th=[1630208]
> bw (KB /s): min= 24, max=25548, per=100.00%, avg=17606.69, stdev=6779.23
> lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44%
> lat (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19%
> lat (msec) : 2000=0.12%, >=2000=0.01%
> cpu : usr=18.25%, sys=4.14%, ctx=1887389, majf=0, minf=742
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0%
> issued : total=r=0/w=871635/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, mint=200016msec, maxt=200016msec
>
> Disk stats (read/write):
> sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23%
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Tuesday, March 04, 2014 12:49 PM
> To: Shu, Xinxin
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
>>>Performance Test
>>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising.
>
> Thanks for your work, indeed performance seem to be promising !
>
>>>Any comments or suggestions are greatly appreciated.
>
> Could you do test with random io write with last fio (with rbd support) ?
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html
>> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100
>>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image
>>> -clientname=fio -invalidate=0
>
>
> ----- Mail original -----
>
> De: "Xinxin Shu" <xinxin.shu@intel.com>
> À: ceph-devel@vger.kernel.org
> Envoyé: Lundi 3 Mars 2014 03:07:18
> Objet: [RFC] add rocksdb support
>
> Hi all,
>
> This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories.
> To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb).
>
>
> Performance Test
> Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising.
>
> Any comments or suggestions are greatly appreciated.
>
> Rados bench BandWidth(MB/s) Average latency Leveldb rocksdb Leveldb rocksdb write 4 threads 263.762 272.549 0.061 0.059 write 8 threads 449.834 457.811 0.071 0.070 write 16 threads 642.100 638.972 0.100 0.100 write 32 threads 705.897 717.598 0.181 0.178 write 64 threads 705.011 717.204 0.370 0.362 read 4 threads 873.588 841.704 0.073 0.076 read 8 threads 816.699 818.451 0.078 0.078 read 16 threads 808.810 798.053 0.079 0.080 read 32 threads 798.394 802.796 0.080 0.080 read 64 threads 792.848 790.593 0.081 0.081



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-03-05  8:31       ` Haomai Wang
@ 2014-03-05  9:19         ` Andreas Joachim Peters
  2014-03-06  9:18           ` Shu, Xinxin
  0 siblings, 1 reply; 37+ messages in thread
From: Andreas Joachim Peters @ 2014-03-05  9:19 UTC (permalink / raw)
  To: Haomai Wang, Alexandre DERUMIER; +Cc: Xinxin Shu, ceph-devel

To me these numbers look identical within error bars; isn't that expected?

The main benefit of rocksdb vs. leveldb shows up when you create large tables approaching 1 billion entries.

How many keys did you create per OSD in your Rados benchmarks?

Cheers Andreas.

________________________________________
From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Haomai Wang [haomaiwang@gmail.com]
Sent: 05 March 2014 09:31
To: Alexandre DERUMIER
Cc: Xinxin Shu; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

I think the reason why the little difference between leveldb and
rocksdb in FileStore is that the main latency cause isn't KeyValueDB
backend.

So we may not get enough benefit from rocksdb instead of leveldb by FileStore.

On Wed, Mar 5, 2014 at 4:23 PM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>Hi Alexandre, below is random io test results, almost the same iops.
>
> Thanks Xinxin, seem not too bad indeed.  and latencies seem to be a little lower than leveldb
>
> (this was with 7,2k disks ? replication 2x or 3x ?)
>
>
>
> ----- Mail original -----
>
> De: "Xinxin Shu" <xinxin.shu@intel.com>
> À: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: ceph-devel@vger.kernel.org
> Envoyé: Mardi 4 Mars 2014 09:41:05
> Objet: RE: [RFC] add rocksdb support
>
> Hi Alexandre, below is random io test results, almost the same iops.
>
> Rocksdb results
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar 4 13:48:22 2014
> write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec
> slat (usec): min=19, max=8855, avg=134.33, stdev=259.00
> clat (usec): min=73, max=4397.6K, avg=12756.12, stdev=79341.35
> lat (msec): min=1, max=4397, avg=12.89, stdev=79.34
> clat percentiles (usec):
> | 1.00th=[ 1432], 5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408],
> | 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904],
> | 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016],
> | 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 99.95th=[1433600],
> | 99.99th=[2834432]
> bw (KB /s): min= 403, max=24392, per=100.00%, avg=17358.47, stdev=7446.69
> lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 50=0.51%
> lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 1000=0.15%
> lat (msec) : 2000=0.16%, >=2000=0.01%
> cpu : usr=18.04%, sys=4.15%, ctx=1875119, majf=0, minf=838
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0%
> issued : total=r=0/w=859165/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, mint=200004msec, maxt=200004msec
>
> Disk stats (read/write):
> sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29%
>
> leveldb results:
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar 4 14:54:00 2014
> write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec
> slat (usec): min=20, max=7698, avg=114.01, stdev=201.06
> clat (usec): min=220, max=3278.3K, avg=13340.59, stdev=76874.35
> lat (msec): min=1, max=3278, avg=13.45, stdev=76.87
> clat percentiles (usec):
> | 1.00th=[ 1400], 5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192],
> | 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240],
> | 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120],
> | 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 99.95th=[1286144],
> | 99.99th=[1630208]
> bw (KB /s): min= 24, max=25548, per=100.00%, avg=17606.69, stdev=6779.23
> lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44%
> lat (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19%
> lat (msec) : 2000=0.12%, >=2000=0.01%
> cpu : usr=18.25%, sys=4.14%, ctx=1887389, majf=0, minf=742
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0%
> issued : total=r=0/w=871635/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, mint=200016msec, maxt=200016msec
>
> Disk stats (read/write):
> sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23%
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Tuesday, March 04, 2014 12:49 PM
> To: Shu, Xinxin
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
>>>Performance Test
>>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising.
>
> Thanks for your work, indeed performance seem to be promising !
>
>>>Any comments or suggestions are greatly appreciated.
>
> Could you do test with random io write with last fio (with rbd support) ?
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html
>> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100
>>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image
>>> -clientname=fio -invalidate=0
>
>
> ----- Mail original -----
>
> De: "Xinxin Shu" <xinxin.shu@intel.com>
> À: ceph-devel@vger.kernel.org
> Envoyé: Lundi 3 Mars 2014 03:07:18
> Objet: [RFC] add rocksdb support
>
> Hi all,
>
> This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories.
> To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb).
>
>
> Performance Test
> Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising.
>
> Any comments or suggestions are greatly appreciated.
>
> Rados bench BandWidth(MB/s) Average latency Leveldb rocksdb Leveldb rocksdb write 4 threads 263.762 272.549 0.061 0.059 write 8 threads 449.834 457.811 0.071 0.070 write 16 threads 642.100 638.972 0.100 0.100 write 32 threads 705.897 717.598 0.181 0.178 write 64 threads 705.011 717.204 0.370 0.362 read 4 threads 873.588 841.704 0.073 0.076 read 8 threads 816.699 818.451 0.078 0.078 read 16 threads 808.810 798.053 0.079 0.080 read 32 threads 798.394 802.796 0.080 0.080 read 64 threads 792.848 790.593 0.081 0.081



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-03-05  9:19         ` Andreas Joachim Peters
@ 2014-03-06  9:18           ` Shu, Xinxin
  0 siblings, 0 replies; 37+ messages in thread
From: Shu, Xinxin @ 2014-03-06  9:18 UTC (permalink / raw)
  To: Andreas Joachim Peters, Haomai Wang, Alexandre DERUMIER; +Cc: ceph-devel

I don't have the exact number, but judging from the size of the files, we don't get to billions of entries.
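
As an illustration of the "size of files" check mentioned above, one could run something like the following on an OSD host; the paths assume the default FileStore layout and are only an example, not the actual test environment:

  # on-disk size of the omap key-value store per OSD
  du -sh /var/lib/ceph/osd/ceph-*/current/omap

  # number of table files in one store, a coarse proxy for the amount of key data
  ls /var/lib/ceph/osd/ceph-0/current/omap | wc -l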

-----Original Message-----
From: Andreas Joachim Peters [mailto:Andreas.Joachim.Peters@cern.ch] 
Sent: Wednesday, March 05, 2014 5:19 PM
To: Haomai Wang; Alexandre DERUMIER
Cc: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: RE: [RFC] add rocksdb support

To me this numbers look within error bars identical and isn't that expected?

The main benefit of Rocksdb vs. Leveldb you can see when you create large tables going to 1 billion entries.

How many keys did you create per OSD in your Rados benchmarks?

Cheers Andreas.

________________________________________
From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Haomai Wang [haomaiwang@gmail.com]
Sent: 05 March 2014 09:31
To: Alexandre DERUMIER
Cc: Xinxin Shu; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

I think the reason why the little difference between leveldb and rocksdb in FileStore is that the main latency cause isn't KeyValueDB backend.

So we may not get enough benefit from rocksdb instead of leveldb by FileStore.

On Wed, Mar 5, 2014 at 4:23 PM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>Hi Alexandre, below is random io test results, almost the same iops.
>
> Thanks Xinxin, seem not too bad indeed.  and latencies seem to be a 
> little lower than leveldb
>
> (this was with 7,2k disks ? replication 2x or 3x ?)
>
>
>
> ----- Mail original -----
>
> De: "Xinxin Shu" <xinxin.shu@intel.com>
> À: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: ceph-devel@vger.kernel.org
> Envoyé: Mardi 4 Mars 2014 09:41:05
> Objet: RE: [RFC] add rocksdb support
>
> Hi Alexandre, below is random io test results, almost the same iops.
>
> Rocksdb results
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
> iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] 
> [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar 4 13:48:22 
> 2014
> write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec slat 
> (usec): min=19, max=8855, avg=134.33, stdev=259.00 clat (usec): 
> min=73, max=4397.6K, avg=12756.12, stdev=79341.35 lat (msec): min=1, 
> max=4397, avg=12.89, stdev=79.34 clat percentiles (usec):
> | 1.00th=[ 1432], 5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408], 
> | 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904], 
> | 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016], 
> | 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 
> | 99.95th=[1433600], 99.99th=[2834432]
> bw (KB /s): min= 403, max=24392, per=100.00%, avg=17358.47, 
> stdev=7446.69 lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 
> 1000=0.01% lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 
> 50=0.51% lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 
> 1000=0.15% lat (msec) : 2000=0.16%, >=2000=0.01% cpu : usr=18.04%, 
> sys=4.15%, ctx=1875119, majf=0, minf=838 IO depths : 1=0.1%, 2=0.1%, 
> 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1% submit : 0=0.0%, 
> 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 
> 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0% issued : 
> total=r=0/w=859165/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, 
> mint=200004msec, maxt=200004msec
>
> Disk stats (read/write):
> sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29%
>
> leveldb results:
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
> iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] 
> [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar 4 14:54:00 
> 2014
> write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec slat 
> (usec): min=20, max=7698, avg=114.01, stdev=201.06 clat (usec): 
> min=220, max=3278.3K, avg=13340.59, stdev=76874.35 lat (msec): min=1, 
> max=3278, avg=13.45, stdev=76.87 clat percentiles (usec):
> | 1.00th=[ 1400], 5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192], 
> | 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240], 
> | 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120], 
> | 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 
> | 99.95th=[1286144], 99.99th=[1630208]
> bw (KB /s): min= 24, max=25548, per=100.00%, avg=17606.69, 
> stdev=6779.23 lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% 
> lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44% lat 
> (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19% lat 
> (msec) : 2000=0.12%, >=2000=0.01% cpu : usr=18.25%, sys=4.14%, 
> ctx=1887389, majf=0, minf=742 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 
> 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5% submit : 0=0.0%, 4=100.0%, 
> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 
> 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0% issued : 
> total=r=0/w=871635/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, 
> mint=200016msec, maxt=200016msec
>
> Disk stats (read/write):
> sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23%
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Tuesday, March 04, 2014 12:49 PM
> To: Shu, Xinxin
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
>>>Performance Test
>>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising.
>
> Thanks for your work, indeed performance seem to be promising !
>
>>>Any comments or suggestions are greatly appreciated.
>
> Could you do test with random io write with last fio (with rbd support) ?
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/0081
> 82.html
>> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100 
>>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image 
>>> -clientname=fio -invalidate=0
>
>
> ----- Mail original -----
>
> De: "Xinxin Shu" <xinxin.shu@intel.com>
> À: ceph-devel@vger.kernel.org
> Envoyé: Lundi 3 Mars 2014 03:07:18
> Objet: [RFC] add rocksdb support
>
> Hi all,
>
> This patch added rocksdb support for ceph, enabled rocksdb for omap directory. Rocksdb source code can be get from link. To use use rocksdb, C++11 standard should be enabled, gcc version >= 4.7 is required to get C++11 support. Rocksdb can be installed with instructions described in the INSTALL.md file, and rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) need to be copied to corresponding directories.
> To enable rocksdb, add "--with-librocksdb" option to configure. The rocksdb branch is here(https://github.com/xinxinsh/ceph/tree/rocksdb).
>
>
> Performance Test
> Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results is quite promising.
>
> Any comments or suggestions are greatly appreciated.
>
> Rados bench BandWidth(MB/s) Average latency Leveldb rocksdb Leveldb 
> rocksdb write 4 threads 263.762 272.549 0.061 0.059 write 8 threads 
> 449.834 457.811 0.071 0.070 write 16 threads 642.100 638.972 0.100 
> 0.100 write 32 threads 705.897 717.598 0.181 0.178 write 64 threads 
> 705.011 717.204 0.370 0.362 read 4 threads 873.588 841.704 0.073 0.076 
> read 8 threads 816.699 818.451 0.078 0.078 read 16 threads 808.810 
> 798.053 0.079 0.080 read 32 threads 798.394 802.796 0.080 0.080 read 
> 64 threads 792.848 790.593 0.081 0.081



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-03-03  2:07 [RFC] add rocksdb support Shu, Xinxin
  2014-03-03 13:37 ` Mark Nelson
  2014-03-04  4:48 ` Alexandre DERUMIER
@ 2014-05-21  1:19 ` Sage Weil
  2014-05-21 12:54   ` Shu, Xinxin
  2 siblings, 1 reply; 37+ messages in thread
From: Sage Weil @ 2014-05-21  1:19 UTC (permalink / raw)
  To: Shu, Xinxin; +Cc: ceph-devel

Hi Xinxin,

I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that 
includes the latest set of patches with the groundwork and your rocksdb 
patch.  There is also a commit that adds rocksdb as a git submodule.  I'm 
thinking that, since there aren't any distro packages for rocksdb at this 
point, this is going to be the easiest way to make this usable for people.

If you can wire the submodule into the makefile, we can merge this in so 
that rocksdb support is in the packages on ceph.com.  I suspect that the 
distros will prefer to turn this off in favor of separate shared libs, but 
they can do that at their option if/when they include rocksdb in the 
distro.  I think the key is just to have both --with-librocksdb and 
--with-librocksdb-static (or similar) options so that you can use either 
the statically or the dynamically linked one.
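
From a user's point of view the flow might look roughly like this; the submodule location and the exact flag spellings are still assumptions at this point:

  # fetch the wip branch together with the bundled rocksdb submodule
  git clone -b wip-rocksdb https://github.com/liewegas/ceph.git
  cd ceph
  git submodule update --init

  # build against the bundled, statically linked rocksdb ...
  ./autogen.sh
  ./configure --with-librocksdb-static
  make

  # ... or against a system-provided shared librocksdb instead
  ./configure --with-librocksdb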

Has your group done further testing with rocksdb?  Anything interesting to 
share?

Thanks!
sage


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-05-21  1:19 ` Sage Weil
@ 2014-05-21 12:54   ` Shu, Xinxin
  2014-05-21 13:06     ` Mark Nelson
  0 siblings, 1 reply; 37+ messages in thread
From: Shu, Xinxin @ 2014-05-21 12:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Zhang, Jian

Hi Sage,

I will wire the rocksdb submodule into the makefile.  Currently we want to run full performance tests on the key-value DB backends, both leveldb and rocksdb, and then optimize rocksdb performance.

-----Original Message-----
From: Sage Weil [mailto:sage@inktank.com] 
Sent: Wednesday, May 21, 2014 9:19 AM
To: Shu, Xinxin
Cc: ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

Hi Xinxin,

I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.

If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.

Has your group done further testing with rocksdb?  Anything interesting to share?

Thanks!
sage


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-05-21 12:54   ` Shu, Xinxin
@ 2014-05-21 13:06     ` Mark Nelson
  2014-05-28 10:05       ` Shu, Xinxin
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Nelson @ 2014-05-21 13:06 UTC (permalink / raw)
  To: Shu, Xinxin, Sage Weil; +Cc: ceph-devel, Zhang, Jian

On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
> Hi, sage
>
> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.

I'm definitely interested in any performance tests you do here.  Last 
winter I started doing some fairly high level tests on raw 
leveldb/hyperleveldb/Riak leveldb.  I'm very interested in what you see 
with rocksdb as a backend.

>
> -----Original Message-----
> From: Sage Weil [mailto:sage@inktank.com]
> Sent: Wednesday, May 21, 2014 9:19 AM
> To: Shu, Xinxin
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi Xinxin,
>
> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>
> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>
> Has your group done further testing with rocksdb?  Anything interesting to share?
>
> Thanks!
> sage
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-05-21 13:06     ` Mark Nelson
@ 2014-05-28 10:05       ` Shu, Xinxin
  2014-06-03 20:01         ` Sage Weil
  2014-06-09 17:11         ` Mark Nelson
  0 siblings, 2 replies; 37+ messages in thread
From: Shu, Xinxin @ 2014-05-28 10:05 UTC (permalink / raw)
  To: Mark Nelson, Sage Weil; +Cc: ceph-devel, Zhang, Jian

Hi sage,
I will add two configure options, --with-librocksdb-static and --with-librocksdb. With --with-librocksdb-static, ceph will compile the rocksdb code pulled from the ceph repository; with --with-librocksdb, for the case where distro packages of rocksdb exist, ceph will not compile the rocksdb code and will use the pre-installed library instead. Is that OK with you?

Since the current rocksdb does not support autoconf/automake, I will add autoconf/automake support to rocksdb, but before that I think we should fork a stable branch (maybe 3.0) for ceph.
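
For the --with-librocksdb case, the pre-installed library would be prepared 
roughly like this (a sketch only; the install paths and the shared_lib make 
target are assumptions about rocksdb's own build system):

        $ git clone https://github.com/facebook/rocksdb.git
        $ cd rocksdb
        $ make shared_lib
        # copy headers and library where ceph's configure/linker can find them
        $ sudo cp librocksdb.so* /usr/local/lib/
        $ sudo cp -r include/rocksdb /usr/local/include/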

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com] 
Sent: Wednesday, May 21, 2014 9:06 PM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support

On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
> Hi, sage
>
> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.

I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.

>
> -----Original Message-----
> From: Sage Weil [mailto:sage@inktank.com]
> Sent: Wednesday, May 21, 2014 9:19 AM
> To: Shu, Xinxin
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi Xinxin,
>
> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>
> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>
> Has your group done further testing with rocksdb?  Anything interesting to share?
>
> Thanks!
> sage
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-05-28 10:05       ` Shu, Xinxin
@ 2014-06-03 20:01         ` Sage Weil
  2014-06-09 17:11         ` Mark Nelson
  1 sibling, 0 replies; 37+ messages in thread
From: Sage Weil @ 2014-06-03 20:01 UTC (permalink / raw)
  To: Shu, Xinxin; +Cc: Mark Nelson, ceph-devel, Zhang, Jian

Hi Xinxin,

On Wed, 28 May 2014, Shu, Xinxin wrote:
> Hi sage ,  
> I will add two configure options to --with-librocksdb-static and 
> --with-librocksdb , with --with-librocksdb-static option , ceph will 
> compile the code that get from ceph repository , with --with-librocksdb 
> option , in case of distro packages for rocksdb , ceph will not compile 
> the rocksdb code , will use pre-installed library. is that ok for you ?
> 
> since current rocksdb does not support autoconf&automake , I will add 
> autoconf&automake support for rocksdb , but before that , i think we 
> should fork a stable branch (maybe 3.0) for ceph .

That sounds right to me.  We can easily update which commit we're building 
against later.

Thanks!
sage


> 
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com] 
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
> 
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
> > Hi, sage
> >
> > I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
> 
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
> 
> >
> > -----Original Message-----
> > From: Sage Weil [mailto:sage@inktank.com]
> > Sent: Wednesday, May 21, 2014 9:19 AM
> > To: Shu, Xinxin
> > Cc: ceph-devel@vger.kernel.org
> > Subject: Re: [RFC] add rocksdb support
> >
> > Hi Xinxin,
> >
> > I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
> >
> > If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
> >
> > Has your group done further testing with rocksdb?  Anything interesting to share?
> >
> > Thanks!
> > sage
> >
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-05-28 10:05       ` Shu, Xinxin
  2014-06-03 20:01         ` Sage Weil
@ 2014-06-09 17:11         ` Mark Nelson
  2014-06-10  4:59           ` Shu, Xinxin
  1 sibling, 1 reply; 37+ messages in thread
From: Mark Nelson @ 2014-06-09 17:11 UTC (permalink / raw)
  To: Shu, Xinxin, Sage Weil; +Cc: ceph-devel, Zhang, Jian

Hi Xinxin,

On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .

I'm looking at testing out the rocksdb support as well, both for the OSD 
and for the monitor, based on some issues we've been seeing lately.  Any 
news on the 3.0 fork and autoconf/automake support in rocksdb?

Thanks,
Mark

>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>> Hi, sage
>>
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>
>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>
>> Thanks!
>> sage
>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-09 17:11         ` Mark Nelson
@ 2014-06-10  4:59           ` Shu, Xinxin
  2014-06-13 18:51             ` Sushma Gurram
  0 siblings, 1 reply; 37+ messages in thread
From: Shu, Xinxin @ 2014-06-10  4:59 UTC (permalink / raw)
  To: Mark Nelson, Sage Weil; +Cc: ceph-devel, Zhang, Jian

Hi Mark,

I have finished the rocksdb submodule support. A pull request adding autoconf/automake support to rocksdb has been created; you can find it at https://github.com/ceph/rocksdb/pull/2. If that patch is OK, I will create a pull request for the rocksdb submodule support itself; currently the patch can be found at https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
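
For anyone who wants to try it, the rough flow should be something like the 
following (untested sketch; it assumes the autoconf/automake pull request 
above has been merged into the submodule branch):

        $ git clone -b wip-rocksdb https://github.com/xinxinsh/ceph.git
        $ cd ceph
        $ git submodule update --init
        $ ./autogen.sh
        $ ./configure --with-librocksdb-static
        $ make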

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, June 10, 2014 1:12 AM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support

Hi Xinxin,

On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?

Thanks,
Mark

>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>> Hi, sage
>>
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>
>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>
>> Thanks!
>> sage
>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-10  4:59           ` Shu, Xinxin
@ 2014-06-13 18:51             ` Sushma Gurram
  2014-06-14  0:49               ` David Zafman
  2014-06-14  3:49               ` Shu, Xinxin
  0 siblings, 2 replies; 37+ messages in thread
From: Sushma Gurram @ 2014-06-13 18:51 UTC (permalink / raw)
  To: Shu, Xinxin, Mark Nelson, Sage Weil; +Cc: ceph-devel, Zhang, Jian

Hi Xinxin,

I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need to put autoconf/automake files in this directory?
It doesn't seem to have any other source files, and compilation fails:
os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory
compilation terminated.

Thanks,
Sushma

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
Sent: Monday, June 09, 2014 10:00 PM
To: Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi mark

I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, June 10, 2014 1:12 AM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support

Hi Xinxin,

On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?

Thanks,
Mark

>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>> Hi, sage
>>
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>
>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>
>> Thanks!
>> sage
>>




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-06-13 18:51             ` Sushma Gurram
@ 2014-06-14  0:49               ` David Zafman
  2014-06-14  3:49               ` Shu, Xinxin
  1 sibling, 0 replies; 37+ messages in thread
From: David Zafman @ 2014-06-14  0:49 UTC (permalink / raw)
  To: Sushma Gurram
  Cc: Shu, Xinxin, Mark Nelson, Sage Weil, ceph-devel, Zhang, Jian


Don’t forget that when a new submodule is added you need to initialize it.  From the README:

Building Ceph
=============

To prepare the source tree after it has been git cloned,

        $ git submodule update --init

To build the server daemons, and FUSE client, execute the following:

        $ ./autogen.sh
        $ ./configure
        $ make
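
In this case only src/rocksdb should be missing, so updating just that path 
should also work:

        $ git submodule update --init src/rocksdb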


David Zafman
Senior Developer
http://www.inktank.com
http://www.redhat.com

On Jun 13, 2014, at 11:51 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:

> Hi Xinxin,
> 
> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
> It doesn't seem to have any other source files and compilation fails:
> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory
> compilation terminated.
> 
> Thanks,
> Sushma
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
> Sent: Monday, June 09, 2014 10:00 PM
> To: Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
> 
> Hi mark
> 
> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, June 10, 2014 1:12 AM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
> 
> Hi Xinxin,
> 
> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>> Hi sage ,
>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>> 
>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
> 
> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
> 
> Thanks,
> Mark
> 
>> 
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:06 PM
>> To: Shu, Xinxin; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: Re: [RFC] add rocksdb support
>> 
>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>> Hi, sage
>>> 
>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>> 
>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>> 
>>> 
>>> -----Original Message-----
>>> From: Sage Weil [mailto:sage@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>> To: Shu, Xinxin
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>> 
>>> Hi Xinxin,
>>> 
>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>> 
>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>> 
>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>> 
>>> Thanks!
>>> sage
>>> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-13 18:51             ` Sushma Gurram
  2014-06-14  0:49               ` David Zafman
@ 2014-06-14  3:49               ` Shu, Xinxin
  2014-06-23  1:18                 ` Shu, Xinxin
  2014-06-23  7:32                 ` Dan van der Ster
  1 sibling, 2 replies; 37+ messages in thread
From: Shu, Xinxin @ 2014-06-14  3:49 UTC (permalink / raw)
  To: Sushma Gurram, Mark Nelson, Sage Weil; +Cc: ceph-devel, Zhang, Jian

Currently ceph gets the stable rocksdb from the 3.0.fb branch of ceph/rocksdb. Since PR https://github.com/ceph/rocksdb/pull/2 has not been merged yet, if you use 'git submodule update --init' to fetch the rocksdb submodule, it does not yet have autoconf/automake support.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
Sent: Saturday, June 14, 2014 2:52 AM
To: Shu, Xinxin; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi Xinxin,

I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
It doesn't seem to have any other source files and compilation fails:
os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.

Thanks,
Sushma

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
Sent: Monday, June 09, 2014 10:00 PM
To: Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi mark

I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, June 10, 2014 1:12 AM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support

Hi Xinxin,

On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?

Thanks,
Mark

>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>> Hi, sage
>>
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>
>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>
>> Thanks!
>> sage
>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-14  3:49               ` Shu, Xinxin
@ 2014-06-23  1:18                 ` Shu, Xinxin
  2014-06-27  0:44                   ` Sushma Gurram
  2014-06-23  7:32                 ` Dan van der Ster
  1 sibling, 1 reply; 37+ messages in thread
From: Shu, Xinxin @ 2014-06-23  1:18 UTC (permalink / raw)
  To: 'Sushma Gurram', 'Mark Nelson', 'Sage Weil'
  Cc: 'ceph-devel@vger.kernel.org', Zhang, Jian


Hi all,

We enabled rocksdb as the data store in our test setup (10 OSDs on two servers; each server has 5 HDDs as OSDs, 2 SSDs as journals, and an Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb, using rados bench as the test tool. The chart below shows the details; a sample rados bench invocation is sketched after it. For writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads on, rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.

                      xfs                    leveldb                 rocksdb
                  throughput  latency    throughput  latency    throughput  latency
1 thread write        84.029    0.048        52.430    0.076        71.920    0.056
2 threads write      166.417    0.048        97.917    0.082       155.148    0.052
4 threads write      304.099    0.052       156.094    0.102       270.461    0.059
8 threads write      323.047    0.099       221.370    0.144       339.455    0.094
16 threads write     295.040    0.216       272.032    0.235       348.849    0.183
32 threads write     324.467    0.394       290.072    0.441       338.103    0.378
64 threads write     313.713    0.812       293.261    0.871       324.603    0.787
1 thread read         75.687    0.053        71.629    0.056        72.526    0.055
2 threads read       182.329    0.044       151.683    0.053       153.125    0.052
4 threads read       320.785    0.050       307.180    0.052       312.016    0.051
8 threads read       504.880    0.063       512.295    0.062       519.683    0.062
16 threads read      477.706    0.134       643.385    0.099       654.149    0.098
32 threads read      517.670    0.247       666.696    0.192       678.480    0.189
64 threads read      516.599    0.495       668.360    0.383       680.673    0.376
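
A sample invocation of the kind meant above (the pool name, 60-second 
runtime and --no-cleanup flag are assumptions, not the exact parameters we 
used):

        # write test with 16 concurrent ops, keeping objects for the read pass
        $ rados -p testpool bench 60 write -t 16 --no-cleanup
        # read test against the objects written above
        $ rados -p testpool bench 60 seq -t 16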

-----Original Message-----
From: Shu, Xinxin 
Sent: Saturday, June 14, 2014 11:50 AM
To: Sushma Gurram; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .  

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
Sent: Saturday, June 14, 2014 2:52 AM
To: Shu, Xinxin; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi Xinxin,

I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
It doesn't seem to have any other source files and compilation fails:
os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.

Thanks,
Sushma

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
Sent: Monday, June 09, 2014 10:00 PM
To: Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi mark

I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, June 10, 2014 1:12 AM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support

Hi Xinxin,

On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?

Thanks,
Mark

>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>> Hi, sage
>>
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>
>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>
>> Thanks!
>> sage
>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-06-14  3:49               ` Shu, Xinxin
  2014-06-23  1:18                 ` Shu, Xinxin
@ 2014-06-23  7:32                 ` Dan van der Ster
  1 sibling, 0 replies; 37+ messages in thread
From: Dan van der Ster @ 2014-06-23  7:32 UTC (permalink / raw)
  To: Xinxin, Sushma Gurram, Mark Nelson, Sage Weil; +Cc: ceph-devel, Jian

Hi,
In your test setup do the KV stores use the SSDs in any way? If not, is this really a fair comparison? If the KV stores can give SSD-like ceph performance (especially latency) without the SSDs, that would be quite good.
Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --

June 23 2014 3:18 AM, "Shu, Xinxin"  wrote: 

> Hi all,
> 
> We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
> 
>                       xfs                    leveldb                 rocksdb
>                   throughput  latency    throughput  latency    throughput  latency
> 1 thread write        84.029    0.048        52.430    0.076        71.920    0.056
> 2 threads write      166.417    0.048        97.917    0.082       155.148    0.052
> 4 threads write      304.099    0.052       156.094    0.102       270.461    0.059
> 8 threads write      323.047    0.099       221.370    0.144       339.455    0.094
> 16 threads write     295.040    0.216       272.032    0.235       348.849    0.183
> 32 threads write     324.467    0.394       290.072    0.441       338.103    0.378
> 64 threads write     313.713    0.812       293.261    0.871       324.603    0.787
> 1 thread read         75.687    0.053        71.629    0.056        72.526    0.055
> 2 threads read       182.329    0.044       151.683    0.053       153.125    0.052
> 4 threads read       320.785    0.050       307.180    0.052       312.016    0.051
> 8 threads read       504.880    0.063       512.295    0.062       519.683    0.062
> 16 threads read      477.706    0.134       643.385    0.099       654.149    0.098
> 32 threads read      517.670    0.247       666.696    0.192       678.480    0.189
> 64 threads read      516.599    0.495       668.360    0.383       680.673    0.376
> 
> -----Original Message-----
> From: Shu, Xinxin
> Sent: Saturday, June 14, 2014 11:50 AM
> To: Sushma Gurram; Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
> 
> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .  
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
> Sent: Saturday, June 14, 2014 2:52 AM
> To: Shu, Xinxin; Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
> 
> Hi Xinxin,
> 
> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
> It doesn't seem to have any other source files and compilation fails:
> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
> 
> Thanks,
> Sushma
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
> Sent: Monday, June 09, 2014 10:00 PM
> To: Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
> 
> Hi mark
> 
> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, June 10, 2014 1:12 AM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
> 
> Hi Xinxin,
> 
> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> 
> 
>> Hi sage ,
>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>> 
>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>> 
>> 
>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>> 
>> Thanks,
>> Mark
>> 
>> 
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>> 
>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote: 
>>> 
>>>> Hi, sage
>>>> 
>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>> 
>>> 
>>> 
>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>> 
>>> 
>>> -----Original Message-----
>>> From: Sage Weil [mailto:sage@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>> To: Shu, Xinxin
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>> 
>>> Hi Xinxin,
>>> 
>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>> 
>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>> 
>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>> 
>>> Thanks!
>>> sage
>>> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-23  1:18                 ` Shu, Xinxin
@ 2014-06-27  0:44                   ` Sushma Gurram
  2014-06-27  3:33                     ` Alexandre DERUMIER
  2014-06-27  8:08                     ` Haomai Wang
  0 siblings, 2 replies; 37+ messages in thread
From: Sushma Gurram @ 2014-06-27  0:44 UTC (permalink / raw)
  To: Shu, Xinxin, 'Mark Nelson', 'Sage Weil'
  Cc: Zhang, Jian, ceph-devel

Delivery failure due to table format. Resending as plain text.

_____________________________________________
From: Sushma Gurram 
Sent: Thursday, June 26, 2014 5:35 PM
To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
Subject: RE: [RFC] add rocksdb support	


Hi Xinxin,

Thanks for providing the results of the performance tests. 

I used fio (with support for the rbd ioengine) to compare XFS and RocksDB with a single OSD. I also confirmed with rados bench, and both sets of numbers are of the same order.
My findings show that XFS is better than rocksdb. Can you please let us know the rocksdb configuration you used, and the object size and run duration for rados bench?
For the random-write tests, I see the "rocksdb:bg0" thread as the top CPU consumer (it is at ~50% CPU, while all other threads in the OSD are below 10%).
Is there a ceph.conf config option to configure the number of background threads in rocksdb?

We ran our tests with following configuration:
System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical cores), HT disabled, 16 GB memory

rocksdb configuration has been set to the following values in ceph.conf.
        rocksdb_write_buffer_size = 4194304
        rocksdb_cache_size = 4194304
        rocksdb_bloom_size = 0
        rocksdb_max_open_files = 10240
        rocksdb_compression = false
        rocksdb_paranoid = false
        rocksdb_log = /dev/null
        rocksdb_compact_on_mount = false

We used the fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd can create multiple (=numjobs) client connections to the OSD, thus stressing the OSD harder.
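
For reference, the write case maps to roughly the following fio command (a 
reconstructed sketch; the image, pool and client names and the runtime are 
placeholders, not the exact job we ran):

        $ fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin \
              --pool=rbd --rbdname=testimg --rw=randwrite --bs=4k \
              --iodepth=32 --numjobs=1 --time_based --runtime=300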

rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
-------------------------------------------------------------------
IO Pattern	XFS (IOPs)	Rocksdb (IOPs)	
4K writes	~1450		~670
4K reads	~65000		~2000
64K writes	~431		~57
64K reads	~17500		~180


rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
-------------------------------------------------------------------
IO Pattern	XFS (IOPs)	Rocksdb (IOPs)	
4K writes	~1450		~962
4K reads	~65000		~1641
64K writes	~431		~426
64K reads	~17500		~209

I guess the lower rocksdb performance can theoretically be attributed to compaction during writes and merging during reads, but I'm not sure reads should be lower by this magnitude.
However, your results seem to show otherwise. Can you please help us with the rocksdb config and with how rados bench was run?
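
For reference, a typical rados bench invocation of the kind we are comparing against looks like this (pool name, runtime and block size are illustrative):

    # 4 KB writes, 16 concurrent ops, keep the objects for the read phase
    rados bench -p testpool 300 write -b 4096 -t 16 --no-cleanup
    # sequential reads of the objects written above
    rados bench -p testpool 300 seq -t 16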

Thanks,
Sushma

-----Original Message-----
From: Shu, Xinxin [mailto:xinxin.shu@intel.com] 
Sent: Sunday, June 22, 2014 6:18 PM
To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
Subject: RE: [RFC] add rocksdb support


Hi all,

 We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.

                        xfs                   leveldb               rocksdb
                  throughput  latency   throughput  latency   throughput  latency
                  (MB/s)      (s)       (MB/s)      (s)       (MB/s)      (s)
1 thread write        84.029    0.048       52.430    0.076       71.920    0.056
2 threads write      166.417    0.048       97.917    0.082      155.148    0.052
4 threads write      304.099    0.052      156.094    0.102      270.461    0.059
8 threads write      323.047    0.099      221.370    0.144      339.455    0.094
16 threads write     295.040    0.216      272.032    0.235      348.849    0.183
32 threads write     324.467    0.394      290.072    0.441      338.103    0.378
64 threads write     313.713    0.812      293.261    0.871      324.603    0.787
1 thread read         75.687    0.053       71.629    0.056       72.526    0.055
2 threads read       182.329    0.044      151.683    0.053      153.125    0.052
4 threads read       320.785    0.050      307.180    0.052      312.016    0.051
8 threads read       504.880    0.063      512.295    0.062      519.683    0.062
16 threads read      477.706    0.134      643.385    0.099      654.149    0.098
32 threads read      517.670    0.247      666.696    0.192      678.480    0.189
64 threads read      516.599    0.495      668.360    0.383      680.673    0.376

-----Original Message-----
From: Shu, Xinxin
Sent: Saturday, June 14, 2014 11:50 AM
To: Sushma Gurram; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .  

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
Sent: Saturday, June 14, 2014 2:52 AM
To: Shu, Xinxin; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi Xinxin,

I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
It doesn't seem to have any other source files and compilation fails:
os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.

Thanks,
Sushma

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
Sent: Monday, June 09, 2014 10:00 PM
To: Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support

Hi mark

I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, June 10, 2014 1:12 AM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support

Hi Xinxin,

On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?

Thanks,
Mark

>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>> Hi, sage
>>
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>
> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>
>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>
>> Thanks!
>> sage
>>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-06-27  0:44                   ` Sushma Gurram
@ 2014-06-27  3:33                     ` Alexandre DERUMIER
  2014-06-27 17:36                       ` Sushma Gurram
  2014-06-27  8:08                     ` Haomai Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Alexandre DERUMIER @ 2014-06-27  3:33 UTC (permalink / raw)
  To: Sushma Gurram; +Cc: Jian Zhang, ceph-devel, Xinxin Shu, Mark Nelson, Sage Weil

Hi Sushma,

What kind of disk is the osd on? An SSD?
Where is the journal for the xfs osd? On the same disk, or on another disk?


Also, a 2GB rbd seems too small for the test, because reads can be served from the page cache.

65000 iops with xfs on a single osd seems crazy.
All the benchmarks I have seen show a limit of around 3000-4000 iops per osd, because of lock contention in the osd daemon.
(Are you sure it's not client-side caching?)
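
To rule that out, something like this is usually enough between runs (standard knobs, nothing specific to your setup):

    # ceph.conf on the client running fio / rados bench
    [client]
        rbd cache = false

    # on each osd node, as root, before the read runs
    sync; echo 3 > /proc/sys/vm/drop_caches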

----- Mail original ----- 

De: "Sushma Gurram" <Sushma.Gurram@sandisk.com> 
À: "Xinxin Shu" <xinxin.shu@intel.com>, "Mark Nelson" <mark.nelson@inktank.com>, "Sage Weil" <sage@inktank.com> 
Cc: "Jian Zhang" <jian.zhang@intel.com>, ceph-devel@vger.kernel.org 
Envoyé: Vendredi 27 Juin 2014 02:44:17 
Objet: RE: [RFC] add rocksdb support 

Delivery failure due to table format. Resending as plain text. 

_____________________________________________ 
From: Sushma Gurram 
Sent: Thursday, June 26, 2014 5:35 PM 
To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil' 
Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org 
Subject: RE: [RFC] add rocksdb support 


Hi Xinxin, 

Thanks for providing the results of the performance tests. 

I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order. 
My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench? 
For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized). 
Is there a ceph.conf config option to configure the background threads in rocksdb? 

We ran our tests with following configuration: 
System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical cores), HT disabled, 16 GB memory 

rocksdb configuration has been set to the following values in ceph.conf. 
rocksdb_write_buffer_size = 4194304 
rocksdb_cache_size = 4194304 
rocksdb_bloom_size = 0 
rocksdb_max_open_files = 10240 
rocksdb_compression = false 
rocksdb_paranoid = false 
rocksdb_log = /dev/null 
rocksdb_compact_on_mount = false 

fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD. 

rbd image size = 2 GB, rocksdb_write_buffer_size=4MB 
------------------------------------------------------------------- 
IO Pattern XFS (IOPs) Rocksdb (IOPs) 
4K writes ~1450 ~670 
4K reads ~65000 ~2000 
64K writes ~431 ~57 
64K reads ~17500 ~180 


rbd image size = 2 GB, rocksdb_write_buffer_size=1GB 
------------------------------------------------------------------- 
IO Pattern XFS (IOPs) Rocksdb (IOPs) 
4K writes ~1450 ~962 
4K reads ~65000 ~1641 
64K writes ~431 ~426 
64K reads ~17500 ~209 

I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude. 
However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run? 

Thanks, 
Sushma 

-----Original Message----- 
From: Shu, Xinxin [mailto:xinxin.shu@intel.com] 
Sent: Sunday, June 22, 2014 6:18 PM 
To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil' 
Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian 
Subject: RE: [RFC] add rocksdb support 


Hi all, 

We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280) and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool), the following chart shows details, for write , with small number threads , leveldb performance is lower than the other two backends , from 16 threads point , rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number. 

xfs leveldb rocksdb 
throughtput latency throughtput latency throughtput latency 
1 thread write 84.029 0.048 52.430 0.076 71.920 0.056 
2 threads write 166.417 0.048 97.917 0.082 155.148 0.052 
4 threads write 304.099 0.052 156.094 0.102 270.461 0.059 
8 threads write 323.047 0.099 221.370 0.144 339.455 0.094 
16 threads write 295.040 0.216 272.032 0.235 348.849 0.183 
32 threads write 324.467 0.394 290.072 0.441 338.103 0.378 
64 threads write 313.713 0.812 293.261 0.871 324.603 0.787 
1 thread read 75.687 0.053 71.629 0.056 72.526 0.055 
2 threads read 182.329 0.044 151.683 0.053 153.125 0.052 
4 threads read 320.785 0.050 307.180 0.052 312.016 0.051 
8 threads read 504.880 0.063 512.295 0.062 519.683 0.062 
16 threads read 477.706 0.134 643.385 0.099 654.149 0.098 
32 threads read 517.670 0.247 666.696 0.192 678.480 0.189 
64 threads read 516.599 0.495 668.360 0.383 680.673 0.376 

-----Original Message----- 
From: Shu, Xinxin 
Sent: Saturday, June 14, 2014 11:50 AM 
To: Sushma Gurram; Mark Nelson; Sage Weil 
Cc: ceph-devel@vger.kernel.org; Zhang, Jian 
Subject: RE: [RFC] add rocksdb support 

Currently ceph will get stable rocksdb from branch 3.0.fb of ceph/rocksdb , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged , so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake . 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram 
Sent: Saturday, June 14, 2014 2:52 AM 
To: Shu, Xinxin; Mark Nelson; Sage Weil 
Cc: ceph-devel@vger.kernel.org; Zhang, Jian 
Subject: RE: [RFC] add rocksdb support 

Hi Xinxin, 

I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory? 
It doesn't seem to have any other source files and compilation fails: 
os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated. 

Thanks, 
Sushma 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin 
Sent: Monday, June 09, 2014 10:00 PM 
To: Mark Nelson; Sage Weil 
Cc: ceph-devel@vger.kernel.org; Zhang, Jian 
Subject: RE: [RFC] add rocksdb support 

Hi mark 

I have finished development of support of rocksdb submodule, a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok , I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb . 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson 
Sent: Tuesday, June 10, 2014 1:12 AM 
To: Shu, Xinxin; Sage Weil 
Cc: ceph-devel@vger.kernel.org; Zhang, Jian 
Subject: Re: [RFC] add rocksdb support 

Hi Xinxin, 

On 05/28/2014 05:05 AM, Shu, Xinxin wrote: 
> Hi sage , 
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with --with-librocksdb option , in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ? 
> 
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph . 

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately. Any news on the 3.0 fork and autoconf/automake support in rocksdb? 

Thanks, 
Mark 

> 
> -----Original Message----- 
> From: Mark Nelson [mailto:mark.nelson@inktank.com] 
> Sent: Wednesday, May 21, 2014 9:06 PM 
> To: Shu, Xinxin; Sage Weil 
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian 
> Subject: Re: [RFC] add rocksdb support 
> 
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote: 
>> Hi, sage 
>> 
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance. 
> 
> I'm definitely interested in any performance tests you do here. Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb. I'm very interested in what you see with rocksdb as a backend. 
> 
>> 
>> -----Original Message----- 
>> From: Sage Weil [mailto:sage@inktank.com] 
>> Sent: Wednesday, May 21, 2014 9:19 AM 
>> To: Shu, Xinxin 
>> Cc: ceph-devel@vger.kernel.org 
>> Subject: Re: [RFC] add rocksdb support 
>> 
>> Hi Xinxin, 
>> 
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch. There is also a commit that adds rocksdb as a git submodule. I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people. 
>> 
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com. I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one. 
>> 
>> Has your group done further testing with rocksdb? Anything interesting to share? 
>> 
>> Thanks! 
>> sage 
>> 
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-06-27  0:44                   ` Sushma Gurram
  2014-06-27  3:33                     ` Alexandre DERUMIER
@ 2014-06-27  8:08                     ` Haomai Wang
  2014-07-01  0:39                       ` Sushma Gurram
  1 sibling, 1 reply; 37+ messages in thread
From: Haomai Wang @ 2014-06-27  8:08 UTC (permalink / raw)
  To: Sushma Gurram
  Cc: Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

As I mentioned days ago:

There are two points related to kvstore perf:
1. The order of the image (the rbd object-size exponent) and the strip
size are important to performance. Because a header, like an inode in a fs,
is much more lightweight than an fd, the order of the image is expected to
be set lower. And the strip size can be configured to 4KB to improve
large-io performance.
2. The header cache (https://github.com/ceph/ceph/pull/1649) is not
merged yet; the header cache is important to perf, just like the fdcache
in FileStore.

As for the detailed perf numbers, I think this result based on the master branch
is roughly correct. When strip size and the header cache are ready, I think
it will be better; a sketch of the relevant ceph.conf knobs follows.
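
For illustration, the knobs I mean would look roughly like this in ceph.conf (the option names follow the keyvaluestore work of that period and the unmerged header cache PR, so treat them as assumptions and check your build):

    [osd]
        keyvaluestore default strip size = 4096
        keyvaluestore header cache size = 4096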

On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram
<Sushma.Gurram@sandisk.com> wrote:
> Delivery failure due to table format. Resending as plain text.
>
> _____________________________________________
> From: Sushma Gurram
> Sent: Thursday, June 26, 2014 5:35 PM
> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
> Subject: RE: [RFC] add rocksdb support
>
>
> Hi Xinxin,
>
> Thanks for providing the results of the performance tests.
>
> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
> Is there a ceph.conf config option to configure the background threads in rocksdb?
>
> We ran our tests with following configuration:
> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical cores), HT disabled, 16 GB memory
>
> rocksdb configuration has been set to the following values in ceph.conf.
>         rocksdb_write_buffer_size = 4194304
>         rocksdb_cache_size = 4194304
>         rocksdb_bloom_size = 0
>         rocksdb_max_open_files = 10240
>         rocksdb_compression = false
>         rocksdb_paranoid = false
>         rocksdb_log = /dev/null
>         rocksdb_compact_on_mount = false
>
> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>
> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
> -------------------------------------------------------------------
> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
> 4K writes       ~1450           ~670
> 4K reads        ~65000          ~2000
> 64K writes      ~431            ~57
> 64K reads       ~17500          ~180
>
>
> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
> -------------------------------------------------------------------
> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
> 4K writes       ~1450           ~962
> 4K reads        ~65000          ~1641
> 64K writes      ~431            ~426
> 64K reads       ~17500          ~209
>
> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
> Sent: Sunday, June 22, 2014 6:18 PM
> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
>
> Hi all,
>
>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
>
>                                                   xfs             leveldb               rocksdb
>                           throughtput   latency     throughtput latency    throughtput  latency
> 1 thread write       84.029       0.048             52.430         0.076              71.920    0.056
> 2 threads write      166.417      0.048             97.917         0.082             155.148    0.052
> 4 threads write       304.099     0.052             156.094         0.102            270.461    0.059
> 8 threads write       323.047     0.099             221.370         0.144            339.455    0.094
> 16 threads write     295.040      0.216             272.032         0.235       348.849 0.183
> 32 threads write     324.467      0.394             290.072          0.441           338.103    0.378
> 64 threads write     313.713      0.812             293.261          0.871     324.603  0.787
> 1 thread read         75.687      0.053              71.629          0.056      72.526  0.055
> 2 threads read        182.329     0.044             151.683           0.053     153.125 0.052
> 4 threads read        320.785     0.050             307.180           0.052      312.016        0.051
> 8 threads read         504.880    0.063             512.295           0.062      519.683        0.062
> 16 threads read       477.706     0.134             643.385           0.099      654.149        0.098
> 32 threads read       517.670     0.247              666.696          0.192      678.480        0.189
> 64 threads read       516.599     0.495              668.360           0.383      680.673       0.376
>
> -----Original Message-----
> From: Shu, Xinxin
> Sent: Saturday, June 14, 2014 11:50 AM
> To: Sushma Gurram; Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
> Sent: Saturday, June 14, 2014 2:52 AM
> To: Shu, Xinxin; Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
> Hi Xinxin,
>
> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
> It doesn't seem to have any other source files and compilation fails:
> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
> Sent: Monday, June 09, 2014 10:00 PM
> To: Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
> Hi mark
>
> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, June 10, 2014 1:12 AM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> Hi Xinxin,
>
> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>> Hi sage ,
>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>
>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>
> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>
> Thanks,
> Mark
>
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:06 PM
>> To: Shu, Xinxin; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: Re: [RFC] add rocksdb support
>>
>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>> Hi, sage
>>>
>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>
>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>
>>>
>>> -----Original Message-----
>>> From: Sage Weil [mailto:sage@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>> To: Shu, Xinxin
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>
>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>
>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>
>>> Thanks!
>>> sage
>>>
>>
>



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-27  3:33                     ` Alexandre DERUMIER
@ 2014-06-27 17:36                       ` Sushma Gurram
  0 siblings, 0 replies; 37+ messages in thread
From: Sushma Gurram @ 2014-06-27 17:36 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Jian Zhang, ceph-devel, Xinxin Shu, Mark Nelson, Sage Weil

Hi Alexandre,

Yes, it's an SSD that is used for the OSD, and the journal for the XFS osd is on the same SSD.

I agree that a 2GB rbd is small and that most of the reads are probably hitting the page cache. Just for my understanding, do you expect rocksdb to perform better than XFS if the rbd image is much larger than memory?

The 65000 IOPs on XFS is with a branch we've been working on, in which lock contention in the OSD (especially the filestore) has been analyzed and code changes made for better parallelism. This branch is currently under review.

Thanks,
Sushma

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Thursday, June 26, 2014 8:34 PM
To: Sushma Gurram
Cc: Jian Zhang; ceph-devel@vger.kernel.org; Xinxin Shu; Mark Nelson; Sage Weil
Subject: Re: [RFC] add rocksdb support

Hi Sushma,

what is the hardware disk for osd ? ssd ?
where is the journal for xfs osd ? on the same disk ? another disk ?


also 2GB rbd, seem to be low to test, because reads can be done in page cache.

65000 iops with xfs with a single osd seem to be a crazy. 
All the benchs show around 3000-4000 iops limit of osd because of locks contentions in osd daemon.
(are you sure that's it's not caches client side ?)

----- Mail original ----- 

De: "Sushma Gurram" <Sushma.Gurram@sandisk.com>
À: "Xinxin Shu" <xinxin.shu@intel.com>, "Mark Nelson" <mark.nelson@inktank.com>, "Sage Weil" <sage@inktank.com>
Cc: "Jian Zhang" <jian.zhang@intel.com>, ceph-devel@vger.kernel.org
Envoyé: Vendredi 27 Juin 2014 02:44:17
Objet: RE: [RFC] add rocksdb support 

Delivery failure due to table format. Resending as plain text. 

_____________________________________________
From: Sushma Gurram
Sent: Thursday, June 26, 2014 5:35 PM
To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil' 
Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
Subject: RE: [RFC] add rocksdb support 


Hi Xinxin, 

Thanks for providing the results of the performance tests. 

I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order. 
My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench? 
For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized). 
Is there a ceph.conf config option to configure the background threads in rocksdb? 

We ran our tests with following configuration: 
System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical cores), HT disabled, 16 GB memory 

rocksdb configuration has been set to the following values in ceph.conf. 
rocksdb_write_buffer_size = 4194304
rocksdb_cache_size = 4194304
rocksdb_bloom_size = 0
rocksdb_max_open_files = 10240
rocksdb_compression = false
rocksdb_paranoid = false
rocksdb_log = /dev/null
rocksdb_compact_on_mount = false 

fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD. 

rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
-------------------------------------------------------------------
IO Pattern XFS (IOPs) Rocksdb (IOPs)
4K writes ~1450 ~670
4K reads ~65000 ~2000
64K writes ~431 ~57
64K reads ~17500 ~180 


rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
-------------------------------------------------------------------
IO Pattern XFS (IOPs) Rocksdb (IOPs)
4K writes ~1450 ~962
4K reads ~65000 ~1641
64K writes ~431 ~426
64K reads ~17500 ~209 

I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude. 
However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run? 

Thanks,
Sushma 

-----Original Message-----
From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
Sent: Sunday, June 22, 2014 6:18 PM
To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil' 
Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
Subject: RE: [RFC] add rocksdb support 


Hi all, 

We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280) and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool), the following chart shows details, for write , with small number threads , leveldb performance is lower than the other two backends , from 16 threads point , rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number. 

xfs leveldb rocksdb
throughtput latency throughtput latency throughtput latency
1 thread write 84.029 0.048 52.430 0.076 71.920 0.056
2 threads write 166.417 0.048 97.917 0.082 155.148 0.052
4 threads write 304.099 0.052 156.094 0.102 270.461 0.059
8 threads write 323.047 0.099 221.370 0.144 339.455 0.094
16 threads write 295.040 0.216 272.032 0.235 348.849 0.183
32 threads write 324.467 0.394 290.072 0.441 338.103 0.378
64 threads write 313.713 0.812 293.261 0.871 324.603 0.787
1 thread read 75.687 0.053 71.629 0.056 72.526 0.055
2 threads read 182.329 0.044 151.683 0.053 153.125 0.052
4 threads read 320.785 0.050 307.180 0.052 312.016 0.051
8 threads read 504.880 0.063 512.295 0.062 519.683 0.062
16 threads read 477.706 0.134 643.385 0.099 654.149 0.098
32 threads read 517.670 0.247 666.696 0.192 678.480 0.189
64 threads read 516.599 0.495 668.360 0.383 680.673 0.376 

-----Original Message-----
From: Shu, Xinxin
Sent: Saturday, June 14, 2014 11:50 AM
To: Sushma Gurram; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support 

Currently ceph will get stable rocksdb from branch 3.0.fb of ceph/rocksdb , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged , so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake . 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
Sent: Saturday, June 14, 2014 2:52 AM
To: Shu, Xinxin; Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support 

Hi Xinxin, 

I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory? 
It doesn't seem to have any other source files and compilation fails: 
os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated. 

Thanks,
Sushma 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
Sent: Monday, June 09, 2014 10:00 PM
To: Mark Nelson; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: RE: [RFC] add rocksdb support 

Hi mark 

I have finished development of support of rocksdb submodule, a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok , I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb . 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, June 10, 2014 1:12 AM
To: Shu, Xinxin; Sage Weil
Cc: ceph-devel@vger.kernel.org; Zhang, Jian
Subject: Re: [RFC] add rocksdb support 

Hi Xinxin, 

On 05/28/2014 05:05 AM, Shu, Xinxin wrote: 
> Hi sage ,
> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with --with-librocksdb option , in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ? 
> 
> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph . 

I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately. Any news on the 3.0 fork and autoconf/automake support in rocksdb? 

Thanks,
Mark 

> 
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, May 21, 2014 9:06 PM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
> 
> On 05/21/2014 07:54 AM, Shu, Xinxin wrote: 
>> Hi, sage
>> 
>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance. 
> 
> I'm definitely interested in any performance tests you do here. Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb. I'm very interested in what you see with rocksdb as a backend. 
> 
>> 
>> -----Original Message-----
>> From: Sage Weil [mailto:sage@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:19 AM
>> To: Shu, Xinxin
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>> 
>> Hi Xinxin,
>> 
>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch. There is also a commit that adds rocksdb as a git submodule. I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people. 
>> 
>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com. I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one. 
>> 
>> Has your group done further testing with rocksdb? Anything interesting to share? 
>> 
>> Thanks! 
>> sage
>> 
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-06-27  8:08                     ` Haomai Wang
@ 2014-07-01  0:39                       ` Sushma Gurram
  2014-07-01  6:10                         ` Haomai Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Sushma Gurram @ 2014-07-01  0:39 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

Hi Haomai/Greg,

I tried to analyze this a bit more, and it appears that GenericObjectMap::header_lock is serializing the READ requests in the following path, hence the low performance numbers with KeyValueStore:
ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() -> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() -> KeyValueStore::getattr() -> GenericObjectMap::get_values() -> GenericObjectMap::lookup_header()

I modified the code to bypass this lock for a specific run and noticed that the performance then becomes similar to FileStore.

In our earlier investigations we also noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.

Can you please help us understand the reason for this lock, and whether it can be replaced with an RWLock? Any other suggestions to avoid the serialization caused by this lock are also welcome.
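
To make the RWLock suggestion concrete, here is a minimal sketch of the idea (illustrative only, using plain pthread primitives and stand-in types rather than the actual GenericObjectMap code; paths that create headers or bump a shared sequence number would of course still need the exclusive lock):

    #include <pthread.h>
    #include <stdint.h>
    #include <map>
    #include <string>

    // Illustrative stand-ins; the real structures live in GenericObjectMap.
    struct Header { std::string oid; uint64_t seq; };

    class HeaderIndex {
      pthread_rwlock_t lock;                 // was: a plain mutex (header_lock)
      std::map<std::string, Header> headers; // in-memory index of headers

    public:
      HeaderIndex() { pthread_rwlock_init(&lock, NULL); }
      ~HeaderIndex() { pthread_rwlock_destroy(&lock); }

      // Read path (getattr -> get_values -> lookup_header): take a shared
      // lock, so concurrent readers can look up headers in parallel.
      bool lookup_header(const std::string &oid, Header *out) {
        pthread_rwlock_rdlock(&lock);
        std::map<std::string, Header>::const_iterator it = headers.find(oid);
        bool found = (it != headers.end());
        if (found)
          *out = it->second;
        pthread_rwlock_unlock(&lock);
        return found;
      }

      // Write path (creating or renaming a header): exclusive lock, as before.
      void set_header(const std::string &oid, const Header &h) {
        pthread_rwlock_wrlock(&lock);
        headers[oid] = h;
        pthread_rwlock_unlock(&lock);
      }
    };

With something like this, concurrent getattr() calls that only look up existing headers no longer serialize on a single mutex.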

Thanks,
Sushma

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@gmail.com] 
Sent: Friday, June 27, 2014 1:08 AM
To: Sushma Gurram
Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

As I mentioned days ago:

There exists two points related kvstore perf:
1. The order of image and the strip
size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.

As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.

On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Delivery failure due to table format. Resending as plain text.
>
> _____________________________________________
> From: Sushma Gurram
> Sent: Thursday, June 26, 2014 5:35 PM
> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
> Subject: RE: [RFC] add rocksdb support
>
>
> Hi Xinxin,
>
> Thanks for providing the results of the performance tests.
>
> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
> Is there a ceph.conf config option to configure the background threads in rocksdb?
>
> We ran our tests with following configuration:
> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical cores), 
> HT disabled, 16 GB memory
>
> rocksdb configuration has been set to the following values in ceph.conf.
>         rocksdb_write_buffer_size = 4194304
>         rocksdb_cache_size = 4194304
>         rocksdb_bloom_size = 0
>         rocksdb_max_open_files = 10240
>         rocksdb_compression = false
>         rocksdb_paranoid = false
>         rocksdb_log = /dev/null
>         rocksdb_compact_on_mount = false
>
> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>
> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
> -------------------------------------------------------------------
> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
> 4K writes       ~1450           ~670
> 4K reads        ~65000          ~2000
> 64K writes      ~431            ~57
> 64K reads       ~17500          ~180
>
>
> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
> -------------------------------------------------------------------
> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
> 4K writes       ~1450           ~962
> 4K reads        ~65000          ~1641
> 64K writes      ~431            ~426
> 64K reads       ~17500          ~209
>
> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
> Sent: Sunday, June 22, 2014 6:18 PM
> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
>
> Hi all,
>
>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
>
>                                                   xfs             leveldb               rocksdb
>                           throughtput   latency     throughtput latency    throughtput  latency
> 1 thread write       84.029       0.048             52.430         0.076              71.920    0.056
> 2 threads write      166.417      0.048             97.917         0.082             155.148    0.052
> 4 threads write       304.099     0.052             156.094         0.102            270.461    0.059
> 8 threads write       323.047     0.099             221.370         0.144            339.455    0.094
> 16 threads write     295.040      0.216             272.032         0.235       348.849 0.183
> 32 threads write     324.467      0.394             290.072          0.441           338.103    0.378
> 64 threads write     313.713      0.812             293.261          0.871     324.603  0.787
> 1 thread read         75.687      0.053              71.629          0.056      72.526  0.055
> 2 threads read        182.329     0.044             151.683           0.053     153.125 0.052
> 4 threads read        320.785     0.050             307.180           0.052      312.016        0.051
> 8 threads read         504.880    0.063             512.295           0.062      519.683        0.062
> 16 threads read       477.706     0.134             643.385           0.099      654.149        0.098
> 32 threads read       517.670     0.247              666.696          0.192      678.480        0.189
> 64 threads read       516.599     0.495              668.360           0.383      680.673       0.376
>
> -----Original Message-----
> From: Shu, Xinxin
> Sent: Saturday, June 14, 2014 11:50 AM
> To: Sushma Gurram; Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
> Sent: Saturday, June 14, 2014 2:52 AM
> To: Shu, Xinxin; Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
> Hi Xinxin,
>
> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
> It doesn't seem to have any other source files and compilation fails:
> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
> Sent: Monday, June 09, 2014 10:00 PM
> To: Mark Nelson; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: RE: [RFC] add rocksdb support
>
> Hi mark
>
> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, June 10, 2014 1:12 AM
> To: Shu, Xinxin; Sage Weil
> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> Subject: Re: [RFC] add rocksdb support
>
> Hi Xinxin,
>
> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>> Hi sage ,
>> I will add two configure options, --with-librocksdb-static and --with-librocksdb. With the --with-librocksdb-static option, ceph will compile the rocksdb code that it gets from the ceph repository; with the --with-librocksdb option, in case there are distro packages for rocksdb, ceph will not compile the rocksdb code and will use the pre-installed library. Is that ok for you?
>>
>> Since the current rocksdb does not support autoconf&automake, I will add autoconf&automake support to rocksdb, but before that, I think we should fork a stable branch (maybe 3.0) for ceph.
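
Once both options land, the two build paths being described would look roughly like the sketch below (the flag names are the ones proposed above; the rest is the usual autogen/configure flow and may differ in the final patch):

  # build against the rocksdb that ships as a ceph submodule (static)
  git submodule update --init
  ./autogen.sh
  ./configure --with-librocksdb-static
  make

  # or link against a pre-installed librocksdb (e.g. a distro package)
  ./configure --with-librocksdb
  make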
>
> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>
> Thanks,
> Mark
>
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Wednesday, May 21, 2014 9:06 PM
>> To: Shu, Xinxin; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: Re: [RFC] add rocksdb support
>>
>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>> Hi, sage
>>>
>>> I will add the rocksdb submodule into the makefile. Currently we want to run full performance tests on the key-value db backends, both leveldb and rocksdb, and then optimize rocksdb performance.
>>
>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>
>>>
>>> -----Original Message-----
>>> From: Sage Weil [mailto:sage@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>> To: Shu, Xinxin
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>
>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages.  I suspect that the distros will prefer to turn this off in favor of separate shared libs, but they can do that at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librocksdb and --with-librocksdb-static (or similar) options so that you can use either the statically or the dynamically linked one.
>>>
>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>
>>> Thanks!
>>> sage
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-07-01  0:39                       ` Sushma Gurram
@ 2014-07-01  6:10                         ` Haomai Wang
  2014-07-01  7:13                           ` Somnath Roy
  2014-07-02  7:23                           ` Shu, Xinxin
  0 siblings, 2 replies; 37+ messages in thread
From: Haomai Wang @ 2014-07-01  6:10 UTC (permalink / raw)
  To: Sushma Gurram
  Cc: Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

Hi Sushma,

Thanks for your investigations! We already noticed the serialization
risk in GenericObjectMap/DBObjectMap. In order to improve performance
we added a header cache to DBObjectMap.

As for KeyValueStore, a cache branch is under review; it can
greatly reduce lookup_header calls. Of course, replacing the lock with
a RWLock is a good suggestion, and I would like to try it and evaluate
the impact!
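
To make the idea concrete, here is a minimal standalone sketch (illustrative only, not code from the branch; the type and helper names are made up) of a header lookup guarded by a reader/writer lock instead of a single mutex, so concurrent getattr()-style readers no longer serialize on lookup_header:

  // illustrative only: a read-mostly header cache behind a rwlock
  #include <pthread.h>
  #include <stdint.h>
  #include <map>
  #include <string>

  struct Header { uint64_t seq; };

  class HeaderCache {
    pthread_rwlock_t lock;                  // replaces the single header_lock mutex
    std::map<std::string, Header> cache;    // stand-in for the cached headers
  public:
    HeaderCache() { pthread_rwlock_init(&lock, NULL); }
    ~HeaderCache() { pthread_rwlock_destroy(&lock); }

    bool lookup(const std::string &oid, Header *out) {
      pthread_rwlock_rdlock(&lock);         // readers proceed in parallel
      std::map<std::string, Header>::const_iterator it = cache.find(oid);
      bool found = (it != cache.end());
      if (found)
        *out = it->second;
      pthread_rwlock_unlock(&lock);
      return found;
    }

    void insert(const std::string &oid, const Header &h) {
      pthread_rwlock_wrlock(&lock);         // header creation stays exclusive
      cache[oid] = h;
      pthread_rwlock_unlock(&lock);
    }
  };

A miss would still fall through to the omap lookup and then insert() the result, so only cache fills take the write side of the lock.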

On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Hi Haomai/Greg,
>
> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() -> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() -> KeyValueStore::getattr() -> GenericObjectMap::get_values() -> GenericObjectMap::lookup_header()
>
> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>
> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>
> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Friday, June 27, 2014 1:08 AM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> As I mentioned days ago:
>
> There are two points related to kvstore perf:
> 1. The order of the image and the strip size are important to performance. Because a header, like an inode in a fs, is much more lightweight than an fd, the order of the image is expected to be lower. And the strip size can be configured to 4kb to improve large io performance.
> 2. The header cache (https://github.com/ceph/ceph/pull/1649) is not merged yet; the header cache is important to perf. It's just like the fdcache in FileStore.
>
> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>
> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Delivery failure due to table format. Resending as plain text.
>>
>> _____________________________________________
>> From: Sushma Gurram
>> Sent: Thursday, June 26, 2014 5:35 PM
>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>> Subject: RE: [RFC] add rocksdb support
>>
>>
>> Hi Xinxin,
>>
>> Thanks for providing the results of the performance tests.
>>
>> I used fio (with support for the rbd ioengine) to compare XFS and RocksDB with a single OSD. I also confirmed with rados bench, and both sets of numbers seem to be of the same order.
>> My findings show that XFS is better than rocksdb. Can you please let us know the rocksdb configuration that you used, the object size, and the duration of the run for rados bench?
>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>
>> We ran our tests with following configuration:
>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical cores),
>> HT disabled, 16 GB memory
>>
>> rocksdb configuration has been set to the following values in ceph.conf.
>>         rocksdb_write_buffer_size = 4194304
>>         rocksdb_cache_size = 4194304
>>         rocksdb_bloom_size = 0
>>         rocksdb_max_open_files = 10240
>>         rocksdb_compression = false
>>         rocksdb_paranoid = false
>>         rocksdb_log = /dev/null
>>         rocksdb_compact_on_mount = false
>>
>> We used the fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
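
(For reference, a fio job using the rbd engine with these knobs could look something like the sketch below; the pool/image names and section layout are illustrative, not the exact job file used in this test.)

  [global]
  ioengine=rbd
  clientname=admin
  pool=rbd
  rbdname=testimg
  invalidate=0

  [4k-randwrite]
  rw=randwrite
  bs=4k
  iodepth=32
  numjobs=1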
>>
>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>> -------------------------------------------------------------------
>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>> 4K writes       ~1450           ~670
>> 4K reads        ~65000          ~2000
>> 64K writes      ~431            ~57
>> 64K reads       ~17500          ~180
>>
>>
>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>> -------------------------------------------------------------------
>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>> 4K writes       ~1450           ~962
>> 4K reads        ~65000          ~1641
>> 64K writes      ~431            ~426
>> 64K reads       ~17500          ~209
>>
>> I guess the theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure READs should be lower by this magnitude.
>> However, your results seem to show otherwise. Can you please help us with the rocksdb config and with how rados bench was run?
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>> Sent: Sunday, June 22, 2014 6:18 PM
>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>>
>> Hi all,
>>
>>  We enabled rocksdb as the data store in our test setup (10 osds on two servers, each server with 5 HDDs as osds, 2 ssds as journals, Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb (using rados bench as our test tool). The following chart shows the details: for writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads on, rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.
>>
>>                     xfs                  leveldb              rocksdb
>>                     throughput  latency  throughput  latency  throughput  latency
>> 1 thread write        84.029    0.048      52.430    0.076      71.920    0.056
>> 2 threads write      166.417    0.048      97.917    0.082     155.148    0.052
>> 4 threads write      304.099    0.052     156.094    0.102     270.461    0.059
>> 8 threads write      323.047    0.099     221.370    0.144     339.455    0.094
>> 16 threads write     295.040    0.216     272.032    0.235     348.849    0.183
>> 32 threads write     324.467    0.394     290.072    0.441     338.103    0.378
>> 64 threads write     313.713    0.812     293.261    0.871     324.603    0.787
>> 1 thread read         75.687    0.053      71.629    0.056      72.526    0.055
>> 2 threads read       182.329    0.044     151.683    0.053     153.125    0.052
>> 4 threads read       320.785    0.050     307.180    0.052     312.016    0.051
>> 8 threads read       504.880    0.063     512.295    0.062     519.683    0.062
>> 16 threads read      477.706    0.134     643.385    0.099     654.149    0.098
>> 32 threads read      517.670    0.247     666.696    0.192     678.480    0.189
>> 64 threads read      516.599    0.495     668.360    0.383     680.673    0.376
>>
>> -----Original Message-----
>> From: Shu, Xinxin
>> Sent: Saturday, June 14, 2014 11:50 AM
>> To: Sushma Gurram; Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>> Sent: Saturday, June 14, 2014 2:52 AM
>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need to put autoconf/automake in this directory?
>> It doesn't seem to have any other source files and compilation fails:
>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>> Sent: Monday, June 09, 2014 10:00 PM
>> To: Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Hi mark
>>
>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Tuesday, June 10, 2014 1:12 AM
>> To: Shu, Xinxin; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>> Hi sage ,
>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>
>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>
>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>
>> Thanks,
>> Mark
>>
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>> Hi, sage
>>>>
>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>
>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>> To: Shu, Xinxin
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>
>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>
>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>
>>>> Thanks!
>>>> sage
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-07-01  6:10                         ` Haomai Wang
@ 2014-07-01  7:13                           ` Somnath Roy
  2014-07-01  8:05                             ` Haomai Wang
  2014-07-01 15:11                             ` Sage Weil
  2014-07-02  7:23                           ` Shu, Xinxin
  1 sibling, 2 replies; 37+ messages in thread
From: Somnath Roy @ 2014-07-01  7:13 UTC (permalink / raw)
  To: Haomai Wang, Sushma Gurram
  Cc: Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

Hi Haomai,
But the cache hit rate will be very low or zero if the actual storage per node is very large (say at the PB level). So it will mostly be hitting the omap, won't it?
How is this header cache going to resolve this serialization issue then?

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
Sent: Monday, June 30, 2014 11:10 PM
To: Sushma Gurram
Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

Hi Sushma,

Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.

As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!

On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Hi Haomai/Greg,
>
> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() -> 
> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() -> 
> KeyValueStore::getattr() -> GenericObjectMap::get_values() -> 
> GenericObjectMap::lookup_header()
>
> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>
> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>
> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Friday, June 27, 2014 1:08 AM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
> ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> As I mentioned days ago:
>
> > There are two points related to kvstore perf:
> > 1. The order of the image and the strip size are important to performance. Because a header, like an inode in a fs, is much more lightweight than an fd, the order of the image is expected to be lower. And the strip size can be configured to 4kb to improve large io performance.
> > 2. The header cache (https://github.com/ceph/ceph/pull/1649) is not merged yet; the header cache is important to perf. It's just like the fdcache in FileStore.
>
> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>
> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Delivery failure due to table format. Resending as plain text.
>>
>> _____________________________________________
>> From: Sushma Gurram
>> Sent: Thursday, June 26, 2014 5:35 PM
>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>> Subject: RE: [RFC] add rocksdb support
>>
>>
>> Hi Xinxin,
>>
>> Thanks for providing the results of the performance tests.
>>
>> I used fio (with support for rbd ioengine) to compare XFS and RocksDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>
>> We ran our tests with following configuration:
>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical 
>> cores), HT disabled, 16 GB memory
>>
>> rocksdb configuration has been set to the following values in ceph.conf.
>>         rocksdb_write_buffer_size = 4194304
>>         rocksdb_cache_size = 4194304
>>         rocksdb_bloom_size = 0
>>         rocksdb_max_open_files = 10240
>>         rocksdb_compression = false
>>         rocksdb_paranoid = false
>>         rocksdb_log = /dev/null
>>         rocksdb_compact_on_mount = false
>>
>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>
>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>> -------------------------------------------------------------------
>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>> 4K writes       ~1450           ~670
>> 4K reads        ~65000          ~2000
>> 64K writes      ~431            ~57
>> 64K reads       ~17500          ~180
>>
>>
>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>> -------------------------------------------------------------------
>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>> 4K writes       ~1450           ~962
>> 4K reads        ~65000          ~1641
>> 64K writes      ~431            ~426
>> 64K reads       ~17500          ~209
>>
>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>> However, your results seem to show otherwise. Can you please help us with the rocksdb config and how the rados bench has been run?
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>> Sent: Sunday, June 22, 2014 6:18 PM
>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>>
>> Hi all,
>>
>>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
>>
>>                     xfs                  leveldb              rocksdb
>>                     throughput  latency  throughput  latency  throughput  latency
>> 1 thread write        84.029    0.048      52.430    0.076      71.920    0.056
>> 2 threads write      166.417    0.048      97.917    0.082     155.148    0.052
>> 4 threads write      304.099    0.052     156.094    0.102     270.461    0.059
>> 8 threads write      323.047    0.099     221.370    0.144     339.455    0.094
>> 16 threads write     295.040    0.216     272.032    0.235     348.849    0.183
>> 32 threads write     324.467    0.394     290.072    0.441     338.103    0.378
>> 64 threads write     313.713    0.812     293.261    0.871     324.603    0.787
>> 1 thread read         75.687    0.053      71.629    0.056      72.526    0.055
>> 2 threads read       182.329    0.044     151.683    0.053     153.125    0.052
>> 4 threads read       320.785    0.050     307.180    0.052     312.016    0.051
>> 8 threads read       504.880    0.063     512.295    0.062     519.683    0.062
>> 16 threads read      477.706    0.134     643.385    0.099     654.149    0.098
>> 32 threads read      517.670    0.247     666.696    0.192     678.480    0.189
>> 64 threads read      516.599    0.495     668.360    0.383     680.673    0.376
>>
>> -----Original Message-----
>> From: Shu, Xinxin
>> Sent: Saturday, June 14, 2014 11:50 AM
>> To: Sushma Gurram; Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>> Sent: Saturday, June 14, 2014 2:52 AM
>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>> It doesn't seem to have any other source files and compilation fails:
>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>> Sent: Monday, June 09, 2014 10:00 PM
>> To: Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Hi mark
>>
>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Tuesday, June 10, 2014 1:12 AM
>> To: Shu, Xinxin; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>> Hi sage ,
>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>
>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>
>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>
>> Thanks,
>> Mark
>>
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>> Hi, sage
>>>>
>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>
>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>> To: Shu, Xinxin
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>
>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>
>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>
>>>> Thanks!
>>>> sage
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-07-01  7:13                           ` Somnath Roy
@ 2014-07-01  8:05                             ` Haomai Wang
  2014-07-01 15:15                               ` Sushma Gurram
  2014-07-01 15:11                             ` Sage Weil
  1 sibling, 1 reply; 37+ messages in thread
From: Haomai Wang @ 2014-07-01  8:05 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Sushma Gurram, Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian,
	ceph-devel

Hi,

I don't know why OSD capacity would be at the PB level. Actually, most
use cases should be several TBs (1-4TB). As for the cache hit rate, it
totally depends on the IO characteristics. In my opinion, the header
cache in KeyValueStore can satisfy most lookups from cache if the
object size and the strip size (KeyValueStore) are configured properly.

But I'm also interested in your lock comments: which ceph version did
you evaluate when you saw the serialization issue?
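
Concretely, the kind of tuning being described looks roughly like the sketch below; the option names come from the strip-size/header-cache work and should be double-checked against the build you are running, and the image parameters are only an example:

  [osd]
      ; assumed option names - verify against config_opts.h in your tree
      keyvaluestore_default_strip_size = 4096
      keyvaluestore_header_cache_size = 4096

  # create the test image with an explicit order (object size) instead of the default
  rbd create --size 10240 --order 20 testimg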

On Tue, Jul 1, 2014 at 3:13 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Hi Haomai,
> But, the cache hit will be very minimal or null, if the actual storage per node is very huge (say in the PB level). So, it will be mostly hitting Omap, isn't it ?
> How this header cache is going to resolve this serialization issue then ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
> Sent: Monday, June 30, 2014 11:10 PM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi Sushma,
>
> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
>
> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
>
> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Hi Haomai/Greg,
>>
>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() ->
>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>> GenericObjectMap::lookup_header()
>>
>> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>>
>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>
>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>> Sent: Friday, June 27, 2014 1:08 AM
>> To: Sushma Gurram
>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> As I mentioned days ago:
>>
>> There are two points related to kvstore perf:
>> 1. The order of the image and the strip size are important to performance. Because a header, like an inode in a fs, is much more lightweight than an fd, the order of the image is expected to be lower. And the strip size can be configured to 4kb to improve large io performance.
>> 2. The header cache (https://github.com/ceph/ceph/pull/1649) is not merged yet; the header cache is important to perf. It's just like the fdcache in FileStore.
>>
>> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>>
>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>> Delivery failure due to table format. Resending as plain text.
>>>
>>> _____________________________________________
>>> From: Sushma Gurram
>>> Sent: Thursday, June 26, 2014 5:35 PM
>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>>
>>> Hi Xinxin,
>>>
>>> Thanks for providing the results of the performance tests.
>>>
>>> I used fio (with support for rbd ioengine) to compare XFS and RocksDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>
>>> We ran our tests with following configuration:
>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical
>>> cores), HT disabled, 16 GB memory
>>>
>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>         rocksdb_write_buffer_size = 4194304
>>>         rocksdb_cache_size = 4194304
>>>         rocksdb_bloom_size = 0
>>>         rocksdb_max_open_files = 10240
>>>         rocksdb_compression = false
>>>         rocksdb_paranoid = false
>>>         rocksdb_log = /dev/null
>>>         rocksdb_compact_on_mount = false
>>>
>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>
>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>> -------------------------------------------------------------------
>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>> 4K writes       ~1450           ~670
>>> 4K reads        ~65000          ~2000
>>> 64K writes      ~431            ~57
>>> 64K reads       ~17500          ~180
>>>
>>>
>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>> -------------------------------------------------------------------
>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>> 4K writes       ~1450           ~962
>>> 4K reads        ~65000          ~1641
>>> 64K writes      ~431            ~426
>>> 64K reads       ~17500          ~209
>>>
>>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>>> However, your results seem to show otherwise. Can you please help us with the rocksdb config and how the rados bench has been run?
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>> Sent: Sunday, June 22, 2014 6:18 PM
>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>>
>>> Hi all,
>>>
>>>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
>>>
>>>                     xfs                  leveldb              rocksdb
>>>                     throughput  latency  throughput  latency  throughput  latency
>>> 1 thread write        84.029    0.048      52.430    0.076      71.920    0.056
>>> 2 threads write      166.417    0.048      97.917    0.082     155.148    0.052
>>> 4 threads write      304.099    0.052     156.094    0.102     270.461    0.059
>>> 8 threads write      323.047    0.099     221.370    0.144     339.455    0.094
>>> 16 threads write     295.040    0.216     272.032    0.235     348.849    0.183
>>> 32 threads write     324.467    0.394     290.072    0.441     338.103    0.378
>>> 64 threads write     313.713    0.812     293.261    0.871     324.603    0.787
>>> 1 thread read         75.687    0.053      71.629    0.056      72.526    0.055
>>> 2 threads read       182.329    0.044     151.683    0.053     153.125    0.052
>>> 4 threads read       320.785    0.050     307.180    0.052     312.016    0.051
>>> 8 threads read       504.880    0.063     512.295    0.062     519.683    0.062
>>> 16 threads read      477.706    0.134     643.385    0.099     654.149    0.098
>>> 32 threads read      517.670    0.247     666.696    0.192     678.480    0.189
>>> 64 threads read      516.599    0.495     668.360    0.383     680.673    0.376
>>>
>>> -----Original Message-----
>>> From: Shu, Xinxin
>>> Sent: Saturday, June 14, 2014 11:50 AM
>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>>> Sent: Saturday, June 14, 2014 2:52 AM
>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need to put autoconf/automake in this directory?
>>> It doesn't seem to have any other source files and compilation fails:
>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>> Sent: Monday, June 09, 2014 10:00 PM
>>> To: Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Hi mark
>>>
>>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>> Hi sage ,
>>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>>
>>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>>
>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>
>>> Thanks,
>>> Mark
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>> To: Shu, Xinxin; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>> Hi, sage
>>>>>
>>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>>
>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>> To: Shu, Xinxin
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>
>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>
>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> ________________________________
>>>
>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-07-01  7:13                           ` Somnath Roy
  2014-07-01  8:05                             ` Haomai Wang
@ 2014-07-01 15:11                             ` Sage Weil
  1 sibling, 0 replies; 37+ messages in thread
From: Sage Weil @ 2014-07-01 15:11 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Haomai Wang, Sushma Gurram, Shu, Xinxin, Mark Nelson, Zhang,
	Jian, ceph-devel

On Tue, 1 Jul 2014, Somnath Roy wrote:
> Hi Haomai,
> But, the cache hit will be very minimal or null, if the actual storage per node is very huge (say in the PB level). So, it will be mostly hitting Omap, isn't it ?
> How this header cache is going to resolve this serialization issue then ?

The header cache is really important for Transactions that have multiple 
ops on the same object.  But I suspect you're right that for some 
workloads it won't help with the lock contention you are seeing here.

sage

> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
> Sent: Monday, June 30, 2014 11:10 PM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
> 
> Hi Sushma,
> 
> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
> 
> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
> 
> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> > Hi Haomai/Greg,
> >
> > I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
> > ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() -> 
> > ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() -> 
> > KeyValueStore::getattr() -> GenericObjectMap::get_values() -> 
> > GenericObjectMap::lookup_header()
> >
> > I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
> >
> > In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
> >
> > Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
> >
> > Thanks,
> > Sushma
> >
> > -----Original Message-----
> > From: Haomai Wang [mailto:haomaiwang@gmail.com]
> > Sent: Friday, June 27, 2014 1:08 AM
> > To: Sushma Gurram
> > Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
> > ceph-devel@vger.kernel.org
> > Subject: Re: [RFC] add rocksdb support
> >
> > As I mentioned days ago:
> >
> > There are two points related to kvstore perf:
> > 1. The order of the image and the strip size are important to performance. Because a header, like an inode in a fs, is much more lightweight than an fd, the order of the image is expected to be lower. And the strip size can be configured to 4kb to improve large io performance.
> > 2. The header cache (https://github.com/ceph/ceph/pull/1649) is not merged yet; the header cache is important to perf. It's just like the fdcache in FileStore.
> >
> > As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
> >
> > On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> >> Delivery failure due to table format. Resending as plain text.
> >>
> >> _____________________________________________
> >> From: Sushma Gurram
> >> Sent: Thursday, June 26, 2014 5:35 PM
> >> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
> >> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
> >> Subject: RE: [RFC] add rocksdb support
> >>
> >>
> >> Hi Xinxin,
> >>
> >> Thanks for providing the results of the performance tests.
> >>
> >> I used fio (with support for rbd ioengine) to compare XFS and RocksDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
> >> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
> >> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
> >> Is there a ceph.conf config option to configure the background threads in rocksdb?
> >>
> >> We ran our tests with following configuration:
> >> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical 
> >> cores), HT disabled, 16 GB memory
> >>
> >> rocksdb configuration has been set to the following values in ceph.conf.
> >>         rocksdb_write_buffer_size = 4194304
> >>         rocksdb_cache_size = 4194304
> >>         rocksdb_bloom_size = 0
> >>         rocksdb_max_open_files = 10240
> >>         rocksdb_compression = false
> >>         rocksdb_paranoid = false
> >>         rocksdb_log = /dev/null
> >>         rocksdb_compact_on_mount = false
> >>
> >> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
> >>
> >> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
> >> -------------------------------------------------------------------
> >> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
> >> 4K writes       ~1450           ~670
> >> 4K reads        ~65000          ~2000
> >> 64K writes      ~431            ~57
> >> 64K reads       ~17500          ~180
> >>
> >>
> >> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
> >> -------------------------------------------------------------------
> >> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
> >> 4K writes       ~1450           ~962
> >> 4K reads        ~65000          ~1641
> >> 64K writes      ~431            ~426
> >> 64K reads       ~17500          ~209
> >>
> >> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
> >> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
> >>
> >> Thanks,
> >> Sushma
> >>
> >> -----Original Message-----
> >> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
> >> Sent: Sunday, June 22, 2014 6:18 PM
> >> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
> >> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
> >> Subject: RE: [RFC] add rocksdb support
> >>
> >>
> >> Hi all,
> >>
> >>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
> >>
> >>                                                   xfs             leveldb               rocksdb
> >>                           throughtput   latency     throughtput latency    throughtput  latency
> >> 1 thread write       84.029       0.048             52.430         0.076              71.920    0.056
> >> 2 threads write      166.417      0.048             97.917         0.082             155.148    0.052
> >> 4 threads write       304.099     0.052             156.094         0.102            270.461    0.059
> >> 8 threads write       323.047     0.099             221.370         0.144            339.455    0.094
> >> 16 threads write     295.040      0.216             272.032         0.235       348.849 0.183
> >> 32 threads write     324.467      0.394             290.072          0.441           338.103    0.378
> >> 64 threads write     313.713      0.812             293.261          0.871     324.603  0.787
> >> 1 thread read         75.687      0.053              71.629          0.056      72.526  0.055
> >> 2 threads read        182.329     0.044             151.683           0.053     153.125 0.052
> >> 4 threads read        320.785     0.050             307.180           0.052      312.016        0.051
> >> 8 threads read         504.880    0.063             512.295           0.062      519.683        0.062
> >> 16 threads read       477.706     0.134             643.385           0.099      654.149        0.098
> >> 32 threads read       517.670     0.247              666.696          0.192      678.480        0.189
> >> 64 threads read       516.599     0.495              668.360           0.383      680.673       0.376
> >>
> >> -----Original Message-----
> >> From: Shu, Xinxin
> >> Sent: Saturday, June 14, 2014 11:50 AM
> >> To: Sushma Gurram; Mark Nelson; Sage Weil
> >> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> >> Subject: RE: [RFC] add rocksdb support
> >>
> >> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
> >>
> >> -----Original Message-----
> >> From: ceph-devel-owner@vger.kernel.org 
> >> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
> >> Sent: Saturday, June 14, 2014 2:52 AM
> >> To: Shu, Xinxin; Mark Nelson; Sage Weil
> >> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> >> Subject: RE: [RFC] add rocksdb support
> >>
> >> Hi Xinxin,
> >>
> >> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
> >> It doesn't seem to have any other source files and compilation fails:
> >> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
> >>
> >> Thanks,
> >> Sushma
> >>
> >> -----Original Message-----
> >> From: ceph-devel-owner@vger.kernel.org 
> >> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
> >> Sent: Monday, June 09, 2014 10:00 PM
> >> To: Mark Nelson; Sage Weil
> >> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> >> Subject: RE: [RFC] add rocksdb support
> >>
> >> Hi mark
> >>
> >> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
> >>
> >> -----Original Message-----
> >> From: ceph-devel-owner@vger.kernel.org 
> >> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> >> Sent: Tuesday, June 10, 2014 1:12 AM
> >> To: Shu, Xinxin; Sage Weil
> >> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> >> Subject: Re: [RFC] add rocksdb support
> >>
> >> Hi Xinxin,
> >>
> >> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
> >>> Hi sage ,
> >>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
> >>>
> >>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
> >>
> >> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
> >>
> >> Thanks,
> >> Mark
> >>
> >>>
> >>> -----Original Message-----
> >>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> >>> Sent: Wednesday, May 21, 2014 9:06 PM
> >>> To: Shu, Xinxin; Sage Weil
> >>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
> >>> Subject: Re: [RFC] add rocksdb support
> >>>
> >>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
> >>>> Hi, sage
> >>>>
> >>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
> >>>
> >>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
> >>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Sage Weil [mailto:sage@inktank.com]
> >>>> Sent: Wednesday, May 21, 2014 9:19 AM
> >>>> To: Shu, Xinxin
> >>>> Cc: ceph-devel@vger.kernel.org
> >>>> Subject: Re: [RFC] add rocksdb support
> >>>>
> >>>> Hi Xinxin,
> >>>>
> >>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
> >>>>
> >>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
> >>>>
> >>>> Has your group done further testing with rocksdb?  Anything interesting to share?
> >>>>
> >>>> Thanks!
> >>>> sage
> >>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>>> in the body of a message to majordomo@vger.kernel.org More 
> >>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> in the body of a message to majordomo@vger.kernel.org More majordomo 
> >> info at  http://vger.kernel.org/majordomo-info.html
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> in the body of a message to majordomo@vger.kernel.org More majordomo 
> >> info at  http://vger.kernel.org/majordomo-info.html
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> in the body of a message to majordomo@vger.kernel.org More majordomo 
> >> info at  http://vger.kernel.org/majordomo-info.html
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> in the body of a message to majordomo@vger.kernel.org More majordomo 
> >> info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> >
> > --
> > Best Regards,
> >
> > Wheat
> 
> 
> 
> --
> Best Regards,
> 
> Wheat
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-07-01  8:05                             ` Haomai Wang
@ 2014-07-01 15:15                               ` Sushma Gurram
  2014-07-01 17:02                                 ` Haomai Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Sushma Gurram @ 2014-07-01 15:15 UTC (permalink / raw)
  To: Haomai Wang, Somnath Roy
  Cc: Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

Haomai,

Is there any write-up on the KeyValueStore header cache and strip size? Based on what you stated, it appears that the strip size mainly helps performance with large object sizes. How would the header cache impact 4KB object sizes?
We'd like to estimate the improvement due to the strip size and header cache. I'm not familiar with the header cache implementation yet, but fdcache had serialization issues and there was a sharded fdcache to address them (under review, I guess).
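
For what it's worth, here is a minimal standalone sketch of how I understand the striping to work (the function name and key format below are made up, not KeyValueStore's actual key schema): a write to an object is split into fixed-size strips, each stored under its own key, so larger IOs touch proportionally more keys.

    // Sketch of the striping idea only -- NOT KeyValueStore's actual key
    // schema; strips_for_extent() and the key format are made up here.
    #include <cstdint>
    #include <sstream>
    #include <string>
    #include <vector>

    // Build one key per strip touched by the extent [offset, offset + len).
    std::vector<std::string> strips_for_extent(const std::string &oid,
                                               uint64_t strip_size,
                                               uint64_t offset, uint64_t len)
    {
      std::vector<std::string> keys;
      if (len == 0 || strip_size == 0)
        return keys;
      uint64_t first = offset / strip_size;
      uint64_t last = (offset + len - 1) / strip_size;
      for (uint64_t i = first; i <= last; ++i) {
        std::ostringstream key;
        key << oid << "." << i;              // e.g. "<object>.42"
        keys.push_back(key.str());
      }
      return keys;
    }

    // With 4 MB objects and strip_size = 4096: a 4 KB write touches 1 strip,
    // a 64 KB write touches 16, and a full-object write touches 1024.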

I believe the header_lock serialization exists in all ceph branches so far, including the master.

Thanks,
Sushma

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@gmail.com] 
Sent: Tuesday, July 01, 2014 1:06 AM
To: Somnath Roy
Cc: Sushma Gurram; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

Hi,

I don't know why OSD capacity can be PB level. Actually, most of use case should be serval TBs(1-4TB). As for cache hit, it totally depend on the IO characteristic. In my opinion, header cache in KeyValueStore can meet hit cache mostly if config object size and strip
size(KeyValueStore) properly.

But I'm also interested in your lock comments, what ceph version do you estimate with serialization issue?

On Tue, Jul 1, 2014 at 3:13 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Hi Haomai,
> But, the cache hit will be very minimal or null, if the actual storage per node is very huge (say in the PB level). So, it will be mostly hitting Omap, isn't it ?
> How this header cache is going to resolve this serialization issue then ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
> Sent: Monday, June 30, 2014 11:10 PM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
> ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi Sushma,
>
> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
>
> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
>
> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Hi Haomai/Greg,
>>
>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() 
>> ->
>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>> GenericObjectMap::lookup_header()
>>
>> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>>
>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>
>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>> Sent: Friday, June 27, 2014 1:08 AM
>> To: Sushma Gurram
>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> As I mentioned days ago:
>>
>> There exists two points related kvstore perf:
>> 1. The order of image and the strip
>> size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
>> 2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.
>>
>> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>>
>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>> Delivery failure due to table format. Resending as plain text.
>>>
>>> _____________________________________________
>>> From: Sushma Gurram
>>> Sent: Thursday, June 26, 2014 5:35 PM
>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>>
>>> Hi Xinxin,
>>>
>>> Thanks for providing the results of the performance tests.
>>>
>>> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>
>>> We ran our tests with following configuration:
>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical 
>>> cores), HT disabled, 16 GB memory
>>>
>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>         rocksdb_write_buffer_size = 4194304
>>>         rocksdb_cache_size = 4194304
>>>         rocksdb_bloom_size = 0
>>>         rocksdb_max_open_files = 10240
>>>         rocksdb_compression = false
>>>         rocksdb_paranoid = false
>>>         rocksdb_log = /dev/null
>>>         rocksdb_compact_on_mount = false
>>>
>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>
>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>> -------------------------------------------------------------------
>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>> 4K writes       ~1450           ~670
>>> 4K reads        ~65000          ~2000
>>> 64K writes      ~431            ~57
>>> 64K reads       ~17500          ~180
>>>
>>>
>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>> -------------------------------------------------------------------
>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>> 4K writes       ~1450           ~962
>>> 4K reads        ~65000          ~1641
>>> 64K writes      ~431            ~426
>>> 64K reads       ~17500          ~209
>>>
>>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>>> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>> Sent: Sunday, June 22, 2014 6:18 PM
>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>>
>>> Hi all,
>>>
>>>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
>>>
>>>                                                   xfs             leveldb               rocksdb
>>>                           throughtput   latency     throughtput latency    throughtput  latency
>>> 1 thread write       84.029       0.048             52.430         0.076              71.920    0.056
>>> 2 threads write      166.417      0.048             97.917         0.082             155.148    0.052
>>> 4 threads write       304.099     0.052             156.094         0.102            270.461    0.059
>>> 8 threads write       323.047     0.099             221.370         0.144            339.455    0.094
>>> 16 threads write     295.040      0.216             272.032         0.235       348.849 0.183
>>> 32 threads write     324.467      0.394             290.072          0.441           338.103    0.378
>>> 64 threads write     313.713      0.812             293.261          0.871     324.603  0.787
>>> 1 thread read         75.687      0.053              71.629          0.056      72.526  0.055
>>> 2 threads read        182.329     0.044             151.683           0.053     153.125 0.052
>>> 4 threads read        320.785     0.050             307.180           0.052      312.016        0.051
>>> 8 threads read         504.880    0.063             512.295           0.062      519.683        0.062
>>> 16 threads read       477.706     0.134             643.385           0.099      654.149        0.098
>>> 32 threads read       517.670     0.247              666.696          0.192      678.480        0.189
>>> 64 threads read       516.599     0.495              668.360           0.383      680.673       0.376
>>>
>>> -----Original Message-----
>>> From: Shu, Xinxin
>>> Sent: Saturday, June 14, 2014 11:50 AM
>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org 
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>>> Sent: Saturday, June 14, 2014 2:52 AM
>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>>> It doesn't seem to have any other source files and compilation fails:
>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org 
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>> Sent: Monday, June 09, 2014 10:00 PM
>>> To: Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Hi mark
>>>
>>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org 
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>> Hi sage ,
>>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>>
>>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>>
>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>
>>> Thanks,
>>> Mark
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>> To: Shu, Xinxin; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>> Hi, sage
>>>>>
>>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>>
>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>> To: Shu, Xinxin
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>
>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>
>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-07-01 15:15                               ` Sushma Gurram
@ 2014-07-01 17:02                                 ` Haomai Wang
  2014-07-01 23:49                                   ` Sushma Gurram
  0 siblings, 1 reply; 37+ messages in thread
From: Haomai Wang @ 2014-07-01 17:02 UTC (permalink / raw)
  To: Sushma Gurram
  Cc: Somnath Roy, Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian,
	ceph-devel

On Tue, Jul 1, 2014 at 11:15 PM, Sushma Gurram
<Sushma.Gurram@sandisk.com> wrote:
> Haomoi,
>
> Is there any write up on keyvalue store header cache and strip size? Based on what you stated, it appears that strip size improves performance with large object sizes. How would header cache impact 4KB object sizes?

Hmm, I think we first need to step back and look at the requirement. I
don't think a 4KB object size is a good choice for either FileStore or
KeyValueStore. Even with a 4KB object size, the main bottleneck for
FileStore will be the "file" handling itself; for KeyValueStore it may be
more complex. I agree that the "header_lock" could be a problem.

> We'd like to guesstimate the improvement due to strip size and header cache. I'm not sure about header cache implementation yet, but fdcache had serialization issues and there was a sharded fdcache to address this (under review, I guess).

Yes, fdcache has several problems, not only with concurrent operations but
also when the cache grows large. That is why I introduced RandomCache to
avoid them.
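
To illustrate the idea (a toy sketch with made-up names, not the actual RandomCache or fdcache code): sharding the cache by key hash lets concurrent lookups take different locks instead of all serializing on one global mutex. Eviction is omitted for brevity.

    // Toy sketch only -- made-up names, not Ceph's RandomCache/fdcache code.
    // Lookups on different shards proceed in parallel; eviction is omitted.
    #include <cstddef>
    #include <functional>
    #include <map>
    #include <mutex>
    #include <string>

    template <typename V, std::size_t NumShards = 16>
    class ShardedCache {
      struct Shard {
        std::mutex lock;
        std::map<std::string, V> entries;
      };
      Shard shards[NumShards];

      Shard &shard_for(const std::string &key) {
        return shards[std::hash<std::string>()(key) % NumShards];
      }

     public:
      bool lookup(const std::string &key, V *out) {
        Shard &s = shard_for(key);
        std::lock_guard<std::mutex> l(s.lock);
        auto it = s.entries.find(key);
        if (it == s.entries.end())
          return false;
        *out = it->second;
        return true;
      }

      void insert(const std::string &key, const V &value) {
        Shard &s = shard_for(key);
        std::lock_guard<std::mutex> l(s.lock);
        s.entries[key] = value;
      }
    };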

>
> I believe the header_lock serialization exists in all ceph branches so far, including the master.

Yes, I am not questioning the "header_lock" itself. My question is whether
the branch you tested includes the DBObjectMap header cache; with the
header cache enabled, is the "header_lock" still such a pain point? The
same goes for KeyValueStore; I will try to check that too.
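
Roughly what I have in mind for the RWLock experiment, as a standalone sketch (made-up types, not the real GenericObjectMap code): take the header lock shared on the lookup path so concurrent reads stop serializing on each other, and take it exclusive only when headers change.

    // Standalone sketch with made-up types -- not the real GenericObjectMap.
    // Readers (getattr -> get_values -> lookup_header) take the lock shared;
    // header creation/update stays exclusive.
    #include <pthread.h>
    #include <cstdint>
    #include <map>
    #include <string>

    struct Header {                 // stand-in for the real header record
      uint64_t seq;
      std::string prefix;
    };

    class HeaderIndex {
      std::map<std::string, Header> headers;   // object id -> header
      pthread_rwlock_t header_lock;

     public:
      HeaderIndex()  { pthread_rwlock_init(&header_lock, NULL); }
      ~HeaderIndex() { pthread_rwlock_destroy(&header_lock); }

      bool lookup_header(const std::string &oid, Header *out) {
        pthread_rwlock_rdlock(&header_lock);   // shared: readers in parallel
        auto it = headers.find(oid);
        bool found = (it != headers.end());
        if (found)
          *out = it->second;
        pthread_rwlock_unlock(&header_lock);
        return found;
      }

      void set_header(const std::string &oid, const Header &h) {
        pthread_rwlock_wrlock(&header_lock);   // exclusive: writers serialize
        headers[oid] = h;
        pthread_rwlock_unlock(&header_lock);
      }
    };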


>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Tuesday, July 01, 2014 1:06 AM
> To: Somnath Roy
> Cc: Sushma Gurram; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi,
>
> I don't know why OSD capacity can be PB level. Actually, most of use case should be serval TBs(1-4TB). As for cache hit, it totally depend on the IO characteristic. In my opinion, header cache in KeyValueStore can meet hit cache mostly if config object size and strip
> size(KeyValueStore) properly.
>
> But I'm also interested in your lock comments, what ceph version do you estimate with serialization issue?
>
> On Tue, Jul 1, 2014 at 3:13 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Hi Haomai,
>> But, the cache hit will be very minimal or null, if the actual storage per node is very huge (say in the PB level). So, it will be mostly hitting Omap, isn't it ?
>> How this header cache is going to resolve this serialization issue then ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
>> Sent: Monday, June 30, 2014 11:10 PM
>> To: Sushma Gurram
>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Sushma,
>>
>> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
>>
>> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
>>
>> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>> Hi Haomai/Greg,
>>>
>>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr()
>>> ->
>>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>>> GenericObjectMap::lookup_header()
>>>
>>> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>>>
>>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>>
>>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>>> Sent: Friday, June 27, 2014 1:08 AM
>>> To: Sushma Gurram
>>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>>> ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> As I mentioned days ago:
>>>
>>> There exists two points related kvstore perf:
>>> 1. The order of image and the strip
>>> size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
>>> 2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.
>>>
>>> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>>>
>>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>>> Delivery failure due to table format. Resending as plain text.
>>>>
>>>> _____________________________________________
>>>> From: Sushma Gurram
>>>> Sent: Thursday, June 26, 2014 5:35 PM
>>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>>
>>>> Hi Xinxin,
>>>>
>>>> Thanks for providing the results of the performance tests.
>>>>
>>>> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>>>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>>>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>>
>>>> We ran our tests with following configuration:
>>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical
>>>> cores), HT disabled, 16 GB memory
>>>>
>>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>>         rocksdb_write_buffer_size = 4194304
>>>>         rocksdb_cache_size = 4194304
>>>>         rocksdb_bloom_size = 0
>>>>         rocksdb_max_open_files = 10240
>>>>         rocksdb_compression = false
>>>>         rocksdb_paranoid = false
>>>>         rocksdb_log = /dev/null
>>>>         rocksdb_compact_on_mount = false
>>>>
>>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>>
>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>>> -------------------------------------------------------------------
>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>> 4K writes       ~1450           ~670
>>>> 4K reads        ~65000          ~2000
>>>> 64K writes      ~431            ~57
>>>> 64K reads       ~17500          ~180
>>>>
>>>>
>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>>> -------------------------------------------------------------------
>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>> 4K writes       ~1450           ~962
>>>> 4K reads        ~65000          ~1641
>>>> 64K writes      ~431            ~426
>>>> 64K reads       ~17500          ~209
>>>>
>>>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>>>> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>>>>
>>>> Thanks,
>>>> Sushma
>>>>
>>>> -----Original Message-----
>>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>>> Sent: Sunday, June 22, 2014 6:18 PM
>>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>>
>>>> Hi all,
>>>>
>>>>  We enabled rocksdb as data store in our test setup (10 osds on two servers, each server has 5 HDDs as osd , 2 ssds as journal , Intel(R) Xeon(R) CPU E31280)  and have performance tests for xfs, leveldb and rocksdb (use rados bench as our test tool),  the following chart shows details, for write ,  with small number threads , leveldb performance is lower than the other two backends , from 16 threads point ,  rocksdb perform a little better than xfs and leveldb , leveldb and rocksdb perform much better than xfs with higher thread number.
>>>>
>>>>                                                   xfs             leveldb               rocksdb
>>>>                           throughtput   latency     throughtput latency    throughtput  latency
>>>> 1 thread write       84.029       0.048             52.430         0.076              71.920    0.056
>>>> 2 threads write      166.417      0.048             97.917         0.082             155.148    0.052
>>>> 4 threads write       304.099     0.052             156.094         0.102            270.461    0.059
>>>> 8 threads write       323.047     0.099             221.370         0.144            339.455    0.094
>>>> 16 threads write     295.040      0.216             272.032         0.235       348.849 0.183
>>>> 32 threads write     324.467      0.394             290.072          0.441           338.103    0.378
>>>> 64 threads write     313.713      0.812             293.261          0.871     324.603  0.787
>>>> 1 thread read         75.687      0.053              71.629          0.056      72.526  0.055
>>>> 2 threads read        182.329     0.044             151.683           0.053     153.125 0.052
>>>> 4 threads read        320.785     0.050             307.180           0.052      312.016        0.051
>>>> 8 threads read         504.880    0.063             512.295           0.062      519.683        0.062
>>>> 16 threads read       477.706     0.134             643.385           0.099      654.149        0.098
>>>> 32 threads read       517.670     0.247              666.696          0.192      678.480        0.189
>>>> 64 threads read       516.599     0.495              668.360           0.383      680.673       0.376
>>>>
>>>> -----Original Message-----
>>>> From: Shu, Xinxin
>>>> Sent: Saturday, June 14, 2014 11:50 AM
>>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>>>> Sent: Saturday, June 14, 2014 2:52 AM
>>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>>>> It doesn't seem to have any other source files and compilation fails:
>>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>>
>>>> Thanks,
>>>> Sushma
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>>> Sent: Monday, June 09, 2014 10:00 PM
>>>> To: Mark Nelson; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>> Hi mark
>>>>
>>>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>>> To: Shu, Xinxin; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>>> Hi sage ,
>>>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>>>
>>>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>>>
>>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>>> To: Shu, Xinxin; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>>> Hi, sage
>>>>>>
>>>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>>>
>>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>>> To: Shu, Xinxin
>>>>>> Cc: ceph-devel@vger.kernel.org
>>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>>
>>>>>> Hi Xinxin,
>>>>>>
>>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>>
>>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>>
>>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>>
>>>>>> Thanks!
>>>>>> sage
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-07-01 17:02                                 ` Haomai Wang
@ 2014-07-01 23:49                                   ` Sushma Gurram
  2014-07-02 12:56                                     ` Haomai Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Sushma Gurram @ 2014-07-01 23:49 UTC (permalink / raw)
  To: Haomai Wang
  Cc: Somnath Roy, Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian,
	ceph-devel

Hi Haomai,

We understand 4KB object size is not typical, but this would help measure IOPs and uncover any serialization bottlenecks. I also tried with 64KB and 4MB, but the 10Gbps network was the limiting factor - which would hide the serialization issues.
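
(Rough math behind that: assuming ~1.25 GB/s of usable 10GbE bandwidth, 4 MB objects saturate the wire at roughly 300 ops/s and 64 KB objects at roughly 19,000 ops/s, while 4 KB objects would need on the order of 300,000 ops/s to fill the link - so only the small-object runs really push the OSD's internal lock paths rather than the network.)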

I merged your header cache pull request and it appears that as long as the number of objects in the OSD is small (say 500), performance is comparable to FileStore. The moment more objects are written, the header cache doesn't seem to help and performance drops again - probably due to compaction/merging.

Thanks,
Sushma

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@gmail.com] 
Sent: Tuesday, July 01, 2014 10:03 AM
To: Sushma Gurram
Cc: Somnath Roy; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

On Tue, Jul 1, 2014 at 11:15 PM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Haomoi,
>
> Is there any write up on keyvalue store header cache and strip size? Based on what you stated, it appears that strip size improves performance with large object sizes. How would header cache impact 4KB object sizes?

Hmm, I think we need to throw your demand firstly. I don't think 4KB object size is a good size for both FileStore and KeyValueStore. Even if using 4KB object size, The main bottleneck for FileStore will be "File", for KeyValueStore it may be more complex. I agree with "header_lock" should be a problem.

> We'd like to guesstimate the improvement due to strip size and header cache. I'm not sure about header cache implementation yet, but fdcache had serialization issues and there was a sharded fdcache to address this (under review, I guess).

Yes, fdcache has many problems not only concurrent operations but also the large size problem. So I introduce RandomCache to avoid it.

>
> I believe the header_lock serialization exists in all ceph branches so far, including the master.

Yes, I don't query the "header_lock". My question is that whether your estimate branch has DBObjectMap header cache, if enable header cache, is the "header_lock" still be a awful point? Same as KeyValueStore, I will try to see too.


>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Tuesday, July 01, 2014 1:06 AM
> To: Somnath Roy
> Cc: Sushma Gurram; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
> ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi,
>
> I don't know why OSD capacity can be PB level. Actually, most of use 
> case should be serval TBs(1-4TB). As for cache hit, it totally depend 
> on the IO characteristic. In my opinion, header cache in KeyValueStore 
> can meet hit cache mostly if config object size and strip
> size(KeyValueStore) properly.
>
> But I'm also interested in your lock comments, what ceph version do you estimate with serialization issue?
>
> On Tue, Jul 1, 2014 at 3:13 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Hi Haomai,
>> But, the cache hit will be very minimal or null, if the actual storage per node is very huge (say in the PB level). So, it will be mostly hitting Omap, isn't it ?
>> How this header cache is going to resolve this serialization issue then ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
>> Sent: Monday, June 30, 2014 11:10 PM
>> To: Sushma Gurram
>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Sushma,
>>
>> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
>>
>> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
>>
>> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>> Hi Haomai/Greg,
>>>
>>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr()
>>> ->
>>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>>> GenericObjectMap::lookup_header()
>>>
>>> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>>>
>>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>>
>>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>>> Sent: Friday, June 27, 2014 1:08 AM
>>> To: Sushma Gurram
>>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
>>> ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> As I mentioned days ago:
>>>
>>> There exists two points related kvstore perf:
>>> 1. The order of image and the strip
>>> size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
>>> 2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.
>>>
>>> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>>>
>>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>>> Delivery failure due to table format. Resending as plain text.
>>>>
>>>> _____________________________________________
>>>> From: Sushma Gurram
>>>> Sent: Thursday, June 26, 2014 5:35 PM
>>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>>
>>>> Hi Xinxin,
>>>>
>>>> Thanks for providing the results of the performance tests.
>>>>
>>>> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>>>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>>>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>>
>>>> We ran our tests with following configuration:
>>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical 
>>>> cores), HT disabled, 16 GB memory
>>>>
>>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>>         rocksdb_write_buffer_size = 4194304
>>>>         rocksdb_cache_size = 4194304
>>>>         rocksdb_bloom_size = 0
>>>>         rocksdb_max_open_files = 10240
>>>>         rocksdb_compression = false
>>>>         rocksdb_paranoid = false
>>>>         rocksdb_log = /dev/null
>>>>         rocksdb_compact_on_mount = false
>>>>
>>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>>
>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>>> -------------------------------------------------------------------
>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>> 4K writes       ~1450           ~670
>>>> 4K reads        ~65000          ~2000
>>>> 64K writes      ~431            ~57
>>>> 64K reads       ~17500          ~180
>>>>
>>>>
>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>>> -------------------------------------------------------------------
>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>> 4K writes       ~1450           ~962
>>>> 4K reads        ~65000          ~1641
>>>> 64K writes      ~431            ~426
>>>> 64K reads       ~17500          ~209
>>>>
>>>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>>>> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>>>>
>>>> Thanks,
>>>> Sushma
>>>>
>>>> -----Original Message-----
>>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>>> Sent: Sunday, June 22, 2014 6:18 PM
>>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>>
>>>> Hi all,
>>>>
>>>>  We enabled rocksdb as the data store in our test setup (10 osds on two servers; each server has 5 HDDs as osds and 2 SSDs as journals, Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb, using rados bench as the test tool. The chart below shows the details. For writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads on, rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.
>>>>
>>>>                     xfs                   leveldb               rocksdb
>>>>                     throughput  latency   throughput  latency   throughput  latency
>>>>                     (MB/s)      (s)       (MB/s)      (s)       (MB/s)      (s)
>>>> 1 thread write      84.029      0.048     52.430      0.076     71.920      0.056
>>>> 2 threads write     166.417     0.048     97.917      0.082     155.148     0.052
>>>> 4 threads write     304.099     0.052     156.094     0.102     270.461     0.059
>>>> 8 threads write     323.047     0.099     221.370     0.144     339.455     0.094
>>>> 16 threads write    295.040     0.216     272.032     0.235     348.849     0.183
>>>> 32 threads write    324.467     0.394     290.072     0.441     338.103     0.378
>>>> 64 threads write    313.713     0.812     293.261     0.871     324.603     0.787
>>>> 1 thread read       75.687      0.053     71.629      0.056     72.526      0.055
>>>> 2 threads read      182.329     0.044     151.683     0.053     153.125     0.052
>>>> 4 threads read      320.785     0.050     307.180     0.052     312.016     0.051
>>>> 8 threads read      504.880     0.063     512.295     0.062     519.683     0.062
>>>> 16 threads read     477.706     0.134     643.385     0.099     654.149     0.098
>>>> 32 threads read     517.670     0.247     666.696     0.192     678.480     0.189
>>>> 64 threads read     516.599     0.495     668.360     0.383     680.673     0.376
>>>>
>>>> -----Original Message-----
>>>> From: Shu, Xinxin
>>>> Sent: Saturday, June 14, 2014 11:50 AM
>>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org 
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma 
>>>> Gurram
>>>> Sent: Saturday, June 14, 2014 2:52 AM
>>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>>>> It doesn't seem to have any other source files and compilation fails:
>>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>>
>>>> Thanks,
>>>> Sushma
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org 
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>>> Sent: Monday, June 09, 2014 10:00 PM
>>>> To: Mark Nelson; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: RE: [RFC] add rocksdb support
>>>>
>>>> Hi mark
>>>>
>>>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org 
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>>> To: Shu, Xinxin; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>>> Hi sage ,
>>>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>>>
>>>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>>>
>>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>>> To: Shu, Xinxin; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>>> Hi, sage
>>>>>>
>>>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>>>
>>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>>> To: Shu, Xinxin
>>>>>> Cc: ceph-devel@vger.kernel.org
>>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>>
>>>>>> Hi Xinxin,
>>>>>>
>>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>>
>>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>>
>>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>>
>>>>>> Thanks!
>>>>>> sage
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>> ________________________________
>>>>
>>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-07-01  6:10                         ` Haomai Wang
  2014-07-01  7:13                           ` Somnath Roy
@ 2014-07-02  7:23                           ` Shu, Xinxin
  2014-07-02 13:07                             ` Haomai Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Shu, Xinxin @ 2014-07-02  7:23 UTC (permalink / raw)
  To: 'Haomai Wang', Sushma Gurram
  Cc: Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

hi haomai,

 I took a look at your keyvaluestore cache patch; you removed the exclusive lock on GenericObjectMap. In your commit message you say the caller should be responsible for maintaining the exclusive header. What does 'caller' mean here? In my opinion the callers are the keyvaluestore op threads, but I did not see any serializing code. Since a number of threads could manipulate the key-value db concurrently, just removing the exclusive lock may leave some unsafe scenarios. I am not sure whether my understanding is right; if it is, I think an RWLock or a fine-grained lock would be a good suggestion.
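
For illustration, a rough sketch of what I mean (simplified placeholder types, not the actual GenericObjectMap code): lookups take a shared lock, so concurrent getattr/get_values calls no longer serialize, while "generate new header" still takes the exclusive lock.

  // Rough sketch only -- placeholder types, not the real GenericObjectMap.
  // Readers (lookup_header) take a shared lock; only "generate new header"
  // takes the exclusive lock, so concurrent reads no longer serialize on a
  // single Mutex.
  #include <pthread.h>
  #include <stdint.h>
  #include <map>
  #include <memory>
  #include <string>

  struct Header { uint64_t seq; std::string prefix; };
  typedef std::shared_ptr<Header> HeaderRef;

  class HeaderIndex {
    pthread_rwlock_t lock;                    // stands in for header_lock
    std::map<std::string, HeaderRef> cache;   // stands in for a header cache
   public:
    HeaderIndex() { pthread_rwlock_init(&lock, NULL); }
    ~HeaderIndex() { pthread_rwlock_destroy(&lock); }

    // read path (getattr/get_values): many readers can run in parallel
    HeaderRef lookup_header(const std::string &oid) {
      pthread_rwlock_rdlock(&lock);
      std::map<std::string, HeaderRef>::iterator it = cache.find(oid);
      HeaderRef h = (it == cache.end()) ? HeaderRef() : it->second;
      pthread_rwlock_unlock(&lock);
      return h;    // on a miss the caller falls back to the key/value db
    }

    // write path ("generate new header"): still fully serialized
    HeaderRef generate_new_header(const std::string &oid, uint64_t seq) {
      pthread_rwlock_wrlock(&lock);
      HeaderRef h(new Header());
      h->seq = seq;
      h->prefix = oid;
      cache[oid] = h;
      pthread_rwlock_unlock(&lock);
      return h;
    }
  };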

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@gmail.com] 
Sent: Tuesday, July 01, 2014 2:10 PM
To: Sushma Gurram
Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

Hi Sushma,

Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.

As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!

On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Hi Haomai/Greg,
>
> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() -> 
> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() -> 
> KeyValueStore::getattr() -> GenericObjectMap::get_values() -> 
> GenericObjectMap::lookup_header()
>
> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>
> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>
> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Friday, June 27, 2014 1:08 AM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
> ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> As I mentioned days ago:
>
> There exists two points related kvstore perf:
> 1. The order of image and the strip
> size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
> 2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.
>
> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>
> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Delivery failure due to table format. Resending as plain text.
>>
>> _____________________________________________
>> From: Sushma Gurram
>> Sent: Thursday, June 26, 2014 5:35 PM
>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>> Subject: RE: [RFC] add rocksdb support
>>
>>
>> Hi Xinxin,
>>
>> Thanks for providing the results of the performance tests.
>>
>> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>
>> We ran our tests with following configuration:
>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical 
>> cores), HT disabled, 16 GB memory
>>
>> rocksdb configuration has been set to the following values in ceph.conf.
>>         rocksdb_write_buffer_size = 4194304
>>         rocksdb_cache_size = 4194304
>>         rocksdb_bloom_size = 0
>>         rocksdb_max_open_files = 10240
>>         rocksdb_compression = false
>>         rocksdb_paranoid = false
>>         rocksdb_log = /dev/null
>>         rocksdb_compact_on_mount = false
>>
>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>
>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>> -------------------------------------------------------------------
>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>> 4K writes       ~1450           ~670
>> 4K reads        ~65000          ~2000
>> 64K writes      ~431            ~57
>> 64K reads       ~17500          ~180
>>
>>
>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>> -------------------------------------------------------------------
>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>> 4K writes       ~1450           ~962
>> 4K reads        ~65000          ~1641
>> 64K writes      ~431            ~426
>> 64K reads       ~17500          ~209
>>
>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>> Sent: Sunday, June 22, 2014 6:18 PM
>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>>
>> Hi all,
>>
>>  We enabled rocksdb as the data store in our test setup (10 osds on two servers; each server has 5 HDDs as osds and 2 SSDs as journals, Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb, using rados bench as the test tool. The chart below shows the details. For writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads on, rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.
>>
>>                     xfs                   leveldb               rocksdb
>>                     throughput  latency   throughput  latency   throughput  latency
>>                     (MB/s)      (s)       (MB/s)      (s)       (MB/s)      (s)
>> 1 thread write      84.029      0.048     52.430      0.076     71.920      0.056
>> 2 threads write     166.417     0.048     97.917      0.082     155.148     0.052
>> 4 threads write     304.099     0.052     156.094     0.102     270.461     0.059
>> 8 threads write     323.047     0.099     221.370     0.144     339.455     0.094
>> 16 threads write    295.040     0.216     272.032     0.235     348.849     0.183
>> 32 threads write    324.467     0.394     290.072     0.441     338.103     0.378
>> 64 threads write    313.713     0.812     293.261     0.871     324.603     0.787
>> 1 thread read       75.687      0.053     71.629      0.056     72.526      0.055
>> 2 threads read      182.329     0.044     151.683     0.053     153.125     0.052
>> 4 threads read      320.785     0.050     307.180     0.052     312.016     0.051
>> 8 threads read      504.880     0.063     512.295     0.062     519.683     0.062
>> 16 threads read     477.706     0.134     643.385     0.099     654.149     0.098
>> 32 threads read     517.670     0.247     666.696     0.192     678.480     0.189
>> 64 threads read     516.599     0.495     668.360     0.383     680.673     0.376
>>
>> -----Original Message-----
>> From: Shu, Xinxin
>> Sent: Saturday, June 14, 2014 11:50 AM
>> To: Sushma Gurram; Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>> Sent: Saturday, June 14, 2014 2:52 AM
>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>> It doesn't seem to have any other source files and compilation fails:
>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>> Sent: Monday, June 09, 2014 10:00 PM
>> To: Mark Nelson; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: RE: [RFC] add rocksdb support
>>
>> Hi mark
>>
>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Tuesday, June 10, 2014 1:12 AM
>> To: Shu, Xinxin; Sage Weil
>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi Xinxin,
>>
>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>> Hi sage ,
>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>
>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>
>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>
>> Thanks,
>> Mark
>>
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>> Hi, sage
>>>>
>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>
>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>> To: Shu, Xinxin
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> Hi Xinxin,
>>>>
>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>
>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>
>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>
>>>> Thanks!
>>>> sage
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-07-01 23:49                                   ` Sushma Gurram
@ 2014-07-02 12:56                                     ` Haomai Wang
  2014-07-02 19:01                                       ` Sushma Gurram
  0 siblings, 1 reply; 37+ messages in thread
From: Haomai Wang @ 2014-07-02 12:56 UTC (permalink / raw)
  To: Sushma Gurram
  Cc: Somnath Roy, Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian,
	ceph-devel

On Wed, Jul 2, 2014 at 7:49 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Hi Haomai,
>
> We understand 4KB object size is not typical, but this would help measure IOPs and uncover any serialization bottlenecks. I also tried with 64KB and 4MB, but the 10Gbps network was the limiting factor - which would hide the serialization issues.
>
> I merged your header cache pull request and it appears that as long as the #objects in OSD is less (say 500), performance is comparable to FileStore. The moment more objects are written, the header cache doesn't seem to help and performance drops again - probably due to compaction/merging.

Could you give your test program or strategy?

Yes, you could estimate the number of active objects in one OSD. If
possible, you can increase "keyvaluestore_header_cache_size"; I
usually set it to 204800, which can fit most of the active data. Your
point about testing parallel performance is right for a perf test
case; in my dev tests I usually run perf tests on large data sets. So
I'd say that with a large data set, KeyValueStore should perform
better for the same cache memory size (the header cache is more
lightweight and effective than the fdcache because of its cache
implementation).
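
As a toy illustration only (random eviction, not the actual header
cache / RandomCache code; "max_entries" plays the role that
keyvaluestore_header_cache_size plays above, and callers are assumed
to serialize access to it):

  // Toy sketch only: a bounded header cache with random eviction, in the
  // spirit of a RandomCache.  Not the real implementation; callers must
  // still serialize access to it (see the header_lock discussion).
  #include <stdint.h>
  #include <cstdlib>
  #include <iterator>
  #include <map>
  #include <string>

  struct Header { uint64_t seq; };

  class BoundedHeaderCache {
    size_t max_entries;                    // e.g. 204800, as above
    std::map<std::string, Header> cache;
   public:
    explicit BoundedHeaderCache(size_t max) : max_entries(max) {}

    bool lookup(const std::string &oid, Header *out) {
      std::map<std::string, Header>::iterator it = cache.find(oid);
      if (it == cache.end())
        return false;                      // miss: go to the key/value db
      *out = it->second;
      return true;
    }

    void add(const std::string &oid, const Header &h) {
      if (cache.size() >= max_entries && cache.count(oid) == 0) {
        // random eviction keeps the hot path cheap (no LRU bookkeeping)
        std::map<std::string, Header>::iterator victim = cache.begin();
        std::advance(victim, std::rand() % cache.size());
        cache.erase(victim);
      }
      cache[oid] = h;
    }
  };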


>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Tuesday, July 01, 2014 10:03 AM
> To: Sushma Gurram
> Cc: Somnath Roy; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> On Tue, Jul 1, 2014 at 11:15 PM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Haomoi,
>>
>> Is there any write up on keyvalue store header cache and strip size? Based on what you stated, it appears that strip size improves performance with large object sizes. How would header cache impact 4KB object sizes?
>
> Hmm, I think we need to throw your demand firstly. I don't think 4KB object size is a good size for both FileStore and KeyValueStore. Even if using 4KB object size, The main bottleneck for FileStore will be "File", for KeyValueStore it may be more complex. I agree with "header_lock" should be a problem.
>
>> We'd like to guesstimate the improvement due to strip size and header cache. I'm not sure about header cache implementation yet, but fdcache had serialization issues and there was a sharded fdcache to address this (under review, I guess).
>
> Yes, fdcache has many problems not only concurrent operations but also the large size problem. So I introduce RandomCache to avoid it.
>
>>
>> I believe the header_lock serialization exists in all ceph branches so far, including the master.
>
> Yes, I don't query the "header_lock". My question is that whether your estimate branch has DBObjectMap header cache, if enable header cache, is the "header_lock" still be a awful point? Same as KeyValueStore, I will try to see too.
>
>
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>> Sent: Tuesday, July 01, 2014 1:06 AM
>> To: Somnath Roy
>> Cc: Sushma Gurram; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi,
>>
>> I don't know why OSD capacity can be PB level. Actually, most of use
>> case should be serval TBs(1-4TB). As for cache hit, it totally depend
>> on the IO characteristic. In my opinion, header cache in KeyValueStore
>> can meet hit cache mostly if config object size and strip
>> size(KeyValueStore) properly.
>>
>> But I'm also interested in your lock comments, what ceph version do you estimate with serialization issue?
>>
>> On Tue, Jul 1, 2014 at 3:13 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Hi Haomai,
>>> But, the cache hit will be very minimal or null, if the actual storage per node is very huge (say in the PB level). So, it will be mostly hitting Omap, isn't it ?
>>> How this header cache is going to resolve this serialization issue then ?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
>>> Sent: Monday, June 30, 2014 11:10 PM
>>> To: Sushma Gurram
>>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>>> ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Sushma,
>>>
>>> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
>>>
>>> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
>>>
>>> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>>> Hi Haomai/Greg,
>>>>
>>>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>>>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>>>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr()
>>>> ->
>>>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>>>> GenericObjectMap::lookup_header()
>>>>
>>>> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>>>>
>>>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>>>
>>>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>>>
>>>> Thanks,
>>>> Sushma
>>>>
>>>> -----Original Message-----
>>>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>>>> Sent: Friday, June 27, 2014 1:08 AM
>>>> To: Sushma Gurram
>>>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>>>> ceph-devel@vger.kernel.org
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> As I mentioned days ago:
>>>>
>>>> There exists two points related kvstore perf:
>>>> 1. The order of image and the strip
>>>> size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
>>>> 2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.
>>>>
>>>> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>>>>
>>>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>>>> Delivery failure due to table format. Resending as plain text.
>>>>>
>>>>> _____________________________________________
>>>>> From: Sushma Gurram
>>>>> Sent: Thursday, June 26, 2014 5:35 PM
>>>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> Thanks for providing the results of the performance tests.
>>>>>
>>>>> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>>>>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>>>>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>>>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>>>
>>>>> We ran our tests with following configuration:
>>>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical
>>>>> cores), HT disabled, 16 GB memory
>>>>>
>>>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>>>         rocksdb_write_buffer_size = 4194304
>>>>>         rocksdb_cache_size = 4194304
>>>>>         rocksdb_bloom_size = 0
>>>>>         rocksdb_max_open_files = 10240
>>>>>         rocksdb_compression = false
>>>>>         rocksdb_paranoid = false
>>>>>         rocksdb_log = /dev/null
>>>>>         rocksdb_compact_on_mount = false
>>>>>
>>>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>>>
>>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>>>> -------------------------------------------------------------------
>>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>>> 4K writes       ~1450           ~670
>>>>> 4K reads        ~65000          ~2000
>>>>> 64K writes      ~431            ~57
>>>>> 64K reads       ~17500          ~180
>>>>>
>>>>>
>>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>>>> -------------------------------------------------------------------
>>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>>> 4K writes       ~1450           ~962
>>>>> 4K reads        ~65000          ~1641
>>>>> 64K writes      ~431            ~426
>>>>> 64K reads       ~17500          ~209
>>>>>
>>>>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>>>>> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>>>>>
>>>>> Thanks,
>>>>> Sushma
>>>>>
>>>>> -----Original Message-----
>>>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>>>> Sent: Sunday, June 22, 2014 6:18 PM
>>>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>>  We enabled rocksdb as the data store in our test setup (10 osds on two servers; each server has 5 HDDs as osds and 2 SSDs as journals, Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb, using rados bench as the test tool. The chart below shows the details. For writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads on, rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.
>>>>>
>>>>>                     xfs                   leveldb               rocksdb
>>>>>                     throughput  latency   throughput  latency   throughput  latency
>>>>>                     (MB/s)      (s)       (MB/s)      (s)       (MB/s)      (s)
>>>>> 1 thread write      84.029      0.048     52.430      0.076     71.920      0.056
>>>>> 2 threads write     166.417     0.048     97.917      0.082     155.148     0.052
>>>>> 4 threads write     304.099     0.052     156.094     0.102     270.461     0.059
>>>>> 8 threads write     323.047     0.099     221.370     0.144     339.455     0.094
>>>>> 16 threads write    295.040     0.216     272.032     0.235     348.849     0.183
>>>>> 32 threads write    324.467     0.394     290.072     0.441     338.103     0.378
>>>>> 64 threads write    313.713     0.812     293.261     0.871     324.603     0.787
>>>>> 1 thread read       75.687      0.053     71.629      0.056     72.526      0.055
>>>>> 2 threads read      182.329     0.044     151.683     0.053     153.125     0.052
>>>>> 4 threads read      320.785     0.050     307.180     0.052     312.016     0.051
>>>>> 8 threads read      504.880     0.063     512.295     0.062     519.683     0.062
>>>>> 16 threads read     477.706     0.134     643.385     0.099     654.149     0.098
>>>>> 32 threads read     517.670     0.247     666.696     0.192     678.480     0.189
>>>>> 64 threads read     516.599     0.495     668.360     0.383     680.673     0.376
>>>>>
>>>>> -----Original Message-----
>>>>> From: Shu, Xinxin
>>>>> Sent: Saturday, June 14, 2014 11:50 AM
>>>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma
>>>>> Gurram
>>>>> Sent: Saturday, June 14, 2014 2:52 AM
>>>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>>>>> It doesn't seem to have any other source files and compilation fails:
>>>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>>>
>>>>> Thanks,
>>>>> Sushma
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>>>> Sent: Monday, June 09, 2014 10:00 PM
>>>>> To: Mark Nelson; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>> Hi mark
>>>>>
>>>>> I have finished development of support of rocksdb submodule,  a pull request for support of autoconf/automake for rocksdb has been created , you can find https://github.com/ceph/rocksdb/pull/2 , if this patch is ok ,  I will create a pull request for rocksdb submodule support , currently this patch can be found https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>>>> To: Shu, Xinxin; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>>>> Hi sage ,
>>>>>> I will add two configure options to --with-librocksdb-static and --with-librocksdb , with --with-librocksdb-static option , ceph will compile the code that get from ceph repository , with  --with-librocksdb option ,  in case of distro packages for rocksdb , ceph will not compile the rocksdb code , will use pre-installed library. is that ok for you ?
>>>>>>
>>>>>> since current rocksdb does not support autoconf&automake , I will add autoconf&automake support for rocksdb , but before that , i think we should fork a stable branch (maybe 3.0) for ceph .
>>>>>
>>>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>>>> To: Shu, Xinxin; Sage Weil
>>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>>
>>>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>>>> Hi, sage
>>>>>>>
>>>>>>> I will add rocksdb submodule into the makefile , currently we want to have fully performance tests on key-value db backend , both leveldb and rocksdb. Then optimize on rocksdb performance.
>>>>>>
>>>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>>>> To: Shu, Xinxin
>>>>>>> Cc: ceph-devel@vger.kernel.org
>>>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>>>
>>>>>>> Hi Xinxin,
>>>>>>>
>>>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>>>
>>>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the ceph.com packages on ceph.com.  I suspect that the distros will prefer to turns this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librockdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>>>
>>>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> ________________________________
>>>>>
>>>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC] add rocksdb support
  2014-07-02  7:23                           ` Shu, Xinxin
@ 2014-07-02 13:07                             ` Haomai Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Haomai Wang @ 2014-07-02 13:07 UTC (permalink / raw)
  To: Shu, Xinxin
  Cc: Sushma Gurram, Mark Nelson, Sage Weil, Zhang, Jian, ceph-devel

First, we need to agree on some statements
(https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h#L242).

The lock I removed was previously used to avoid concurrent operations
on the same header. Now KeyValueStore passes the header as an argument
when it accesses GenericObjectMap, and it avoids concurrent ops on a
header via the pg lock (Sequencer). What Sushma mentioned is
"header_lock"; it still exists and is used to protect "generate new
header".

What we can do next is replace the pg lock with a fine-grained lock
(object level) and reduce "header_lock" usage in the KeyValueStore
case, as sketched below.

For FileStore, that is why I stressed we need to consider refactoring
and improving its lock usage.
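
To make the object-level idea concrete, here is a rough sketch only
(placeholder names, not the actual KeyValueStore code): hash the
object id onto a fixed pool of mutexes, so ops on different objects
stop serializing on one per-PG lock while ops on the same object still
serialize.

  // Rough sketch of object-level (fine-grained) locking: each object id
  // hashes to one of a fixed set of mutexes.  Placeholder code only, not
  // the actual KeyValueStore implementation.
  #include <pthread.h>
  #include <functional>
  #include <string>

  class ObjectLockShards {
    enum { NSHARDS = 128 };
    pthread_mutex_t shards[NSHARDS];
   public:
    ObjectLockShards() {
      for (int i = 0; i < NSHARDS; ++i)
        pthread_mutex_init(&shards[i], NULL);
    }
    ~ObjectLockShards() {
      for (int i = 0; i < NSHARDS; ++i)
        pthread_mutex_destroy(&shards[i]);
    }
    // ops on different objects usually map to different shards and can
    // proceed in parallel; ops on the same object still serialize
    pthread_mutex_t *lock_for(const std::string &oid) {
      size_t h = std::hash<std::string>()(oid) % NSHARDS;
      return &shards[h];
    }
  };

  // usage while touching one object's header:
  //   pthread_mutex_t *l = shards.lock_for(oid);
  //   pthread_mutex_lock(l);
  //   ... read or update that object's header ...
  //   pthread_mutex_unlock(l);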

On Wed, Jul 2, 2014 at 3:23 PM, Shu, Xinxin <xinxin.shu@intel.com> wrote:
> hi haomai,
>
>  I took a look at your keyvaluestore cache patch, you removed exclusive lock on genericobjectmap , in your commit message , you says caller should be maintain be responsible for maintain the exclusive header , what did 'caller' mean ,  in my opnion , the caller should be keyvaluestore op threads,  but I did not see any serializing code , since there could be a number of threads that manipulate key-value db concurrently , if we just removed exclusive lock , there maybe some unsafe scenarios . I am not sure whether my understanding is right ?  if my understanding is right , I think RWlock or a fine-grain lock is a good suggestion.
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Tuesday, July 01, 2014 2:10 PM
> To: Sushma Gurram
> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> Hi Sushma,
>
> Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we add header cache to DBObjectMap.
>
> As for KeyValueStore, a cache branch is on the reviewing, it can greatly reduce lookup_header calls. Of course, replace with RWLock is a good suggestion, I would like to try to estimate!
>
> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Hi Haomai/Greg,
>>
>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr() ->
>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>> GenericObjectMap::lookup_header()
>>
>> I fabricated the code to avoid this lock for a specific run and noticed that the performance is similar to FileStore.
>>
>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>
>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>> Sent: Friday, June 27, 2014 1:08 AM
>> To: Sushma Gurram
>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian;
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> As I mentioned days ago:
>>
>> There exists two points related kvstore perf:
>> 1. The order of image and the strip
>> size are important to performance. Because the header like inode in fs is much lightweight than fd, so the order of image is expected to be lower. And strip size can be configurated to 4kb to improve large io performance.
>> 2. The header cache(https://github.com/ceph/ceph/pull/1649) is not merged, the header cache is important to perf. It's just like fdcahce in FileStore.
>>
>> As for detail perf number, I think this result based on master branch is nearly correct. When strip-size and header cache are ready, I think it will be better.
>>
>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>> Delivery failure due to table format. Resending as plain text.
>>>
>>> _____________________________________________
>>> From: Sushma Gurram
>>> Sent: Thursday, June 26, 2014 5:35 PM
>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>>
>>> Hi Xinxin,
>>>
>>> Thanks for providing the results of the performance tests.
>>>
>>> I used fio (with support for rbd ioengine) to compare XFS and RockDB with a single OSD. Also confirmed with rados bench and both numbers seem to be of the same order.
>>> My findings show that XFS is better than rocksdb. Can you please let us know rocksdb configuration that you used, object size and duration of run for rados bench?
>>> For random writes tests, I see "rocksdb:bg0" thread as the top CPU consumer (%CPU of this thread is 50, while that of all other threads in the OSD is <10% utilized).
>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>
>>> We ran our tests with following configuration:
>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical
>>> cores), HT disabled, 16 GB memory
>>>
>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>         rocksdb_write_buffer_size = 4194304
>>>         rocksdb_cache_size = 4194304
>>>         rocksdb_bloom_size = 0
>>>         rocksdb_max_open_files = 10240
>>>         rocksdb_compression = false
>>>         rocksdb_paranoid = false
>>>         rocksdb_log = /dev/null
>>>         rocksdb_compact_on_mount = false
>>>
>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>
>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>> -------------------------------------------------------------------
>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>> 4K writes       ~1450           ~670
>>> 4K reads        ~65000          ~2000
>>> 64K writes      ~431            ~57
>>> 64K reads       ~17500          ~180
>>>
>>>
>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>> -------------------------------------------------------------------
>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>> 4K writes       ~1450           ~962
>>> 4K reads        ~65000          ~1641
>>> 64K writes      ~431            ~426
>>> 64K reads       ~17500          ~209
>>>
>>> I guess theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure if READs are lower by this magnitude.
>>> However, your results seem to show otherwise. Can you please help us with rockdb config and how the rados bench has been run?
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>> Sent: Sunday, June 22, 2014 6:18 PM
>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>>
>>> Hi all,
>>>
>>>  We enabled rocksdb as the data store in our test setup (10 osds on two servers; each server has 5 HDDs as osds and 2 SSDs as journals, Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb, using rados bench as the test tool. The chart below shows the details. For writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads on, rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.
>>>
>>>                     xfs                   leveldb               rocksdb
>>>                     throughput  latency   throughput  latency   throughput  latency
>>>                     (MB/s)      (s)       (MB/s)      (s)       (MB/s)      (s)
>>> 1 thread write      84.029      0.048     52.430      0.076     71.920      0.056
>>> 2 threads write     166.417     0.048     97.917      0.082     155.148     0.052
>>> 4 threads write     304.099     0.052     156.094     0.102     270.461     0.059
>>> 8 threads write     323.047     0.099     221.370     0.144     339.455     0.094
>>> 16 threads write    295.040     0.216     272.032     0.235     348.849     0.183
>>> 32 threads write    324.467     0.394     290.072     0.441     338.103     0.378
>>> 64 threads write    313.713     0.812     293.261     0.871     324.603     0.787
>>> 1 thread read       75.687      0.053     71.629      0.056     72.526      0.055
>>> 2 threads read      182.329     0.044     151.683     0.053     153.125     0.052
>>> 4 threads read      320.785     0.050     307.180     0.052     312.016     0.051
>>> 8 threads read      504.880     0.063     512.295     0.062     519.683     0.062
>>> 16 threads read     477.706     0.134     643.385     0.099     654.149     0.098
>>> 32 threads read     517.670     0.247     666.696     0.192     678.480     0.189
>>> 64 threads read     516.599     0.495     668.360     0.383     680.673     0.376
>>>
>>> -----Original Message-----
>>> From: Shu, Xinxin
>>> Sent: Saturday, June 14, 2014 11:50 AM
>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Currently ceph will get stable rocksdb from branch 3.0.fb of  ceph/rocksdb  , since PR https://github.com/ceph/rocksdb/pull/2 has not been merged ,  so if you use 'git submodule update --init' to get rocksdb submodule , It did not support autoconf/automake .
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma Gurram
>>> Sent: Saturday, June 14, 2014 2:52 AM
>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need toput autoconf/automake in this directory?
>>> It doesn't seem to have any other source files and compilation fails:
>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>
>>> Thanks,
>>> Sushma
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>> Sent: Monday, June 09, 2014 10:00 PM
>>> To: Mark Nelson; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: RE: [RFC] add rocksdb support
>>>
>>> Hi mark
>>>
>>> I have finished development of the rocksdb submodule support. A pull request adding autoconf/automake support for rocksdb has been created; you can find it at https://github.com/ceph/rocksdb/pull/2 . If this patch is OK, I will create a pull request for the rocksdb submodule support; currently the patch can be found at https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>> To: Shu, Xinxin; Sage Weil
>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Xinxin,
>>>
>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>> Hi sage ,
>>>> I will add two configure options, --with-librocksdb-static and --with-librocksdb. With the --with-librocksdb-static option, ceph will compile the code it gets from the ceph repository; with the --with-librocksdb option, for the case of distro packages for rocksdb, ceph will not compile the rocksdb code and will use the pre-installed library. Is that OK with you?
>>>>
>>>> Since current rocksdb does not support autoconf/automake, I will add autoconf/automake support for rocksdb, but before that I think we should fork a stable branch (maybe 3.0) for ceph.
>>>
>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>
>>> Thanks,
>>> Mark
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>> To: Shu, Xinxin; Sage Weil
>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>> Hi, sage
>>>>>
>>>>> I will add the rocksdb submodule into the makefile. Currently we want to run full performance tests on the key-value db backends, both leveldb and rocksdb, and then optimize rocksdb performance.
>>>>
>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>> To: Shu, Xinxin
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>
>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the packages on ceph.com.  I suspect that the distros will prefer to turn this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librocksdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>
>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>
>>>>> Thanks!
>>>>> sage
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [RFC] add rocksdb support
  2014-07-02 12:56                                     ` Haomai Wang
@ 2014-07-02 19:01                                       ` Sushma Gurram
  0 siblings, 0 replies; 37+ messages in thread
From: Sushma Gurram @ 2014-07-02 19:01 UTC (permalink / raw)
  To: Haomai Wang
  Cc: Somnath Roy, Shu, Xinxin, Mark Nelson, Sage Weil, Zhang, Jian,
	ceph-devel

>>> Could you give your test program or strategy?

I write and read using rados bench as follows:
./rados -p data bench 60 write -t 32 -b 4096 --no-cleanup    (Running for 60 seconds creates ~42,000 objects = ~168 MB)
./rados -p data bench 200 rand -t 32 -b 4096

With keyvaluestore header cache size = 8192, # rados objects=~1000, READ IOPs=~12000
With keyvaluestore header cache size = 8192, # rados objects=~42,000,  READ IOPs = ~4500
With keyvaluestore header cache size = 204800, #rados objects=~42,000, READ IOPs = ~12000

Based on the above, it appears that "keyvaluestore_header_cache_size" should be approximately  (#objects * 8) so as not to hit the serialization lock. I'm not sure if this conclusion is right though.

I also use fio with the rbd ioengine (specifically to test scaling with more client connections using the "numjobs" fio parameter). I created a 2 GB rbd image and write/read different workloads.
With numjobs=16, FileStore gives ~65,000 READ IOPs (with the FileStore optimized branch) while KeyValueStore gives ~12,000 READ IOPs (even after your suggestion of increasing "keyvaluestore_header_cache_size"). Probably there is some other serialization bottleneck elsewhere in the KeyValueStore path.
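
For reference, here is a minimal fio job sketch along those lines (the pool, image and client names are placeholders, and the exact option set is an assumption rather than the job file actually used):

        [global]
        ; placeholder pool/image/client names below
        ioengine=rbd
        clientname=admin
        pool=rbd
        rbdname=testimg
        direct=1
        bs=4k
        iodepth=32

        [randread]
        rw=randread
        numjobs=16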

Thanks,
Sushma

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@gmail.com] 
Sent: Wednesday, July 02, 2014 5:56 AM
To: Sushma Gurram
Cc: Somnath Roy; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; ceph-devel@vger.kernel.org
Subject: Re: [RFC] add rocksdb support

On Wed, Jul 2, 2014 at 7:49 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
> Hi Haomai,
>
> We understand 4KB object size is not typical, but this would help measure IOPs and uncover any serialization bottlenecks. I also tried with 64KB and 4MB, but the 10Gbps network was the limiting factor - which would hide the serialization issues.
>
> I merged your header cache pull request, and it appears that as long as the number of objects in the OSD is small (say 500), performance is comparable to FileStore. The moment more objects are written, the header cache doesn't seem to help and performance drops again - probably due to compaction/merging.

Could you give your test program or strategy?

Yes, you could estimate the number of active objects in one OSD. If possible, you can increase "keyvaluestore_header_cache_size"; I usually set it to 204800, which can fit most of the active data. Your point about testing parallel performance is right for a perf test case; in my dev tests I usually prefer to test against a large data set. So I'd say that on a large data set, KeyValueStore should perform better for the same amount of cache memory (the header cache is more lightweight and effective than the fdcache because of its implementation).
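
Concretely, that corresponds to a ceph.conf entry along these lines (just a sketch of the setting discussed above, assuming it lives in the [osd] section; not a tuned recommendation):

        [osd]
        ; size the header cache to cover the active object set
        keyvaluestore_header_cache_size = 204800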


>
> Thanks,
> Sushma
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Tuesday, July 01, 2014 10:03 AM
> To: Sushma Gurram
> Cc: Somnath Roy; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
> ceph-devel@vger.kernel.org
> Subject: Re: [RFC] add rocksdb support
>
> On Tue, Jul 1, 2014 at 11:15 PM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>> Haomoi,
>>
>> Is there any write-up on the KeyValueStore header cache and strip size? Based on what you stated, it appears that the strip size improves performance with large object sizes. How would the header cache impact 4KB object sizes?
>
> Hmm, I think we first need to pin down your requirements. I don't think a 4KB object size is a good size for either FileStore or KeyValueStore. Even with a 4KB object size, the main bottleneck for FileStore will be the "File"; for KeyValueStore it may be more complex. I agree that "header_lock" could be a problem.
>
>> We'd like to guesstimate the improvement due to strip size and header cache. I'm not sure about header cache implementation yet, but fdcache had serialization issues and there was a sharded fdcache to address this (under review, I guess).
>
> Yes, fdcache has several problems, not only with concurrent operations but also with large data sets. That is why I introduced RandomCache to avoid them.
>
>>
>> I believe the header_lock serialization exists in all ceph branches so far, including the master.
>
> Yes, I am not questioning the "header_lock" itself. My question is whether the branch you evaluated has the DBObjectMap header cache; with the header cache enabled, is the "header_lock" still a pain point? The same applies to KeyValueStore; I will try to check as well.
>
>
>>
>> Thanks,
>> Sushma
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>> Sent: Tuesday, July 01, 2014 1:06 AM
>> To: Somnath Roy
>> Cc: Sushma Gurram; Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
>> ceph-devel@vger.kernel.org
>> Subject: Re: [RFC] add rocksdb support
>>
>> Hi,
>>
>> I don't know why OSD capacity would be at the PB level. Actually, most
>> use cases should be several TBs (1-4 TB). As for cache hits, it totally
>> depends on the IO characteristics. In my opinion, the header cache in
>> KeyValueStore can achieve a good hit rate if the object size and strip
>> size (KeyValueStore) are configured properly.
>>
>> But I'm also interested in your lock comments: which ceph version did you evaluate when you saw the serialization issue?
>>
>> On Tue, Jul 1, 2014 at 3:13 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> Hi Haomai,
>>> But the cache hit rate will be minimal or zero if the actual storage per node is very large (say, at the PB level). So it will mostly be hitting omap, won't it?
>>> How is this header cache going to resolve the serialization issue then?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org 
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
>>> Sent: Monday, June 30, 2014 11:10 PM
>>> To: Sushma Gurram
>>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
>>> ceph-devel@vger.kernel.org
>>> Subject: Re: [RFC] add rocksdb support
>>>
>>> Hi Sushma,
>>>
>>> Thanks for your investigations! We already noticed the serialization risk in GenericObjectMap/DBObjectMap. In order to improve performance we added a header cache to DBObjectMap.
>>>
>>> As for KeyValueStore, a cache branch is under review; it can greatly reduce lookup_header calls. Of course, replacing the lock with an RWLock is a good suggestion, and I would like to try to evaluate it!
>>>
>>> On Tue, Jul 1, 2014 at 8:39 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>>> Hi Haomai/Greg,
>>>>
>>>> I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore.
>>>> ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() ->
>>>> ReplicatedPG::get_object_context() -> PGBackend::objects_get_attr()
>>>> ->
>>>> KeyValueStore::getattr() -> GenericObjectMap::get_values() ->
>>>> GenericObjectMap::lookup_header()
>>>>
>>>> I hacked the code to bypass this lock for a specific run and noticed that the performance is then similar to FileStore.
>>>>
>>>> In our earlier investigations also we noticed similar serialization issues with DBObjectMap::header_lock when rgw stores xattrs in LevelDB.
>>>>
>>>> Can you please help understand the reason for this lock and whether it can be replaced with a RWLock or any other suggestions to avoid serialization due to this lock?
>>>>
>>>> Thanks,
>>>> Sushma
>>>>
>>>> -----Original Message-----
>>>> From: Haomai Wang [mailto:haomaiwang@gmail.com]
>>>> Sent: Friday, June 27, 2014 1:08 AM
>>>> To: Sushma Gurram
>>>> Cc: Shu, Xinxin; Mark Nelson; Sage Weil; Zhang, Jian; 
>>>> ceph-devel@vger.kernel.org
>>>> Subject: Re: [RFC] add rocksdb support
>>>>
>>>> As I mentioned days ago:
>>>>
>>>> There are two points related to kvstore perf:
>>>> 1. The order of the image and the strip size are important to performance. Because a header, like an inode in a fs, is much more lightweight than an fd, the order of the image is expected to matter less. And the strip size can be configured to 4kb to improve large IO performance.
>>>> 2. The header cache (https://github.com/ceph/ceph/pull/1649) is not merged yet; the header cache is important to perf. It's just like the fdcache in FileStore.
>>>>
>>>> As for the detailed perf numbers, I think this result based on the master branch is roughly correct. When the strip size and header cache are ready, I think it will be better.
>>>>
>>>> On Fri, Jun 27, 2014 at 8:44 AM, Sushma Gurram <Sushma.Gurram@sandisk.com> wrote:
>>>>> Delivery failure due to table format. Resending as plain text.
>>>>>
>>>>> _____________________________________________
>>>>> From: Sushma Gurram
>>>>> Sent: Thursday, June 26, 2014 5:35 PM
>>>>> To: 'Shu, Xinxin'; 'Mark Nelson'; 'Sage Weil'
>>>>> Cc: 'Zhang, Jian'; ceph-devel@vger.kernel.org
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> Thanks for providing the results of the performance tests.
>>>>>
>>>>> I used fio (with support for the rbd ioengine) to compare XFS and RocksDB with a single OSD. I also confirmed with rados bench, and both sets of numbers seem to be of the same order.
>>>>> My findings show that XFS is better than rocksdb. Can you please let us know the rocksdb configuration that you used, the object size, and the duration of the rados bench runs?
>>>>> For random write tests, I see the "rocksdb:bg0" thread as the top CPU consumer (this thread is at 50% CPU, while all other threads in the OSD are below 10%).
>>>>> Is there a ceph.conf config option to configure the background threads in rocksdb?
>>>>>
>>>>> We ran our tests with following configuration:
>>>>> System : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (16 physical 
>>>>> cores), HT disabled, 16 GB memory
>>>>>
>>>>> rocksdb configuration has been set to the following values in ceph.conf.
>>>>>         rocksdb_write_buffer_size = 4194304
>>>>>         rocksdb_cache_size = 4194304
>>>>>         rocksdb_bloom_size = 0
>>>>>         rocksdb_max_open_files = 10240
>>>>>         rocksdb_compression = false
>>>>>         rocksdb_paranoid = false
>>>>>         rocksdb_log = /dev/null
>>>>>         rocksdb_compact_on_mount = false
>>>>>
>>>>> fio rbd ioengine with numjobs=1 for writes and numjobs=16 for reads, iodepth=32. Unlike rados bench, fio rbd helps to create multiple (=numjobs) client connections to the OSD, thus stressing the OSD.
>>>>>
>>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=4MB
>>>>> -------------------------------------------------------------------
>>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>>> 4K writes       ~1450           ~670
>>>>> 4K reads        ~65000          ~2000
>>>>> 64K writes      ~431            ~57
>>>>> 64K reads       ~17500          ~180
>>>>>
>>>>>
>>>>> rbd image size = 2 GB, rocksdb_write_buffer_size=1GB
>>>>> -------------------------------------------------------------------
>>>>> IO Pattern      XFS (IOPs)      Rocksdb (IOPs)
>>>>> 4K writes       ~1450           ~962
>>>>> 4K reads        ~65000          ~1641
>>>>> 64K writes      ~431            ~426
>>>>> 64K reads       ~17500          ~209
>>>>>
>>>>> I guess the theoretically lower rocksdb performance can be attributed to compaction during writes and merging during reads, but I'm not sure READs should be lower by this magnitude.
>>>>> However, your results seem to show otherwise. Can you please help us with the rocksdb config and how the rados bench was run?
>>>>>
>>>>> Thanks,
>>>>> Sushma
>>>>>
>>>>> -----Original Message-----
>>>>> From: Shu, Xinxin [mailto:xinxin.shu@intel.com]
>>>>> Sent: Sunday, June 22, 2014 6:18 PM
>>>>> To: Sushma Gurram; 'Mark Nelson'; 'Sage Weil'
>>>>> Cc: 'ceph-devel@vger.kernel.org'; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We enabled rocksdb as the data store in our test setup (10 OSDs on two servers; each server has 5 HDDs as OSDs and 2 SSDs as journals, Intel(R) Xeon(R) CPU E31280) and ran performance tests for xfs, leveldb and rocksdb, using rados bench as the test tool. The chart below shows the details. For writes, with a small number of threads, leveldb performance is lower than the other two backends; from 16 threads onward rocksdb performs a little better than xfs and leveldb, and both leveldb and rocksdb perform much better than xfs at higher thread counts.
>>>>>
>>>>>                      xfs                          leveldb                      rocksdb
>>>>>                      throughput(MB/s) latency(s)  throughput(MB/s) latency(s)  throughput(MB/s) latency(s)
>>>>> 1 thread write        84.029          0.048        52.430          0.076        71.920          0.056
>>>>> 2 threads write      166.417          0.048        97.917          0.082       155.148          0.052
>>>>> 4 threads write      304.099          0.052       156.094          0.102       270.461          0.059
>>>>> 8 threads write      323.047          0.099       221.370          0.144       339.455          0.094
>>>>> 16 threads write     295.040          0.216       272.032          0.235       348.849          0.183
>>>>> 32 threads write     324.467          0.394       290.072          0.441       338.103          0.378
>>>>> 64 threads write     313.713          0.812       293.261          0.871       324.603          0.787
>>>>> 1 thread read         75.687          0.053        71.629          0.056        72.526          0.055
>>>>> 2 threads read       182.329          0.044       151.683          0.053       153.125          0.052
>>>>> 4 threads read       320.785          0.050       307.180          0.052       312.016          0.051
>>>>> 8 threads read       504.880          0.063       512.295          0.062       519.683          0.062
>>>>> 16 threads read      477.706          0.134       643.385          0.099       654.149          0.098
>>>>> 32 threads read      517.670          0.247       666.696          0.192       678.480          0.189
>>>>> 64 threads read      516.599          0.495       668.360          0.383       680.673          0.376
>>>>>
>>>>> -----Original Message-----
>>>>> From: Shu, Xinxin
>>>>> Sent: Saturday, June 14, 2014 11:50 AM
>>>>> To: Sushma Gurram; Mark Nelson; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>> Currently ceph gets its stable rocksdb from the 3.0.fb branch of ceph/rocksdb. Since PR https://github.com/ceph/rocksdb/pull/2 has not been merged yet, if you use 'git submodule update --init' to fetch the rocksdb submodule, it does not support autoconf/automake.
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org 
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sushma 
>>>>> Gurram
>>>>> Sent: Saturday, June 14, 2014 2:52 AM
>>>>> To: Shu, Xinxin; Mark Nelson; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> I tried to compile the wip-rocksdb branch, but the src/rocksdb directory seems to be empty. Do I need to put autoconf/automake in this directory?
>>>>> It doesn't seem to have any other source files and compilation fails:
>>>>> os/RocksDBStore.cc:10:24: fatal error: rocksdb/db.h: No such file or directory compilation terminated.
>>>>>
>>>>> Thanks,
>>>>> Sushma
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org 
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Shu, Xinxin
>>>>> Sent: Monday, June 09, 2014 10:00 PM
>>>>> To: Mark Nelson; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: RE: [RFC] add rocksdb support
>>>>>
>>>>> Hi mark
>>>>>
>>>>> I have finished development of the rocksdb submodule support. A pull request adding autoconf/automake support for rocksdb has been created; you can find it at https://github.com/ceph/rocksdb/pull/2 . If this patch is OK, I will create a pull request for the rocksdb submodule support; currently the patch can be found at https://github.com/xinxinsh/ceph/tree/wip-rocksdb .
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org 
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>>> Sent: Tuesday, June 10, 2014 1:12 AM
>>>>> To: Shu, Xinxin; Sage Weil
>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>
>>>>> Hi Xinxin,
>>>>>
>>>>> On 05/28/2014 05:05 AM, Shu, Xinxin wrote:
>>>>>> Hi sage ,
>>>>>> I will add two configure options, --with-librocksdb-static and --with-librocksdb. With the --with-librocksdb-static option, ceph will compile the code it gets from the ceph repository; with the --with-librocksdb option, for the case of distro packages for rocksdb, ceph will not compile the rocksdb code and will use the pre-installed library. Is that OK with you?
>>>>>>
>>>>>> Since current rocksdb does not support autoconf/automake, I will add autoconf/automake support for rocksdb, but before that I think we should fork a stable branch (maybe 3.0) for ceph.
>>>>>
>>>>> I'm looking at testing out the rocksdb support as well, both for the OSD and for the monitor based on some issues we've been seeing lately.  Any news on the 3.0 fork and autoconf/automake support in rocksdb?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>>>> Sent: Wednesday, May 21, 2014 9:06 PM
>>>>>> To: Shu, Xinxin; Sage Weil
>>>>>> Cc: ceph-devel@vger.kernel.org; Zhang, Jian
>>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>>
>>>>>> On 05/21/2014 07:54 AM, Shu, Xinxin wrote:
>>>>>>> Hi, sage
>>>>>>>
>>>>>>> I will add the rocksdb submodule into the makefile. Currently we want to run full performance tests on the key-value db backends, both leveldb and rocksdb, and then optimize rocksdb performance.
>>>>>>
>>>>>> I'm definitely interested in any performance tests you do here.  Last winter I started doing some fairly high level tests on raw leveldb/hyperleveldb/raikleveldb.  I'm very interested in what you see with rocksdb as a backend.
>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Sage Weil [mailto:sage@inktank.com]
>>>>>>> Sent: Wednesday, May 21, 2014 9:19 AM
>>>>>>> To: Shu, Xinxin
>>>>>>> Cc: ceph-devel@vger.kernel.org
>>>>>>> Subject: Re: [RFC] add rocksdb support
>>>>>>>
>>>>>>> Hi Xinxin,
>>>>>>>
>>>>>>> I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that includes the latest set of patches with the groundwork and your rocksdb patch.  There is also a commit that adds rocksdb as a git submodule.  I'm thinking that, since there aren't any distro packages for rocksdb at this point, this is going to be the easiest way to make this usable for people.
>>>>>>>
>>>>>>> If you can wire the submodule into the makefile, we can merge this in so that rocksdb support is in the packages on ceph.com.  I suspect that the distros will prefer to turn this off in favor of separate shared libs, but they can do this at their option if/when they include rocksdb in the distro. I think the key is just to have both --with-librocksdb and --with-librocksdb-static (or similar) options so that you can either use the static or dynamically linked one.
>>>>>>>
>>>>>>> Has your group done further testing with rocksdb?  Anything interesting to share?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2014-07-02 19:16 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-03  2:07 [RFC] add rocksdb support Shu, Xinxin
2014-03-03 13:37 ` Mark Nelson
2014-03-04  4:48 ` Alexandre DERUMIER
2014-03-04  8:41   ` Shu, Xinxin
2014-03-05  8:23     ` Alexandre DERUMIER
2014-03-05  8:30       ` Shu, Xinxin
2014-03-05  8:31       ` Haomai Wang
2014-03-05  9:19         ` Andreas Joachim Peters
2014-03-06  9:18           ` Shu, Xinxin
2014-05-21  1:19 ` Sage Weil
2014-05-21 12:54   ` Shu, Xinxin
2014-05-21 13:06     ` Mark Nelson
2014-05-28 10:05       ` Shu, Xinxin
2014-06-03 20:01         ` Sage Weil
2014-06-09 17:11         ` Mark Nelson
2014-06-10  4:59           ` Shu, Xinxin
2014-06-13 18:51             ` Sushma Gurram
2014-06-14  0:49               ` David Zafman
2014-06-14  3:49               ` Shu, Xinxin
2014-06-23  1:18                 ` Shu, Xinxin
2014-06-27  0:44                   ` Sushma Gurram
2014-06-27  3:33                     ` Alexandre DERUMIER
2014-06-27 17:36                       ` Sushma Gurram
2014-06-27  8:08                     ` Haomai Wang
2014-07-01  0:39                       ` Sushma Gurram
2014-07-01  6:10                         ` Haomai Wang
2014-07-01  7:13                           ` Somnath Roy
2014-07-01  8:05                             ` Haomai Wang
2014-07-01 15:15                               ` Sushma Gurram
2014-07-01 17:02                                 ` Haomai Wang
2014-07-01 23:49                                   ` Sushma Gurram
2014-07-02 12:56                                     ` Haomai Wang
2014-07-02 19:01                                       ` Sushma Gurram
2014-07-01 15:11                             ` Sage Weil
2014-07-02  7:23                           ` Shu, Xinxin
2014-07-02 13:07                             ` Haomai Wang
2014-06-23  7:32                 ` Dan van der Ster
