* First attempt at rocksdb monitor store stress testing
@ 2014-07-23 23:14 Mark Nelson
  2014-07-24  5:08 ` Shu, Xinxin
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Nelson @ 2014-07-23 23:14 UTC (permalink / raw)
  To: ceph-devel

Hi Guys,

So I've been interested lately in leveldb's 99th percentile latency (and 
the amount of write amplification we are seeing). Joao 
mentioned he has written a tool called mon-store-stress in 
wip-leveldb-misc to try to provide a means to roughly guess at what's 
happening on the mons under heavy load.  I cherry-picked it over to 
wip-rocksdb and after a couple of hacks was able to get everything built 
and running with some basic tests.  There was little tuning done and I 
don't know how realistic this workload is, so don't assume this means 
anything yet, but some initial results are here:

http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf

Command that was used to run the tests:

./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
--write-min-size 50K --write-max-size 2M --percent-write 70 
--percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000 foo

The most interesting bit right now is that rocksdb seems to be hanging 
in the middle of the test (left it running for several hours).  CPU 
usage on one core was quite high during the hang.  Profiling using perf 
with dwarf symbols I see:

-  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned int 
rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, 
char const*, unsigned long)
    - unsigned int 
rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, 
char const*, unsigned long)
         51.70% rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*, 
rocksdb::Footer const&, rocksdb::ReadOptions const&, 
rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*, bool)
         48.30% 
rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, 
rocksdb::CompressionType, rocksdb::BlockHandle*)

Not sure what's going on yet, may need to try to enable 
logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)

Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-23 23:14 First attempt at rocksdb monitor store stress testing Mark Nelson
@ 2014-07-24  5:08 ` Shu, Xinxin
  2014-07-24 11:13   ` Mark Nelson
  0 siblings, 1 reply; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-24  5:08 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Hi mark,

I think this may be related to the 'verify_checksums' config option. When ReadOptions is initialized, this option defaults to true, and all data read from underlying storage is verified against the corresponding checksums. However, this option cannot currently be configured in the wip-rocksdb branch; I will modify the code to make it configurable.
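Concretely, this is the ReadOptions field in rocksdb's C++ API; a minimal sketch of what disabling it would look like once plumbed through (illustrative only, not the actual wip-rocksdb change):

```cpp
#include <rocksdb/options.h>

// ReadOptions::verify_checksums defaults to true, so every block read
// from storage has its checksum recomputed and compared -- the
// crc32c::Extend hot spot visible in the perf profile above.
rocksdb::ReadOptions read_opts;
read_opts.verify_checksums = false;  // skip checksum verification on reads

// The options are passed on each read, e.g.:
// db->Get(read_opts, key, &value);
```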

Cheers,
xinxin

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Thursday, July 24, 2014 7:14 AM
To: ceph-devel@vger.kernel.org
Subject: First attempt at rocksdb monitor store stress testing

Hi Guys,

So I've been interested lately in leveldb 99th percentile latency (and the amount of write amplification we are seeing) with leveldb. Joao mentioned he has written a tool called mon-store-stress in wip-leveldb-misc to try to provide a means to roughly guess at what's happening on the mons under heavy load.  I cherry-picked it over to wip-rocksdb and after a couple of hacks was able to get everything built and running with some basic tests.  There was little tuning done and I don't know how realistic this workload is, so don't assume this means anything yet, but some initial results are here:

http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf

Command that was used to run the tests:

./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> --write-min-size 50K --write-max-size 2M --percent-write 70 --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000 foo

The most interesting bit right now is that rocksdb seems to be hanging in the middle of the test (left it running for several hours).  CPU usage on one core was quite high during the hang.  Profiling using perf with dwarf symbols I see:

-  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
    - unsigned int
rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
         51.70% rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*, bool)
         48.30%
rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*)

Not sure what's going on yet, may need to try to enable logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)

Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: First attempt at rocksdb monitor store stress testing
  2014-07-24  5:08 ` Shu, Xinxin
@ 2014-07-24 11:13   ` Mark Nelson
  2014-07-24 23:45     ` Mark Nelson
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Nelson @ 2014-07-24 11:13 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

Hi Xinxin,

Thanks! I wonder as well if it might be interesting to expose the 
options related to universal compaction?  It looks like rocksdb provides 
a lot of interesting knobs you can adjust!

Mark

On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
> Hi mark,
>
> I think this may be related to the 'verify_checksums' config option. When ReadOptions is initialized, this option defaults to true, and all data read from underlying storage is verified against the corresponding checksums. However, this option cannot currently be configured in the wip-rocksdb branch; I will modify the code to make it configurable.
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Thursday, July 24, 2014 7:14 AM
> To: ceph-devel@vger.kernel.org
> Subject: First attempt at rocksdb monitor store stress testing
>
> Hi Guys,
>
> So I've been interested lately in leveldb 99th percentile latency (and the amount of write amplification we are seeing) with leveldb. Joao mentioned he has written a tool called mon-store-stress in wip-leveldb-misc to try to provide a means to roughly guess at what's happening on the mons under heavy load.  I cherry-picked it over to wip-rocksdb and after a couple of hacks was able to get everything built and running with some basic tests.  There was little tuning done and I don't know how realistic this workload is, so don't assume this means anything yet, but some initial results are here:
>
> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>
> Command that was used to run the tests:
>
> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> --write-min-size 50K --write-max-size 2M --percent-write 70 --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000 foo
>
> The most interesting bit right now is that rocksdb seems to be hanging in the middle of the test (left it running for several hours).  CPU usage on one core was quite high during the hang.  Profiling using perf with dwarf symbols I see:
>
> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
>      - unsigned int
> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
>           51.70% rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
> rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*, bool)
>           48.30%
> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>
> Not sure what's going on yet, may need to try to enable logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>
> Mark



* Re: First attempt at rocksdb monitor store stress testing
  2014-07-24 11:13   ` Mark Nelson
@ 2014-07-24 23:45     ` Mark Nelson
  2014-07-25  1:28       ` Shu, Xinxin
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Nelson @ 2014-07-24 23:45 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

Earlier today I modified the rocksdb options so I could enable universal 
compaction.  Overall performance is lower, but I don't see the 
hang/stall in the middle of the test either.  Instead the disk is 
basically pegged with 100% writes.  I suspect average latency is higher 
than with leveldb, but the highest latency is about 5-6s while we were 
seeing 30s spikes for leveldb with levelled (heh) compaction.

I haven't done much tuning either way yet.  It may be that if we keep 
level 0 and level 1 roughly the same size we can reduce stalls in the 
levelled setups.  It will also be interesting to see what happens in 
these tests on SSDs.
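For reference, switching compaction styles in rocksdb comes down to a few Options fields; a hedged sketch (the trigger values below are illustrative, not the ones used in these tests):

```cpp
#include <rocksdb/options.h>

rocksdb::Options opts;
// Switch from the default levelled compaction to universal (size-tiered).
opts.compaction_style = rocksdb::kCompactionStyleUniversal;
// Universal compaction operates on level-0 files, so the level-0 trigger
// and the universal-specific knobs become the main tuning points.
opts.level0_file_num_compaction_trigger = 8;
opts.compaction_options_universal.max_size_amplification_percent = 200;
opts.compaction_options_universal.size_ratio = 1;
```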

Mark

On 07/24/2014 06:13 AM, Mark Nelson wrote:
> Hi Xinxin,
>
> Thanks! I wonder as well if it might be interesting to expose the
> options related to universal compaction?  It looks like rocksdb provides
> a lot of interesting knobs you can adjust!
>
> Mark
>
> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I think this may be related to the 'verify_checksums' config option. When
>> ReadOptions is initialized, this option defaults to true, and all data
>> read from underlying storage is verified against the corresponding
>> checksums. However, this option cannot currently be configured in the
>> wip-rocksdb branch; I will modify the code to make it configurable.
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Thursday, July 24, 2014 7:14 AM
>> To: ceph-devel@vger.kernel.org
>> Subject: First attempt at rocksdb monitor store stress testing
>>
>> Hi Guys,
>>
>> So I've been interested lately in leveldb 99th percentile latency (and
>> the amount of write amplification we are seeing) with leveldb. Joao
>> mentioned he has written a tool called mon-store-stress in
>> wip-leveldb-misc to try to provide a means to roughly guess at what's
>> happening on the mons under heavy load.  I cherry-picked it over to
>> wip-rocksdb and after a couple of hacks was able to get everything
>> built and running with some basic tests.  There was little tuning done
>> and I don't know how realistic this workload is, so don't assume this
>> means anything yet, but some initial results are here:
>>
>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>
>> Command that was used to run the tests:
>>
>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>> --write-min-size 50K --write-max-size 2M --percent-write 70
>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000 foo
>>
>> The most interesting bit right now is that rocksdb seems to be hanging
>> in the middle of the test (left it running for several hours).  CPU
>> usage on one core was quite high during the hang.  Profiling using
>> perf with dwarf symbols I see:
>>
>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
>> int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>> int, char const*, unsigned long)
>>      - unsigned int
>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>> int, char const*, unsigned long)
>>           51.70% rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>> bool)
>>           48.30%
>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&,
>> rocksdb::CompressionType, rocksdb::BlockHandle*)
>>
>> Not sure what's going on yet, may need to try to enable
>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>
>> Mark
>



* RE: First attempt at rocksdb monitor store stress testing
  2014-07-24 23:45     ` Mark Nelson
@ 2014-07-25  1:28       ` Shu, Xinxin
  2014-07-25 12:08         ` Mark Nelson
  2014-07-25 16:09         ` Mark Nelson
  0 siblings, 2 replies; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-25  1:28 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Hi mark, 

I am looking forward to your results on SSDs.
rocksdb generates a CRC for all data to be written; this cannot be switched off (though crc32c can be substituted with xxHash). There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down. Also note that universal compaction is not friendly to read operations.
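To make those checksum-related knobs concrete, here is a sketch against the rocksdb API of that era (verify_checksums_in_compaction was an Options field at the time; the Options settings must be applied before opening the DB):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options opts;
// Skip re-verifying checksums while compaction rewrites data.
opts.verify_checksums_in_compaction = false;

// The checksum generated for each written block cannot be turned off,
// but the function can be swapped from crc32c to the cheaper xxHash.
rocksdb::BlockBasedTableOptions table_opts;
table_opts.checksum = rocksdb::kxxHash;
opts.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));

// Reads can skip verification entirely:
rocksdb::ReadOptions read_opts;
read_opts.verify_checksums = false;
```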

Btw , can you list your rocksdb configuration?

Cheers,
xinxin

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com] 
Sent: Friday, July 25, 2014 7:45 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.

I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.

Mark

On 07/24/2014 06:13 AM, Mark Nelson wrote:
> Hi Xinxin,
>
> Thanks! I wonder as well if it might be interesting to expose the 
> options related to universal compaction?  It looks like rocksdb 
> provides a lot of interesting knobs you can adjust!
>
> Mark
>
> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I think this may be related to the 'verify_checksums' config option. When
>> ReadOptions is initialized, this option defaults to true, and all data
>> read from underlying storage is verified against the corresponding
>> checksums. However, this option cannot currently be configured in the
>> wip-rocksdb branch; I will modify the code to make it configurable.
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Thursday, July 24, 2014 7:14 AM
>> To: ceph-devel@vger.kernel.org
>> Subject: First attempt at rocksdb monitor store stress testing
>>
>> Hi Guys,
>>
>> So I've been interested lately in leveldb 99th percentile latency 
>> (and the amount of write amplification we are seeing) with leveldb. 
>> Joao mentioned he has written a tool called mon-store-stress in 
>> wip-leveldb-misc to try to provide a means to roughly guess at what's 
>> happening on the mons under heavy load.  I cherry-picked it over to 
>> wip-rocksdb and after a couple of hacks was able to get everything 
>> built and running with some basic tests.  There was little tuning 
>> done and I don't know how realistic this workload is, so don't assume 
>> this means anything yet, but some initial results are here:
>>
>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>
>> Command that was used to run the tests:
>>
>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
>> --write-min-size 50K --write-max-size 2M --percent-write 70 
>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000 
>> foo
>>
>> The most interesting bit right now is that rocksdb seems to be 
>> hanging in the middle of the test (left it running for several 
>> hours).  CPU usage on one core was quite high during the hang.  
>> Profiling using perf with dwarf symbols I see:
>>
>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned 
>> int 
>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>> int, char const*, unsigned long)
>>      - unsigned int
>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>> int, char const*, unsigned long)
>>           51.70% 
>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>> bool)
>>           48.30%
>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, 
>> rocksdb::CompressionType, rocksdb::BlockHandle*)
>>
>> Not sure what's going on yet, may need to try to enable 
>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>
>> Mark
>



* Re: First attempt at rocksdb monitor store stress testing
  2014-07-25  1:28       ` Shu, Xinxin
@ 2014-07-25 12:08         ` Mark Nelson
  2014-07-25 16:09         ` Mark Nelson
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Nelson @ 2014-07-25 12:08 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I am looking forward to your results on SSDs.

Me too!

> rocksdb generates a CRC for all data to be written; this cannot be switched off (though crc32c can be substituted with xxHash). There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down. Also note that universal compaction is not friendly to read operations.

I'm wondering if it might not be so bad for us given the kind of work 
the mon does.  We write out a lot of maps and incrementals, but I don't 
think the mon goes back and updates objects very often.  Assuming I 
understand how universal vs level compaction works (I might not!), this 
should help contain the number of SST files that objects get spread 
across, which is what causes the extra read seeks under universal 
compaction.

>
> Btw , can you list your rocksdb configuration?

Sure, right now it's all the stock defaults in config_opts except the 
new option I added for universal compaction.  I am hoping I can run some 
more tests today and this weekend with tuned ones.

>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Friday, July 25, 2014 7:45 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>
> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>
> Mark
>
> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>> Hi Xinxin,
>>
>> Thanks! I wonder as well if it might be interesting to expose the
>> options related to universal compaction?  It looks like rocksdb
>> provides a lot of interesting knobs you can adjust!
>>
>> Mark
>>
>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
>>> I think this may be related to the 'verify_checksums' config option. When
>>> ReadOptions is initialized, this option defaults to true, and all data
>>> read from underlying storage is verified against the corresponding
>>> checksums. However, this option cannot currently be configured in the
>>> wip-rocksdb branch; I will modify the code to make it configurable.
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Thursday, July 24, 2014 7:14 AM
>>> To: ceph-devel@vger.kernel.org
>>> Subject: First attempt at rocksdb monitor store stress testing
>>>
>>> Hi Guys,
>>>
>>> So I've been interested lately in leveldb 99th percentile latency
>>> (and the amount of write amplification we are seeing) with leveldb.
>>> Joao mentioned he has written a tool called mon-store-stress in
>>> wip-leveldb-misc to try to provide a means to roughly guess at what's
>>> happening on the mons under heavy load.  I cherry-picked it over to
>>> wip-rocksdb and after a couple of hacks was able to get everything
>>> built and running with some basic tests.  There was little tuning
>>> done and I don't know how realistic this workload is, so don't assume
>>> this means anything yet, but some initial results are here:
>>>
>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>
>>> Command that was used to run the tests:
>>>
>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000
>>> foo
>>>
>>> The most interesting bit right now is that rocksdb seems to be
>>> hanging in the middle of the test (left it running for several
>>> hours).  CPU usage on one core was quite high during the hang.
>>> Profiling using perf with dwarf symbols I see:
>>>
>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
>>> int
>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>> int, char const*, unsigned long)
>>>       - unsigned int
>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>> int, char const*, unsigned long)
>>>            51.70%
>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>>> bool)
>>>            48.30%
>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&,
>>> rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>
>>> Not sure what's going on yet, may need to try to enable
>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>
>>> Mark
>>
>



* Re: First attempt at rocksdb monitor store stress testing
  2014-07-25  1:28       ` Shu, Xinxin
  2014-07-25 12:08         ` Mark Nelson
@ 2014-07-25 16:09         ` Mark Nelson
  2014-07-28  4:45           ` Shu, Xinxin
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Nelson @ 2014-07-25 16:09 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

Hi Xinxin,

I'm trying to enable the rocksdb log file as described in config_opts using:

rocksdb_log = <path to log file>

The file gets created but is empty.  Any ideas?
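For reference, at the rocksdb API level an explicit log path means supplying an Options::info_log Logger before the DB is opened; a sketch of that (the path is illustrative, and this may differ from how the ceph wrapper plumbs it through):

```cpp
#include <memory>
#include <rocksdb/env.h>
#include <rocksdb/options.h>

rocksdb::Options opts;
// If info_log is left unset, rocksdb writes its LOG file into the DB
// directory; an explicit path needs a Logger created for it up front.
std::shared_ptr<rocksdb::Logger> logger;
rocksdb::Status s =
    rocksdb::Env::Default()->NewLogger("/tmp/rocksdb.info.log", &logger);
if (s.ok()) {
  opts.info_log = logger;  // must be set before rocksdb::DB::Open()
}
```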

Mark

On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I am looking forward to your results on SSDs.
> rocksdb generates a CRC for all data to be written; this cannot be switched off (though crc32c can be substituted with xxHash). There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down. Also note that universal compaction is not friendly to read operations.
>
> Btw , can you list your rocksdb configuration?
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Friday, July 25, 2014 7:45 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>
> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>
> Mark
>
> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>> Hi Xinxin,
>>
>> Thanks! I wonder as well if it might be interesting to expose the
>> options related to universal compaction?  It looks like rocksdb
>> provides a lot of interesting knobs you can adjust!
>>
>> Mark
>>
>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
>>> I think this may be related to the 'verify_checksums' config option. When
>>> ReadOptions is initialized, this option defaults to true, and all data
>>> read from underlying storage is verified against the corresponding
>>> checksums. However, this option cannot currently be configured in the
>>> wip-rocksdb branch; I will modify the code to make it configurable.
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Thursday, July 24, 2014 7:14 AM
>>> To: ceph-devel@vger.kernel.org
>>> Subject: First attempt at rocksdb monitor store stress testing
>>>
>>> Hi Guys,
>>>
>>> So I've been interested lately in leveldb 99th percentile latency
>>> (and the amount of write amplification we are seeing) with leveldb.
>>> Joao mentioned he has written a tool called mon-store-stress in
>>> wip-leveldb-misc to try to provide a means to roughly guess at what's
>>> happening on the mons under heavy load.  I cherry-picked it over to
>>> wip-rocksdb and after a couple of hacks was able to get everything
>>> built and running with some basic tests.  There was little tuning
>>> done and I don't know how realistic this workload is, so don't assume
>>> this means anything yet, but some initial results are here:
>>>
>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>
>>> Command that was used to run the tests:
>>>
>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000
>>> foo
>>>
>>> The most interesting bit right now is that rocksdb seems to be
>>> hanging in the middle of the test (left it running for several
>>> hours).  CPU usage on one core was quite high during the hang.
>>> Profiling using perf with dwarf symbols I see:
>>>
>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
>>> int
>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>> int, char const*, unsigned long)
>>>       - unsigned int
>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>> int, char const*, unsigned long)
>>>            51.70%
>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>>> bool)
>>>            48.30%
>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&,
>>> rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>
>>> Not sure what's going on yet, may need to try to enable
>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>
>>> Mark
>>
>



* RE: First attempt at rocksdb monitor store stress testing
  2014-07-25 16:09         ` Mark Nelson
@ 2014-07-28  4:45           ` Shu, Xinxin
  2014-07-28 16:55             ` Mark Nelson
  2014-07-30 17:34             ` Mark Nelson
  0 siblings, 2 replies; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-28  4:45 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Hi mark, 

I tested this option on my setup and hit the same issue; I will dig into it. In the meantime, if you want to get the info log, there is a workaround: set the option to an empty string:

rocksdb_log = ""

Cheers,
xinxin

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com] 
Sent: Saturday, July 26, 2014 12:10 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Hi Xinxin,

I'm trying to enable the rocksdb log file as described in config_opts using:

rocksdb_log = <path to log file>

The file gets created but is empty.  Any ideas?

Mark

On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I am looking forward to your results on SSDs.
> rocksdb generates a CRC for all data to be written; this cannot be switched off (though crc32c can be substituted with xxHash). There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down. Also note that universal compaction is not friendly to read operations.
>
> Btw , can you list your rocksdb configuration?
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Friday, July 25, 2014 7:45 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>
> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>
> Mark
>
> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>> Hi Xinxin,
>>
>> Thanks! I wonder as well if it might be interesting to expose the 
>> options related to universal compaction?  It looks like rocksdb 
>> provides a lot of interesting knobs you can adjust!
>>
>> Mark
>>
>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
>>> I think this may be related to the 'verify_checksums' config option. When
>>> ReadOptions is initialized, this option defaults to true, and all data
>>> read from underlying storage is verified against the corresponding
>>> checksums. However, this option cannot currently be configured in the
>>> wip-rocksdb branch; I will modify the code to make it configurable.
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org 
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>> Sent: Thursday, July 24, 2014 7:14 AM
>>> To: ceph-devel@vger.kernel.org
>>> Subject: First attempt at rocksdb monitor store stress testing
>>>
>>> Hi Guys,
>>>
>>> So I've been interested lately in leveldb 99th percentile latency 
>>> (and the amount of write amplification we are seeing) with leveldb.
>>> Joao mentioned he has written a tool called mon-store-stress in 
>>> wip-leveldb-misc to try to provide a means to roughly guess at 
>>> what's happening on the mons under heavy load.  I cherry-picked it 
>>> over to wip-rocksdb and after a couple of hacks was able to get 
>>> everything built and running with some basic tests.  There was 
>>> little tuning done and I don't know how realistic this workload is, 
>>> so don't assume this means anything yet, but some initial results are here:
>>>
>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>
>>> Command that was used to run the tests:
>>>
>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000 
>>> foo
>>>
>>> The most interesting bit right now is that rocksdb seems to be 
>>> hanging in the middle of the test (left it running for several 
>>> hours).  CPU usage on one core was quite high during the hang.
>>> Profiling using perf with dwarf symbols I see:
>>>
>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned 
>>> int 
>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>> int, char const*, unsigned long)
>>>       - unsigned int
>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>> int, char const*, unsigned long)
>>>            51.70%
>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>>> bool)
>>>            48.30%
>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice 
>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>
>>> Not sure what's going on yet, may need to try to enable 
>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>
>>> Mark
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: First attempt at rocksdb monitor store stress testing
  2014-07-28  4:45           ` Shu, Xinxin
@ 2014-07-28 16:55             ` Mark Nelson
  2014-07-31  1:59               ` Shu, Xinxin
  2014-07-31  2:00               ` Shu, Xinxin
  2014-07-30 17:34             ` Mark Nelson
  1 sibling, 2 replies; 22+ messages in thread
From: Mark Nelson @ 2014-07-28 16:55 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

Hi Xinxin,

Thanks, I'll give it a try.  I want to figure out what's going on in 
Rocksdb when the test stalls with leveled compaction.  In the mean time, 
here are the test results with spinning disks and SSDs:

http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf

Mark

On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to the empty string:
>
> rocksdb_log = ""
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Saturday, July 26, 2014 12:10 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> I'm trying to enable the rocksdb log file as described in config_opts using:
>
> rocksdb_log = <path to log file>
>
> The file gets created but is empty.  Any ideas?
>
> Mark
>
> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I am looking forward to your results on SSDs.
>> rocksdb generates a CRC of the data to be written; this cannot be switched off (though it can be substituted with xxHash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
>>
>> Btw , can you list your rocksdb configuration?
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Friday, July 25, 2014 7:45 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>
>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>
>> Mark
>>
>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>> Hi Xinxin,
>>>
>>> Thanks! I wonder as well if it might be interesting to expose the
>>> options related to universal compaction?  It looks like rocksdb
>>> provides a lot of interesting knobs you can adjust!
>>>
>>> Mark
>>>
>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>> Hi mark,
>>>>
>>>> I think this may be related to the 'verify_checksums' config option: when
>>>> ReadOptions is initialized, this option defaults to true, so all data
>>>> read from underlying storage is verified against its corresponding
>>>> checksum.  However, this option cannot yet be configured in the
>>>> wip-rocksdb branch.  I will modify the code to make it configurable.
>>>>
>>>> Cheers,
>>>> xinxin
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>> To: ceph-devel@vger.kernel.org
>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>
>>>> Hi Guys,
>>>>
>>>> So I've been interested lately in leveldb 99th percentile latency
>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>> Joao mentioned he has written a tool called mon-store-stress in
>>>> wip-leveldb-misc to try to provide a means to roughly guess at
>>>> what's happening on the mons under heavy load.  I cherry-picked it
>>>> over to wip-rocksdb and after a couple of hacks was able to get
>>>> everything built and running with some basic tests.  There was
>>>> little tuning done and I don't know how realistic this workload is,
>>>> so don't assume this means anything yet, but some initial results are here:
>>>>
>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>
>>>> Command that was used to run the tests:
>>>>
>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000
>>>> foo
>>>>
>>>> The most interesting bit right now is that rocksdb seems to be
>>>> hanging in the middle of the test (left it running for several
>>>> hours).  CPU usage on one core was quite high during the hang.
>>>> Profiling using perf with dwarf symbols I see:
>>>>
>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
>>>> int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>        - unsigned int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>             51.70%
>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>>>> bool)
>>>>             48.30%
>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>
>>>> Not sure what's going on yet, may need to try to enable
>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: First attempt at rocksdb monitor store stress testing
  2014-07-28  4:45           ` Shu, Xinxin
  2014-07-28 16:55             ` Mark Nelson
@ 2014-07-30 17:34             ` Mark Nelson
  2014-07-31  1:46               ` Shu, Xinxin
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Nelson @ 2014-07-30 17:34 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

Hi Xinxin,

Yes, that did work.  I was able to observe the log and figure out the 
stall:  Too many files open in the level0->level1 compaction thread. 
Similar to the issue that we've seen in the past with leveldb.  Setting a 
higher ulimit fixed the problem.  With leveled compaction on spinning 
disks I do see latency spikes but at first glance they do not appear to 
be as bad as with leveldb.  I will now run some longer tests.

Mark

On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to the empty string:
>
> rocksdb_log = ""
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Saturday, July 26, 2014 12:10 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> I'm trying to enable the rocksdb log file as described in config_opts using:
>
> rocksdb_log = <path to log file>
>
> The file gets created but is empty.  Any ideas?
>
> Mark
>
> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I am looking forward to your results on SSDs.
>> rocksdb generates a CRC of the data to be written; this cannot be switched off (though it can be substituted with xxHash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
>>
>> Btw , can you list your rocksdb configuration?
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Friday, July 25, 2014 7:45 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>
>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>
>> Mark
>>
>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>> Hi Xinxin,
>>>
>>> Thanks! I wonder as well if it might be interesting to expose the
>>> options related to universal compaction?  It looks like rocksdb
>>> provides a lot of interesting knobs you can adjust!
>>>
>>> Mark
>>>
>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>> Hi mark,
>>>>
>>>> I think this may be related to the 'verify_checksums' config option: when
>>>> ReadOptions is initialized, this option defaults to true, so all data
>>>> read from underlying storage is verified against its corresponding
>>>> checksum.  However, this option cannot yet be configured in the
>>>> wip-rocksdb branch.  I will modify the code to make it configurable.
>>>>
>>>> Cheers,
>>>> xinxin
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>> To: ceph-devel@vger.kernel.org
>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>
>>>> Hi Guys,
>>>>
>>>> So I've been interested lately in leveldb 99th percentile latency
>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>> Joao mentioned he has written a tool called mon-store-stress in
>>>> wip-leveldb-misc to try to provide a means to roughly guess at
>>>> what's happening on the mons under heavy load.  I cherry-picked it
>>>> over to wip-rocksdb and after a couple of hacks was able to get
>>>> everything built and running with some basic tests.  There was
>>>> little tuning done and I don't know how realistic this workload is,
>>>> so don't assume this means anything yet, but some initial results are here:
>>>>
>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>
>>>> Command that was used to run the tests:
>>>>
>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000
>>>> foo
>>>>
>>>> The most interesting bit right now is that rocksdb seems to be
>>>> hanging in the middle of the test (left it running for several
>>>> hours).  CPU usage on one core was quite high during the hang.
>>>> Profiling using perf with dwarf symbols I see:
>>>>
>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
>>>> int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>        - unsigned int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>             51.70%
>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
>>>> bool)
>>>>             48.30%
>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>
>>>> Not sure what's going on yet, may need to try to enable
>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-30 17:34             ` Mark Nelson
@ 2014-07-31  1:46               ` Shu, Xinxin
  2014-07-31 12:30                 ` Mark Nelson
  0 siblings, 1 reply; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-31  1:46 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Hi mark, 
Which way did you use to set the higher limit: the 'ulimit' command, or enlarging the rocksdb_max_open_files config option?

Cheers,
xinxin
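The two approaches being compared look roughly like this; the numbers are illustrative only (the thread itself only names `ulimit` and `rocksdb_max_open_files`, not specific values):

```shell
# 1) Raise the per-process fd limit in the shell that launches ceph-mon.
ulimit -Sn                # print the current soft limit
# ulimit -n 65536         # raise it (needs a sufficiently high hard limit)

# 2) Or cap RocksDB itself from ceph.conf (illustrative value):
#    [mon]
#        rocksdb_max_open_files = 32768
```

The first changes what the OS allows the whole process; the second changes how many table files RocksDB tries to keep open, so it has to fit under the first.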

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com] 
Sent: Thursday, July 31, 2014 1:35 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Hi Xinxin,

Yes, that did work.  I was able to observe the log and figure out the
stall:  Too many files open in the level0->level1 compaction thread. 
Similar to the issue that we've seen in the past with leveldb.  Setting a higher ulimit fixed the problem.  With leveled compaction on spinning disks I do see latency spikes but at first glance they do not appear to be as bad as with leveldb.  I will now run some longer tests.

Mark

On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to the empty string:
>
> rocksdb_log = ""
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Saturday, July 26, 2014 12:10 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> I'm trying to enable the rocksdb log file as described in config_opts using:
>
> rocksdb_log = <path to log file>
>
> The file gets created but is empty.  Any ideas?
>
> Mark
>
> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I am looking forward to your results on SSDs.
>> rocksdb generates a CRC of the data to be written; this cannot be switched off (though it can be substituted with xxHash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
>>
>> Btw , can you list your rocksdb configuration?
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Friday, July 25, 2014 7:45 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>
>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>
>> Mark
>>
>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>> Hi Xinxin,
>>>
>>> Thanks! I wonder as well if it might be interesting to expose the 
>>> options related to universal compaction?  It looks like rocksdb 
>>> provides a lot of interesting knobs you can adjust!
>>>
>>> Mark
>>>
>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>> Hi mark,
>>>>
>>>> I think this may be related to the 'verify_checksums' config option: when
>>>> ReadOptions is initialized, this option defaults to true, so all data
>>>> read from underlying storage is verified against its corresponding
>>>> checksum.  However, this option cannot yet be configured in the
>>>> wip-rocksdb branch.  I will modify the code to make it configurable.
>>>>
>>>> Cheers,
>>>> xinxin
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org 
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>> To: ceph-devel@vger.kernel.org
>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>
>>>> Hi Guys,
>>>>
>>>> So I've been interested lately in leveldb 99th percentile latency 
>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>> Joao mentioned he has written a tool called mon-store-stress in 
>>>> wip-leveldb-misc to try to provide a means to roughly guess at 
>>>> what's happening on the mons under heavy load.  I cherry-picked it 
>>>> over to wip-rocksdb and after a couple of hacks was able to get 
>>>> everything built and running with some basic tests.  There was 
>>>> little tuning done and I don't know how realistic this workload is, 
>>>> so don't assume this means anything yet, but some initial results are here:
>>>>
>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>
>>>> Command that was used to run the tests:
>>>>
>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
>>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 
>>>> 5000 foo
>>>>
>>>> The most interesting bit right now is that rocksdb seems to be 
>>>> hanging in the middle of the test (left it running for several 
>>>> hours).  CPU usage on one core was quite high during the hang.
>>>> Profiling using perf with dwarf symbols I see:
>>>>
>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] 
>>>> unsigned int 
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>        - unsigned int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>             51.70%
>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, 
>>>> rocksdb::Env*,
>>>> bool)
>>>>             48.30%
>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>
>>>> Not sure what's going on yet, may need to try to enable 
>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-28 16:55             ` Mark Nelson
@ 2014-07-31  1:59               ` Shu, Xinxin
  2014-07-31 12:41                 ` Mark Nelson
  2014-07-31  2:00               ` Shu, Xinxin
  1 sibling, 1 reply; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-31  1:59 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Hi mark , 
 
There are four tables in your report.  Did you run four test cases, or only a single case (the read/write mix)?  If you ran a single mixed case, the latency in the fourth table is not the same as in the other three tables.

In leveldb and rocksdb, should level0 and level1 be the same size?

To my knowledge, in rocksdb the default level1 size is ten times the level0 size, though the sstable files themselves are the same size.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, July 29, 2014 12:56 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Hi Xinxin,

Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:

http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf

Mark

On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to the empty string:
>
> rocksdb_log = ""
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Saturday, July 26, 2014 12:10 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> I'm trying to enable the rocksdb log file as described in config_opts using:
>
> rocksdb_log = <path to log file>
>
> The file gets created but is empty.  Any ideas?
>
> Mark
>
> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I am looking forward to your results on SSDs.
>> rocksdb generates a CRC of the data to be written; this cannot be switched off (though it can be substituted with xxHash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
>>
>> Btw , can you list your rocksdb configuration?
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Friday, July 25, 2014 7:45 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>
>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>
>> Mark
>>
>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>> Hi Xinxin,
>>>
>>> Thanks! I wonder as well if it might be interesting to expose the 
>>> options related to universal compaction?  It looks like rocksdb 
>>> provides a lot of interesting knobs you can adjust!
>>>
>>> Mark
>>>
>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>> Hi mark,
>>>>
>>>> I think this may be related to the 'verify_checksums' config option: when
>>>> ReadOptions is initialized, this option defaults to true, so all data
>>>> read from underlying storage is verified against its corresponding
>>>> checksum.  However, this option cannot yet be configured in the
>>>> wip-rocksdb branch.  I will modify the code to make it configurable.
>>>>
>>>> Cheers,
>>>> xinxin
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org 
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>> To: ceph-devel@vger.kernel.org
>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>
>>>> Hi Guys,
>>>>
>>>> So I've been interested lately in leveldb 99th percentile latency 
>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>> Joao mentioned he has written a tool called mon-store-stress in 
>>>> wip-leveldb-misc to try to provide a means to roughly guess at 
>>>> what's happening on the mons under heavy load.  I cherry-picked it 
>>>> over to wip-rocksdb and after a couple of hacks was able to get 
>>>> everything built and running with some basic tests.  There was 
>>>> little tuning done and I don't know how realistic this workload is, 
>>>> so don't assume this means anything yet, but some initial results are here:
>>>>
>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>
>>>> Command that was used to run the tests:
>>>>
>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
>>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 
>>>> 5000 foo
>>>>
>>>> The most interesting bit right now is that rocksdb seems to be 
>>>> hanging in the middle of the test (left it running for several 
>>>> hours).  CPU usage on one core was quite high during the hang.
>>>> Profiling using perf with dwarf symbols I see:
>>>>
>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] 
>>>> unsigned int 
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>        - unsigned int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>             51.70%
>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, 
>>>> rocksdb::Env*,
>>>> bool)
>>>>             48.30%
>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>
>>>> Not sure what's going on yet, may need to try to enable 
>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-28 16:55             ` Mark Nelson
  2014-07-31  1:59               ` Shu, Xinxin
@ 2014-07-31  2:00               ` Shu, Xinxin
  2014-07-31  2:08                 ` Sage Weil
  2014-08-01 17:41                 ` Mark Nelson
  1 sibling, 2 replies; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-31  2:00 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Is your report based on the wip-rocksdb-mark branch?

Cheers,
xinxin

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, July 29, 2014 12:56 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Hi Xinxin,

Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:

http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf

Mark

On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> Hi mark,
>
> I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to the empty string:
>
> rocksdb_log = ""
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Saturday, July 26, 2014 12:10 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> I'm trying to enable the rocksdb log file as described in config_opts using:
>
> rocksdb_log = <path to log file>
>
> The file gets created but is empty.  Any ideas?
>
> Mark
>
> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I am looking forward to your results on SSDs.
>> rocksdb generates a CRC of the data to be written; this cannot be switched off (though it can be substituted with xxHash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
>>
>> Btw , can you list your rocksdb configuration?
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Friday, July 25, 2014 7:45 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>
>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>
>> Mark
>>
>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>> Hi Xinxin,
>>>
>>> Thanks! I wonder as well if it might be interesting to expose the 
>>> options related to universal compaction?  It looks like rocksdb 
>>> provides a lot of interesting knobs you can adjust!
>>>
>>> Mark
>>>
>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>> Hi mark,
>>>>
>>>> I think this may be related to the 'verify_checksums' config option.
>>>> When ReadOptions is initialized, this option defaults to true, and
>>>> all data read from underlying storage is verified against the
>>>> corresponding checksums.  However, this option cannot be
>>>> configured in the wip-rocksdb branch. I will modify the code to make this option configurable.
>>>>
>>>> Cheers,
>>>> xinxin
>>>>
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org 
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>> To: ceph-devel@vger.kernel.org
>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>
>>>> Hi Guys,
>>>>
>>>> So I've been interested lately in leveldb 99th percentile latency 
>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>> Joao mentioned he has written a tool called mon-store-stress in 
>>>> wip-leveldb-misc to try to provide a means to roughly guess at 
>>>> what's happening on the mons under heavy load.  I cherry-picked it 
>>>> over to wip-rocksdb and after a couple of hacks was able to get 
>>>> everything built and running with some basic tests.  There was 
>>>> little tuning done and I don't know how realistic this workload is, 
>>>> so don't assume this means anything yet, but some initial results are here:
>>>>
>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>
>>>> Command that was used to run the tests:
>>>>
>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
>>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 
>>>> 5000 foo
>>>>
>>>> The most interesting bit right now is that rocksdb seems to be 
>>>> hanging in the middle of the test (left it running for several 
>>>> hours).  CPU usage on one core was quite high during the hang.
>>>> Profiling using perf with dwarf symbols I see:
>>>>
>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] 
>>>> unsigned int 
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>        - unsigned int
>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>> int, char const*, unsigned long)
>>>>             51.70%
>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, 
>>>> rocksdb::Env*,
>>>> bool)
>>>>             48.30%
>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>
>>>> Not sure what's going on yet, may need to try to enable 
>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>> in the body of a message to majordomo@vger.kernel.org More 
>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-31  2:00               ` Shu, Xinxin
@ 2014-07-31  2:08                 ` Sage Weil
  2014-07-31  8:41                   ` Shu, Xinxin
  2014-07-31  8:58                   ` Shu, Xinxin
  2014-08-01 17:41                 ` Mark Nelson
  1 sibling, 2 replies; 22+ messages in thread
From: Sage Weil @ 2014-07-31  2:08 UTC (permalink / raw)
  To: Shu, Xinxin; +Cc: Mark Nelson, ceph-devel

By the way, I'm getting closer to getting wip-rocksdb in a state where it 
can be merged, but it is failing to build due to this line:

	$(shell (./build_tools/build_detect_version))

in Makefile.am which results in

automake: warnings are treated as errors
warning: Makefile.am:59: shell (./build_tools/build_detect_version: 
non-POSIX variable name
Makefile.am:59: (probably a GNU make extension)
Makefile.am: installing './depcomp'
autoreconf: automake failed with exit status: 1

Any suggestions?  You can see these build results at

	http://ceph.com/gitbuilder.cgi
	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-basic/log.cgi?log=92212c722100065922468e4185759be0435877ff

sage


On Thu, 31 Jul 2014, Shu, Xinxin wrote:

> Does your report base on wip-rocksdb-mark branch?
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, July 29, 2014 12:56 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
> 
> Hi Xinxin,
> 
> Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:
> 
> http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf
> 
> Mark
> 
> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> > Hi mark,
> >
> > I tested this option on my setup , same issue happened , I will dig into it , if you want to get info log , there is a workaround, set this option to none:
> >
> > Rocksdb_log = ""
> >
> > Cheers,
> > xinxin
> >
> > -----Original Message-----
> > From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > Sent: Saturday, July 26, 2014 12:10 AM
> > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > Subject: Re: First attempt at rocksdb monitor store stress testing
> >
> > Hi Xinxin,
> >
> > I'm trying to enable the rocksdb log file as described in config_opts using:
> >
> > rocksdb_log = <path to log file>
> >
> > The file gets created but is empty.  Any ideas?
> >
> > Mark
> >
> > On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> >> Hi mark,
> >>
> >> I am looking forward to your results on SSDs .
> >> rocksdb generates a crc of data to be written , this cannot be switch off (but can be subsititued with xxhash),  there are two options , Option. verify_checksums_in_compaction and ReadOptions. verify_checksums,  If we disable these two options , i think cpu usage will goes down . If we use universal compaction , this is not friendly with read operation.
> >>
> >> Btw , can you list your rocksdb configuration?
> >>
> >> Cheers,
> >> xinxin
> >>
> >> -----Original Message-----
> >> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> >> Sent: Friday, July 25, 2014 7:45 AM
> >> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> >> Subject: Re: First attempt at rocksdb monitor store stress testing
> >>
> >> Earlier today I modified the rocksdb options so I could enable universal compaction.  Over all performance is lower but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
> >>
> >> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
> >>
> >> Mark
> >>
> >> On 07/24/2014 06:13 AM, Mark Nelson wrote:
> >>> Hi Xinxin,
> >>>
> >>> Thanks! I wonder as well if it might be interesting to expose the 
> >>> options related to universal compaction?  It looks like rocksdb 
> >>> provides a lot of interesting knobs you can adjust!
> >>>
> >>> Mark
> >>>
> >>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
> >>>> Hi mark,
> >>>>
> >>>> I think this maybe related to 'verify_checksums' config option 
> >>>> ,when ReadOptions is initialized, default this option is  true , 
> >>>> all data read from underlying storage will be verified against 
> >>>> corresponding checksums,  however,  this option cannot be 
> >>>> configured in wip-rocksdb branch. I will modify code to make this option configurable .
> >>>>
> >>>> Cheers,
> >>>> xinxin
> >>>>
> >>>> -----Original Message-----
> >>>> From: ceph-devel-owner@vger.kernel.org 
> >>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> >>>> Sent: Thursday, July 24, 2014 7:14 AM
> >>>> To: ceph-devel@vger.kernel.org
> >>>> Subject: First attempt at rocksdb monitor store stress testing
> >>>>
> >>>> Hi Guys,
> >>>>
> >>>> So I've been interested lately in leveldb 99th percentile latency 
> >>>> (and the amount of write amplification we are seeing) with leveldb.
> >>>> Joao mentioned he has written a tool called mon-store-stress in 
> >>>> wip-leveldb-misc to try to provide a means to roughly guess at 
> >>>> what's happening on the mons under heavy load.  I cherry-picked it 
> >>>> over to wip-rocksdb and after a couple of hacks was able to get 
> >>>> everything built and running with some basic tests.  There was 
> >>>> little tuning done and I don't know how realistic this workload is, 
> >>>> so don't assume this means anything yet, but some initial results are here:
> >>>>
> >>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
> >>>>
> >>>> Command that was used to run the tests:
> >>>>
> >>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
> >>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
> >>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at 
> >>>> 5000 foo
> >>>>
> >>>> The most interesting bit right now is that rocksdb seems to be 
> >>>> hanging in the middle of the test (left it running for several 
> >>>> hours).  CPU usage on one core was quite high during the hang.
> >>>> Profiling using perf with dwarf symbols I see:
> >>>>
> >>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] 
> >>>> unsigned int 
> >>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
> >>>> int, char const*, unsigned long)
> >>>>        - unsigned int
> >>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
> >>>> int, char const*, unsigned long)
> >>>>             51.70%
> >>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
> >>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
> >>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, 
> >>>> rocksdb::Env*,
> >>>> bool)
> >>>>             48.30%
> >>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
> >>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
> >>>>
> >>>> Not sure what's going on yet, may need to try to enable 
> >>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
> >>>>
> >>>> Mark
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>>> in the body of a message to majordomo@vger.kernel.org More 
> >>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-31  2:08                 ` Sage Weil
@ 2014-07-31  8:41                   ` Shu, Xinxin
  2014-07-31  8:58                   ` Shu, Xinxin
  1 sibling, 0 replies; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-31  8:41 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mark Nelson, ceph-devel

Hi sage , 

This may be because $(shell) is a GNU make feature.  I think there are two solutions:
1)  Run the script at configure time rather than at build time.
2)  $(shell (./build_tools/build_detect_version)) generates util/build_version.cc, which only contains some version info (git version, compile time).  Since we may not care about this info, we can remove the line from Makefile.am and generate util/build_version.cc ourselves.
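The first option might look roughly like the following configure.ac fragment (a sketch only; the macro placement and the src/rocksdb path are assumptions, not tested against the actual ceph build):

```
# configure.ac: run rocksdb's version-detection script once at
# configure time, so Makefile.am no longer needs the GNU-make
# $(shell ...) extension that automake rejects as non-POSIX.
AC_CONFIG_COMMANDS([rocksdb-build-version],
                   [(cd src/rocksdb && ./build_tools/build_detect_version)])
```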

Cheers,
xinxin

-----Original Message-----
From: Sage Weil [mailto:sweil@redhat.com] 
Sent: Thursday, July 31, 2014 10:08 AM
To: Shu, Xinxin
Cc: Mark Nelson; ceph-devel@vger.kernel.org
Subject: RE: First attempt at rocksdb monitor store stress testing

By the way, I'm getting closer to getting wip-rocksdb in a state where it can be merged, but it is failing to build due to this line:

	$(shell (./build_tools/build_detect_version))

in Makefile.am which results in

automake: warnings are treated as errors
warning: Makefile.am:59: shell (./build_tools/build_detect_version: 
non-POSIX variable name
Makefile.am:59: (probably a GNU make extension)
Makefile.am: installing './depcomp'
autoreconf: automake failed with exit status: 1

Any suggestions?  You can see these build results at

	http://ceph.com/gitbuilder.cgi
	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-basic/log.cgi?log=92212c722100065922468e4185759be0435877ff

sage


On Thu, 31 Jul 2014, Shu, Xinxin wrote:

> Does your report base on wip-rocksdb-mark branch?
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, July 29, 2014 12:56 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
> 
> Hi Xinxin,
> 
> Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:
> 
> http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf
> 
> Mark
> 
> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> > Hi mark,
> >
> > I tested this option on my setup , same issue happened , I will dig into it , if you want to get info log , there is a workaround, set this option to none:
> >
> > Rocksdb_log = ""
> >
> > Cheers,
> > xinxin
> >
> > -----Original Message-----
> > From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > Sent: Saturday, July 26, 2014 12:10 AM
> > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > Subject: Re: First attempt at rocksdb monitor store stress testing
> >
> > Hi Xinxin,
> >
> > I'm trying to enable the rocksdb log file as described in config_opts using:
> >
> > rocksdb_log = <path to log file>
> >
> > The file gets created but is empty.  Any ideas?
> >
> > Mark
> >
> > On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> >> Hi mark,
> >>
> >> I am looking forward to your results on SSDs .
> >> rocksdb generates a crc of data to be written , this cannot be switch off (but can be subsititued with xxhash),  there are two options , Option. verify_checksums_in_compaction and ReadOptions. verify_checksums,  If we disable these two options , i think cpu usage will goes down . If we use universal compaction , this is not friendly with read operation.
> >>
> >> Btw , can you list your rocksdb configuration?
> >>
> >> Cheers,
> >> xinxin
> >>
> >> -----Original Message-----
> >> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> >> Sent: Friday, July 25, 2014 7:45 AM
> >> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> >> Subject: Re: First attempt at rocksdb monitor store stress testing
> >>
> >> Earlier today I modified the rocksdb options so I could enable universal compaction.  Over all performance is lower but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
> >>
> >> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
> >>
> >> Mark
> >>
> >> On 07/24/2014 06:13 AM, Mark Nelson wrote:
> >>> Hi Xinxin,
> >>>
> >>> Thanks! I wonder as well if it might be interesting to expose the 
> >>> options related to universal compaction?  It looks like rocksdb 
> >>> provides a lot of interesting knobs you can adjust!
> >>>
> >>> Mark
> >>>
> >>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
> >>>> Hi mark,
> >>>>
> >>>> I think this maybe related to 'verify_checksums' config option 
> >>>> ,when ReadOptions is initialized, default this option is  true , 
> >>>> all data read from underlying storage will be verified against 
> >>>> corresponding checksums,  however,  this option cannot be 
> >>>> configured in wip-rocksdb branch. I will modify code to make this option configurable .
> >>>>
> >>>> Cheers,
> >>>> xinxin
> >>>>
> >>>> -----Original Message-----
> >>>> From: ceph-devel-owner@vger.kernel.org 
> >>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark 
> >>>> Nelson
> >>>> Sent: Thursday, July 24, 2014 7:14 AM
> >>>> To: ceph-devel@vger.kernel.org
> >>>> Subject: First attempt at rocksdb monitor store stress testing
> >>>>
> >>>> Hi Guys,
> >>>>
> >>>> So I've been interested lately in leveldb 99th percentile latency 
> >>>> (and the amount of write amplification we are seeing) with leveldb.
> >>>> Joao mentioned he has written a tool called mon-store-stress in 
> >>>> wip-leveldb-misc to try to provide a means to roughly guess at 
> >>>> what's happening on the mons under heavy load.  I cherry-picked 
> >>>> it over to wip-rocksdb and after a couple of hacks was able to 
> >>>> get everything built and running with some basic tests.  There 
> >>>> was little tuning done and I don't know how realistic this 
> >>>> workload is, so don't assume this means anything yet, but some initial results are here:
> >>>>
> >>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
> >>>>
> >>>> Command that was used to run the tests:
> >>>>
> >>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
> >>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
> >>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
> >>>> 5000 foo
> >>>>
> >>>> The most interesting bit right now is that rocksdb seems to be 
> >>>> hanging in the middle of the test (left it running for several 
> >>>> hours).  CPU usage on one core was quite high during the hang.
> >>>> Profiling using perf with dwarf symbols I see:
> >>>>
> >>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] 
> >>>> unsigned int 
> >>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigne
> >>>> d
> >>>> int, char const*, unsigned long)
> >>>>        - unsigned int
> >>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigne
> >>>> d
> >>>> int, char const*, unsigned long)
> >>>>             51.70%
> >>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
> >>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
> >>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, 
> >>>> rocksdb::Env*,
> >>>> bool)
> >>>>             48.30%
> >>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
> >>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
> >>>>
> >>>> Not sure what's going on yet, may need to try to enable 
> >>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
> >>>>
> >>>> Mark
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>>> in the body of a message to majordomo@vger.kernel.org More 
> >>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-31  2:08                 ` Sage Weil
  2014-07-31  8:41                   ` Shu, Xinxin
@ 2014-07-31  8:58                   ` Shu, Xinxin
  2014-07-31 12:47                     ` Mark Nelson
  2014-08-01 22:30                     ` Sage Weil
  1 sibling, 2 replies; 22+ messages in thread
From: Shu, Xinxin @ 2014-07-31  8:58 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mark Nelson, ceph-devel

Hi sage , 
 
I have created a pull request, https://github.com/ceph/rocksdb/pull/3; please help review it.
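For context, the second workaround described earlier (generating util/build_version.cc by hand instead of from Makefile.am) can be sketched in a few shell lines; the symbol names below are guesses at what build_detect_version emits, not verified against rocksdb:

```shell
# Hand-generate a minimal util/build_version.cc instead of invoking
# build_tools/build_detect_version at make time.
# The symbol names are assumptions about what rocksdb expects.
cat > build_version.cc <<'EOF'
const char* rocksdb_build_git_sha = "rocksdb_build_git_sha:unknown";
const char* rocksdb_build_compile_date = "unknown";
EOF
wc -l < build_version.cc   # sanity check: 2 lines written
```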

Cheers,
xinxin

-----Original Message-----
From: Shu, Xinxin 
Sent: Thursday, July 31, 2014 4:42 PM
To: 'Sage Weil'
Cc: Mark Nelson; ceph-devel@vger.kernel.org
Subject: RE: First attempt at rocksdb monitor store stress testing

Hi sage , 

This may be because $(shell) is a GNU make feature.  I think there are two solutions:
1)  Run the script at configure time rather than at build time.
2)  $(shell (./build_tools/build_detect_version)) generates util/build_version.cc, which only contains some version info (git version, compile time).  Since we may not care about this info, we can remove the line from Makefile.am and generate util/build_version.cc ourselves.

Cheers,
xinxin

-----Original Message-----
From: Sage Weil [mailto:sweil@redhat.com]
Sent: Thursday, July 31, 2014 10:08 AM
To: Shu, Xinxin
Cc: Mark Nelson; ceph-devel@vger.kernel.org
Subject: RE: First attempt at rocksdb monitor store stress testing

By the way, I'm getting closer to getting wip-rocksdb in a state where it can be merged, but it is failing to build due to this line:

	$(shell (./build_tools/build_detect_version))

in Makefile.am which results in

automake: warnings are treated as errors
warning: Makefile.am:59: shell (./build_tools/build_detect_version: 
non-POSIX variable name
Makefile.am:59: (probably a GNU make extension)
Makefile.am: installing './depcomp'
autoreconf: automake failed with exit status: 1

Any suggestions?  You can see these build results at

	http://ceph.com/gitbuilder.cgi
	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-basic/log.cgi?log=92212c722100065922468e4185759be0435877ff

sage


On Thu, 31 Jul 2014, Shu, Xinxin wrote:

> Does your report base on wip-rocksdb-mark branch?
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, July 29, 2014 12:56 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
> 
> Hi Xinxin,
> 
> Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:
> 
> http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf
> 
> Mark
> 
> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> > Hi mark,
> >
> > I tested this option on my setup , same issue happened , I will dig into it , if you want to get info log , there is a workaround, set this option to none:
> >
> > Rocksdb_log = ""
> >
> > Cheers,
> > xinxin
> >
> > -----Original Message-----
> > From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > Sent: Saturday, July 26, 2014 12:10 AM
> > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > Subject: Re: First attempt at rocksdb monitor store stress testing
> >
> > Hi Xinxin,
> >
> > I'm trying to enable the rocksdb log file as described in config_opts using:
> >
> > rocksdb_log = <path to log file>
> >
> > The file gets created but is empty.  Any ideas?
> >
> > Mark
> >
> > On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> >> Hi mark,
> >>
> >> I am looking forward to your results on SSDs .
> >> rocksdb generates a crc of data to be written , this cannot be switch off (but can be subsititued with xxhash),  there are two options , Option. verify_checksums_in_compaction and ReadOptions. verify_checksums,  If we disable these two options , i think cpu usage will goes down . If we use universal compaction , this is not friendly with read operation.
> >>
> >> Btw , can you list your rocksdb configuration?
> >>
> >> Cheers,
> >> xinxin
> >>
> >> -----Original Message-----
> >> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> >> Sent: Friday, July 25, 2014 7:45 AM
> >> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> >> Subject: Re: First attempt at rocksdb monitor store stress testing
> >>
> >> Earlier today I modified the rocksdb options so I could enable universal compaction.  Over all performance is lower but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
> >>
> >> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
> >>
> >> Mark
> >>
> >> On 07/24/2014 06:13 AM, Mark Nelson wrote:
> >>> Hi Xinxin,
> >>>
> >>> Thanks! I wonder as well if it might be interesting to expose the 
> >>> options related to universal compaction?  It looks like rocksdb 
> >>> provides a lot of interesting knobs you can adjust!
> >>>
> >>> Mark
> >>>
> >>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
> >>>> Hi mark,
> >>>>
> >>>> I think this maybe related to 'verify_checksums' config option 
> >>>> ,when ReadOptions is initialized, default this option is  true , 
> >>>> all data read from underlying storage will be verified against 
> >>>> corresponding checksums,  however,  this option cannot be 
> >>>> configured in wip-rocksdb branch. I will modify code to make this option configurable .
> >>>>
> >>>> Cheers,
> >>>> xinxin
> >>>>
> >>>> -----Original Message-----
> >>>> From: ceph-devel-owner@vger.kernel.org 
> >>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark 
> >>>> Nelson
> >>>> Sent: Thursday, July 24, 2014 7:14 AM
> >>>> To: ceph-devel@vger.kernel.org
> >>>> Subject: First attempt at rocksdb monitor store stress testing
> >>>>
> >>>> Hi Guys,
> >>>>
> >>>> So I've been interested lately in leveldb 99th percentile latency 
> >>>> (and the amount of write amplification we are seeing) with leveldb.
> >>>> Joao mentioned he has written a tool called mon-store-stress in 
> >>>> wip-leveldb-misc to try to provide a means to roughly guess at 
> >>>> what's happening on the mons under heavy load.  I cherry-picked 
> >>>> it over to wip-rocksdb and after a couple of hacks was able to 
> >>>> get everything built and running with some basic tests.  There 
> >>>> was little tuning done and I don't know how realistic this 
> >>>> workload is, so don't assume this means anything yet, but some initial results are here:
> >>>>
> >>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
> >>>>
> >>>> Command that was used to run the tests:
> >>>>
> >>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
> >>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
> >>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
> >>>> 5000 foo
> >>>>
> >>>> The most interesting bit right now is that rocksdb seems to be 
> >>>> hanging in the middle of the test (left it running for several 
> >>>> hours).  CPU usage on one core was quite high during the hang.
> >>>> Profiling using perf with dwarf symbols I see:
> >>>>
> >>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] 
> >>>> unsigned int 
> >>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigne
> >>>> d
> >>>> int, char const*, unsigned long)
> >>>>        - unsigned int
> >>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigne
> >>>> d
> >>>> int, char const*, unsigned long)
> >>>>             51.70%
> >>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
> >>>> rocksdb::Footer const&, rocksdb::ReadOptions const&, 
> >>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*, 
> >>>> rocksdb::Env*,
> >>>> bool)
> >>>>             48.30%
> >>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
> >>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
> >>>>
> >>>> Not sure what's going on yet, may need to try to enable 
> >>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
> >>>>
> >>>> Mark
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>>> in the body of a message to majordomo@vger.kernel.org More 
> >>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: First attempt at rocksdb monitor store stress testing
  2014-07-31  1:46               ` Shu, Xinxin
@ 2014-07-31 12:30                 ` Mark Nelson
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Nelson @ 2014-07-31 12:30 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

On 07/30/2014 08:46 PM, Shu, Xinxin wrote:
> Hi mark,
> Which way did you use to set a higher limit: the 'ulimit' command, or enlarging the rocksdb_max_open_files config option?

ulimit command, though you might be able to limit the max open files in 
rocksdb to be smaller than the default too.  Setting a higher ulimit is 
what we already do for ceph processes so I just mimicked our existing 
solution.
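Concretely, the workaround amounts to raising the soft open-file limit in the shell before launching the stress test; a sketch (the 4096 value is just an illustration, not a tuned recommendation, and the command line is abbreviated from the one earlier in the thread):

```shell
# Raise the soft open-file limit for this shell session so rocksdb's
# level0->level1 compaction does not run out of file descriptors.
ulimit -S -n 4096
ulimit -Sn   # verify the new soft limit took effect
# ./ceph-test-mon-store-stress --mon-keyvaluedb rocksdb ... foo
```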

>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Thursday, July 31, 2014 1:35 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> Yes, that did work.  I was able to observe the log and figure out the
> stall:  Too many files open in the level0->level1 compaction thread.
> Similar to the issue that we've seen the past with leveldb.  Setting a higher ulimit fixed the problem.  With leveled compaction on spinning disks I do see latency spikes but at first glance they do not appear to be as bad as with leveldb.  I will now run some longer tests.
>
> Mark
>
> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I tested this option on my setup and hit the same issue; I will dig into it.  If you want to get the info log, there is a workaround: set this option to an empty string:
>>
>> rocksdb_log = ""
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Saturday, July 26, 2014 12:10 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Hi Xinxin,
>>
>> I'm trying to enable the rocksdb log file as described in config_opts using:
>>
>> rocksdb_log = <path to log file>
>>
>> The file gets created but is empty.  Any ideas?
>>
>> Mark
>>
>> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
> >>> I am looking forward to your results on SSDs.
> >>> rocksdb generates a CRC of the data to be written; this cannot be switched off (but it can be substituted with xxhash).  There are two options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, it is not friendly to read operations.
> >>>
> >>> Btw, can you list your rocksdb configuration?
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Friday, July 25, 2014 7:45 AM
>>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>>
>>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>>
>>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>>
>>> Mark
>>>
>>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>>> Hi Xinxin,
>>>>
>>>> Thanks! I wonder as well if it might be interesting to expose the
>>>> options related to universal compaction?  It looks like rocksdb
>>>> provides a lot of interesting knobs you can adjust!
>>>>
>>>> Mark
>>>>
>>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>>> Hi mark,
>>>>>
> >>>>> I think this may be related to the 'verify_checksums' config
> >>>>> option: when ReadOptions is initialized, it defaults to true,
> >>>>> so all data read from underlying storage is verified against
> >>>>> the corresponding checksums.  However, this option cannot be
> >>>>> configured in the wip-rocksdb branch; I will modify the code to make this option configurable.
>>>>>
>>>>> Cheers,
>>>>> xinxin
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>>> To: ceph-devel@vger.kernel.org
>>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>>
>>>>> Hi Guys,
>>>>>
>>>>> So I've been interested lately in leveldb 99th percentile latency
>>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>>> Joao mentioned he has written a tool called mon-store-stress in
>>>>> wip-leveldb-misc to try to provide a means to roughly guess at
>>>>> what's happening on the mons under heavy load.  I cherry-picked it
>>>>> over to wip-rocksdb and after a couple of hacks was able to get
>>>>> everything built and running with some basic tests.  There was
>>>>> little tuning done and I don't know how realistic this workload is,
>>>>> so don't assume this means anything yet, but some initial results are here:
>>>>>
>>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>>
>>>>> Command that was used to run the tests:
>>>>>
>>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
>>>>> 5000 foo
>>>>>
>>>>> The most interesting bit right now is that rocksdb seems to be
>>>>> hanging in the middle of the test (left it running for several
>>>>> hours).  CPU usage on one core was quite high during the hang.
>>>>> Profiling using perf with dwarf symbols I see:
>>>>>
>>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.]
>>>>> unsigned int
>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>>> int, char const*, unsigned long)
>>>>>         - unsigned int
>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>>> int, char const*, unsigned long)
>>>>>              51.70%
>>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*,
>>>>> rocksdb::Env*,
>>>>> bool)
>>>>>              48.30%
>>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>>
>>>>> Not sure what's going on yet, may need to try to enable
>>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>>
>>>>> Mark
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: First attempt at rocksdb monitor store stress testing
  2014-07-31  1:59               ` Shu, Xinxin
@ 2014-07-31 12:41                 ` Mark Nelson
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Nelson @ 2014-07-31 12:41 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

Hi Xinxin,

On the first page, the first three tables are for latency, and the 4th 
is for total times.  It's all the same workload, 30% read / 70% write 
with a random distribution of object sizes, but there are 6 different tests:

leveldb on spinning disk
leveldb on ssd
rocksdb with leveled compaction on spinning disk
rocksdb with leveled compaction on ssd
rocksdb with universal compaction on spinning disk
rocksdb with universal compaction on ssd
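For reference, a hypothetical re-creation of the op mix those command-line flags drive (this is my reading of --percent-write/--write-min-size/--write-max-size; the real tool's size distribution and read targeting may differ):

```python
# Sketch of the workload mix: 70% writes with sizes drawn uniformly from
# 50K..2M, 30% reads, driven by a fixed seed so runs are repeatable.
import random

rng = random.Random(1406137270)          # --test-seed from the command line

def next_op():
    if rng.random() < 0.70:              # --percent-write 70
        return ("write", rng.randint(50 * 1024, 2 * 1024 * 1024))
    return ("read", 0)                   # --percent-read 30

ops = [next_op() for _ in range(5000)]   # --stop-at 5000
writes = sum(1 for kind, _ in ops if kind == "write")
print("write fraction:", writes / len(ops))
```

With the write sizes averaging around 1 MB, the write side dominates the byte volume even more heavily than the 70/30 op split suggests.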

I'm using the same workload Joao did when he wrote the test tool.  The 
graphs on the other pages are the latencies over time for each test 
case.  I'm not sure how representative the workload actually is, but 
even as it is it might start to give us some ideas.  It would be really 
great if we could record a real workload at the object level (rather 
than using something like blktrace to record it at the block level).

Mark

On 07/30/2014 08:59 PM, Shu, Xinxin wrote:
> Hi mark ,
>
> There are four tables in your report.  Did you run four test cases, or only a single (read/write mix) case?  If you ran a single mixed case, why is the latency in the fourth table not the same as in the other three?
>
> Should level0 and level1 be the same size in leveldb and rocksdb?
>
> To my knowledge, in rocksdb, by default level1 is ten times the size of level0, but the sstable files are the same size.
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, July 29, 2014 12:56 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:
>
> http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf
>
> Mark
>
> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I tested this option on my setup and hit the same issue; I will dig into it.  If you want to get the info log, there is a workaround: set this option to an empty string:
>>
>> rocksdb_log = ""
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Saturday, July 26, 2014 12:10 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Hi Xinxin,
>>
>> I'm trying to enable the rocksdb log file as described in config_opts using:
>>
>> rocksdb_log = <path to log file>
>>
>> The file gets created but is empty.  Any ideas?
>>
>> Mark
>>
>> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
>>> I am looking forward to your results on SSDs.
>>> rocksdb generates a CRC of the data to be written; this cannot be switched off (but it can be substituted with xxhash).  There are two options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, it is not friendly to read operations.
>>>
>>> Btw, can you list your rocksdb configuration?
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Friday, July 25, 2014 7:45 AM
>>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>>
>>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>>
>>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>>
>>> Mark
>>>
>>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>>> Hi Xinxin,
>>>>
>>>> Thanks! I wonder as well if it might be interesting to expose the
>>>> options related to universal compaction?  It looks like rocksdb
>>>> provides a lot of interesting knobs you can adjust!
>>>>
>>>> Mark
>>>>
>>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>>> Hi mark,
>>>>>
>>>>> I think this may be related to the 'verify_checksums' config
>>>>> option: when ReadOptions is initialized, it defaults to true,
>>>>> so all data read from underlying storage is verified against
>>>>> the corresponding checksums.  However, this option cannot be
>>>>> configured in the wip-rocksdb branch; I will modify the code to make this option configurable.
>>>>>
>>>>> Cheers,
>>>>> xinxin
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>>> To: ceph-devel@vger.kernel.org
>>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>>
>>>>> Hi Guys,
>>>>>
>>>>> So I've been interested lately in leveldb 99th percentile latency
>>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>>> Joao mentioned he has written a tool called mon-store-stress in
>>>>> wip-leveldb-misc to try to provide a means to roughly guess at
>>>>> what's happening on the mons under heavy load.  I cherry-picked it
>>>>> over to wip-rocksdb and after a couple of hacks was able to get
>>>>> everything built and running with some basic tests.  There was
>>>>> little tuning done and I don't know how realistic this workload is,
>>>>> so don't assume this means anything yet, but some initial results are here:
>>>>>
>>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>>
>>>>> Command that was used to run the tests:
>>>>>
>>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
>>>>> 5000 foo
>>>>>
>>>>> The most interesting bit right now is that rocksdb seems to be
>>>>> hanging in the middle of the test (left it running for several
>>>>> hours).  CPU usage on one core was quite high during the hang.
>>>>> Profiling using perf with dwarf symbols I see:
>>>>>
>>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.]
>>>>> unsigned int
>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>>> int, char const*, unsigned long)
>>>>>         - unsigned int
>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>>> int, char const*, unsigned long)
>>>>>              51.70%
>>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*,
>>>>> rocksdb::Env*,
>>>>> bool)
>>>>>              48.30%
>>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>>
>>>>> Not sure what's going on yet, may need to try to enable
>>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>>
>>>>> Mark
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: First attempt at rocksdb monitor store stress testing
  2014-07-31  8:58                   ` Shu, Xinxin
@ 2014-07-31 12:47                     ` Mark Nelson
  2014-08-01 22:30                     ` Sage Weil
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Nelson @ 2014-07-31 12:47 UTC (permalink / raw)
  To: Shu, Xinxin, Sage Weil; +Cc: ceph-devel

FWIW this was the problem I ran into and mentioned in #ceph-devel the 
other day.  The way I solved it was to add -Wno-portability to the 
configure.ac file in the rocksdb distribution.  Perhaps this is a better 
solution though...
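The automake side of that fix is a one-flag change; a sketch (the other contents of rocksdb's AM_INIT_AUTOMAKE line here are assumed, not copied from the tree):

```m4
dnl configure.ac sketch: tell automake not to treat GNU-make-isms such
dnl as $(shell ...) in Makefile.am as portability errors.
AM_INIT_AUTOMAKE([foreign -Wno-portability])
```

This silences the warning-as-error without changing when build_detect_version runs, whereas Xinxin's approach avoids the GNU-make extension altogether.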

Mark

On 07/31/2014 03:58 AM, Shu, Xinxin wrote:
> Hi sage ,
>
> I create a pull request https://github.com/ceph/rocksdb/pull/3 , please help review.
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Shu, Xinxin
> Sent: Thursday, July 31, 2014 4:42 PM
> To: 'Sage Weil'
> Cc: Mark Nelson; ceph-devel@vger.kernel.org
> Subject: RE: First attempt at rocksdb monitor store stress testing
>
> Hi sage ,
>
> This may be because $(shell) is a GNU make feature.  I think there are two solutions:
> 1)  Run the script at configure time rather than at build time.
> 2)  $(shell (./build_tools/build_detect_version)) generates util/build_version.cc; the file only contains some version info (git version, compile time).  Since we may not care about this info, we can remove the line from Makefile.am and generate util/build_version.cc myself.
>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Thursday, July 31, 2014 10:08 AM
> To: Shu, Xinxin
> Cc: Mark Nelson; ceph-devel@vger.kernel.org
> Subject: RE: First attempt at rocksdb monitor store stress testing
>
> By the way, I'm getting closer to getting wip-rocksdb in a state where it can be merged, but it is failing to build due to this line:
>
> 	$(shell (./build_tools/build_detect_version))
>
> in Makefile.am which results in
>
> automake: warnings are treated as errors
> warning: Makefile.am:59: shell (./build_tools/build_detect_version:
> non-POSIX variable name
> Makefile.am:59: (probably a GNU make extension)
> Makefile.am: installing './depcomp'
> autoreconf: automake failed with exit status: 1
>
> Any suggestions?  You can see these build results at
>
> 	http://ceph.com/gitbuilder.cgi
> 	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-basic/log.cgi?log=92212c722100065922468e4185759be0435877ff
>
> sage
>
>
> On Thu, 31 Jul 2014, Shu, Xinxin wrote:
>
>> Is your report based on the wip-rocksdb-mark branch?
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Tuesday, July 29, 2014 12:56 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Hi Xinxin,
>>
>> Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:
>>
>> http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.
>> pdf
>>
>> Mark
>>
>> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
>>> I tested this option on my setup and hit the same issue; I will dig into it.  If you want to get the info log, there is a workaround: set this option to an empty string:
>>>
>>> rocksdb_log = ""
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Saturday, July 26, 2014 12:10 AM
>>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>>
>>> Hi Xinxin,
>>>
>>> I'm trying to enable the rocksdb log file as described in config_opts using:
>>>
>>> rocksdb_log = <path to log file>
>>>
>>> The file gets created but is empty.  Any ideas?
>>>
>>> Mark
>>>
>>> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>>>> Hi mark,
>>>>
> >>>> I am looking forward to your results on SSDs.
> >>>> rocksdb generates a CRC of the data to be written; this cannot be switched off (but it can be substituted with xxhash).  There are two options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, it is not friendly to read operations.
> >>>>
> >>>> Btw, can you list your rocksdb configuration?
>>>>
>>>> Cheers,
>>>> xinxin
>>>>
>>>> -----Original Message-----
>>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>>> Sent: Friday, July 25, 2014 7:45 AM
>>>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>>>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>>>
>>>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>>>
>>>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>>>
>>>> Mark
>>>>
>>>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>>>> Hi Xinxin,
>>>>>
>>>>> Thanks! I wonder as well if it might be interesting to expose the
>>>>> options related to universal compaction?  It looks like rocksdb
>>>>> provides a lot of interesting knobs you can adjust!
>>>>>
>>>>> Mark
>>>>>
>>>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>>>> Hi mark,
>>>>>>
> >>>>>> I think this may be related to the 'verify_checksums' config
> >>>>>> option: when ReadOptions is initialized, it defaults to true,
> >>>>>> so all data read from underlying storage is verified against
> >>>>>> the corresponding checksums.  However, this option cannot be
> >>>>>> configured in the wip-rocksdb branch; I will modify the code to make this option configurable.
>>>>>>
>>>>>> Cheers,
>>>>>> xinxin
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark
>>>>>> Nelson
>>>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>>>> To: ceph-devel@vger.kernel.org
>>>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> So I've been interested lately in leveldb 99th percentile latency
>>>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>>>> Joao mentioned he has written a tool called mon-store-stress in
>>>>>> wip-leveldb-misc to try to provide a means to roughly guess at
>>>>>> what's happening on the mons under heavy load.  I cherry-picked
>>>>>> it over to wip-rocksdb and after a couple of hacks was able to
>>>>>> get everything built and running with some basic tests.  There
>>>>>> was little tuning done and I don't know how realistic this
>>>>>> workload is, so don't assume this means anything yet, but some initial results are here:
>>>>>>
>>>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>>>
>>>>>> Command that was used to run the tests:
>>>>>>
>>>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>>>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
>>>>>> 5000 foo
>>>>>>
>>>>>> The most interesting bit right now is that rocksdb seems to be
>>>>>> hanging in the middle of the test (left it running for several
>>>>>> hours).  CPU usage on one core was quite high during the hang.
>>>>>> Profiling using perf with dwarf symbols I see:
>>>>>>
>>>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.]
>>>>>> unsigned int
>>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigne
>>>>>> d
>>>>>> int, char const*, unsigned long)
>>>>>>         - unsigned int
>>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigne
>>>>>> d
>>>>>> int, char const*, unsigned long)
>>>>>>              51.70%
>>>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*,
>>>>>> rocksdb::Env*,
>>>>>> bool)
>>>>>>              48.30%
>>>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>>>
>>>>>> Not sure what's going on yet, may need to try to enable
>>>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>>>
>>>>>> Mark
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: First attempt at rocksdb monitor store stress testing
  2014-07-31  2:00               ` Shu, Xinxin
  2014-07-31  2:08                 ` Sage Weil
@ 2014-08-01 17:41                 ` Mark Nelson
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Nelson @ 2014-08-01 17:41 UTC (permalink / raw)
  To: Shu, Xinxin, ceph-devel

On 07/30/2014 09:00 PM, Shu, Xinxin wrote:
> Is your report based on the wip-rocksdb-mark branch?

Yes, though I've been tweaking Joao's test tool a bit.

I ran more tests with the higher ulimit for rocksdb, and also did 10,000 
objects instead of 5,000.  There are some interesting effects.  Leveldb 
appears to be faster for reads, but compaction causes horrible, 
predictable stalls.  rocksdb with leveled compaction appears to reduce 
or even eliminate the compaction stalls.  Surprisingly, the spinning 
disk test had no high outliers while the SSD test had a few.  Universal 
compaction seems to fare poorly with more objects: the maximum latencies 
are still bounded, but the averages are significantly worse.

http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Medium_Tests.pdf

Next I will need to examine the test parameters and run much longer 
tests.  I'll switch to GNUplot as well; OpenOffice can't handle this 
much data. :)
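Once the per-op latencies are dumped, the summary columns in those tables (average, 99th percentile, max) are cheap to recompute before plotting. A minimal sketch with made-up numbers (the nearest-rank percentile used here is an assumption, not necessarily what the test tool itself reports):

```python
# Summarize per-op latencies (seconds): mean, nearest-rank p99, and max.
def p99(samples):
    s = sorted(samples)
    idx = max(0, int(round(0.99 * len(s))) - 1)  # nearest-rank index
    return s[idx]

lat = [0.01] * 98 + [0.5, 6.0]   # 98 fast ops plus two slow outliers
mean = sum(lat) / len(lat)
# The mean hides the 6 s outlier; p99 and max expose the stall behavior.
print(mean, p99(lat), max(lat))
```

This is exactly why the compaction stalls show up in the max/p99 columns long before they move the averages.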

Mark

>
> Cheers,
> xinxin
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, July 29, 2014 12:56 AM
> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> Subject: Re: First attempt at rocksdb monitor store stress testing
>
> Hi Xinxin,
>
> Thanks, I'll give it a try.  I want to figure out what's going on in Rocksdb when the test stalls with leveled compaction.  In the mean time, here are the test results with spinning disks and SSDs:
>
> http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.pdf
>
> Mark
>
> On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
>> Hi mark,
>>
>> I tested this option on my setup and hit the same issue; I will dig into it.  If you want to get the info log, there is a workaround: set this option to an empty string:
>>
>> rocksdb_log = ""
>>
>> Cheers,
>> xinxin
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Saturday, July 26, 2014 12:10 AM
>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>
>> Hi Xinxin,
>>
>> I'm trying to enable the rocksdb log file as described in config_opts using:
>>
>> rocksdb_log = <path to log file>
>>
>> The file gets created but is empty.  Any ideas?
>>
>> Mark
>>
>> On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
>>> Hi mark,
>>>
>>> I am looking forward to your results on SSDs.
>>> rocksdb generates a CRC of the data to be written; this cannot be switched off (but it can be substituted with xxhash).  There are two options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, it is not friendly to read operations.
>>>
>>> Btw, can you list your rocksdb configuration?
>>>
>>> Cheers,
>>> xinxin
>>>
>>> -----Original Message-----
>>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>>> Sent: Friday, July 25, 2014 7:45 AM
>>> To: Shu, Xinxin; ceph-devel@vger.kernel.org
>>> Subject: Re: First attempt at rocksdb monitor store stress testing
>>>
>>> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
>>>
>>> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
>>>
>>> Mark
>>>
>>> On 07/24/2014 06:13 AM, Mark Nelson wrote:
>>>> Hi Xinxin,
>>>>
>>>> Thanks! I wonder as well if it might be interesting to expose the
>>>> options related to universal compaction?  It looks like rocksdb
>>>> provides a lot of interesting knobs you can adjust!
>>>>
>>>> Mark
>>>>
>>>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
>>>>> Hi mark,
>>>>>
>>>>> I think this may be related to the 'verify_checksums' config
>>>>> option: when ReadOptions is initialized, it defaults to true,
>>>>> so all data read from underlying storage is verified against
>>>>> the corresponding checksums.  However, this option cannot be
>>>>> configured in the wip-rocksdb branch; I will modify the code to make this option configurable.
>>>>>
>>>>> Cheers,
>>>>> xinxin
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>>>>> Sent: Thursday, July 24, 2014 7:14 AM
>>>>> To: ceph-devel@vger.kernel.org
>>>>> Subject: First attempt at rocksdb monitor store stress testing
>>>>>
>>>>> Hi Guys,
>>>>>
>>>>> So I've been interested lately in leveldb 99th percentile latency
>>>>> (and the amount of write amplification we are seeing) with leveldb.
>>>>> Joao mentioned he has written a tool called mon-store-stress in
>>>>> wip-leveldb-misc to try to provide a means to roughly guess at
>>>>> what's happening on the mons under heavy load.  I cherry-picked it
>>>>> over to wip-rocksdb and after a couple of hacks was able to get
>>>>> everything built and running with some basic tests.  There was
>>>>> little tuning done and I don't know how realistic this workload is,
>>>>> so don't assume this means anything yet, but some initial results are here:
>>>>>
>>>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
>>>>>
>>>>> Command that was used to run the tests:
>>>>>
>>>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb>
>>>>> --write-min-size 50K --write-max-size 2M --percent-write 70
>>>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
>>>>> 5000 foo
>>>>>
>>>>> The most interesting bit right now is that rocksdb seems to be
>>>>> hanging in the middle of the test (left it running for several
>>>>> hours).  CPU usage on one core was quite high during the hang.
>>>>> Profiling using perf with dwarf symbols I see:
>>>>>
>>>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.]
>>>>> unsigned int
>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>>> int, char const*, unsigned long)
>>>>>         - unsigned int
>>>>> rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned
>>>>> int, char const*, unsigned long)
>>>>>              51.70%
>>>>> rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
>>>>> rocksdb::Footer const&, rocksdb::ReadOptions const&,
>>>>> rocksdb::BlockHandle const&, rocksdb::BlockContents*,
>>>>> rocksdb::Env*,
>>>>> bool)
>>>>>              48.30%
>>>>> rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice
>>>>> const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
>>>>>
>>>>> Not sure what's going on yet, may need to try to enable
>>>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
>>>>>
>>>>> Mark
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: First attempt at rocksdb monitor store stress testing
  2014-07-31  8:58                   ` Shu, Xinxin
  2014-07-31 12:47                     ` Mark Nelson
@ 2014-08-01 22:30                     ` Sage Weil
  2014-08-05  5:19                       ` Shu, Xinxin
  1 sibling, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-01 22:30 UTC (permalink / raw)
  To: Shu, Xinxin; +Cc: Mark Nelson, ceph-devel

Hi xinxin,

It's merged!  We've hit one other snag, though: rocksdb is failing to 
build on i386.  See

	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-i386-basic/log.cgi?log=52c2182fe833e8a0206787ecd878bd010cc2e529

Do you mind taking a look?  Probably the int type in the parent class 
doesn't match.

Thanks!
sage


On Thu, 31 Jul 2014, Shu, Xinxin wrote:

> Hi sage , 
>  
> I created a pull request https://github.com/ceph/rocksdb/pull/3; please help review.
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: Shu, Xinxin 
> Sent: Thursday, July 31, 2014 4:42 PM
> To: 'Sage Weil'
> Cc: Mark Nelson; ceph-devel@vger.kernel.org
> Subject: RE: First attempt at rocksdb monitor store stress testing
> 
> Hi sage , 
> 
> This may be because $(shell) is a GNU make feature.  I think there are two solutions:
> 1)  run the script at configure time rather than at build time.
> 2)  $(shell (./build_tools/build_detect_version)) generates util/build_version.cc; the file only contains some version info (git version, compile time).  Since we may not care about this info, we can remove this line from Makefile.am and generate util/build_version.cc ourselves.
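
[Editor's note: option (1) above could be sketched roughly as follows; this is a hypothetical, untested configure.ac fragment, and the command tag name is illustrative.]

```
# configure.ac sketch: run the version-detection script once at configure
# time, so Makefile.am no longer needs the non-POSIX $(shell ...) extension
# that automake rejects under -Werror.
AC_CONFIG_COMMANDS([rocksdb-build-version],
                   [(cd rocksdb && ./build_tools/build_detect_version)])
```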
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Thursday, July 31, 2014 10:08 AM
> To: Shu, Xinxin
> Cc: Mark Nelson; ceph-devel@vger.kernel.org
> Subject: RE: First attempt at rocksdb monitor store stress testing
> 
> By the way, I'm getting closer to getting wip-rocksdb in a state where it can be merged, but it is failing to build due to this line:
> 
> 	$(shell (./build_tools/build_detect_version))
> 
> in Makefile.am which results in
> 
> automake: warnings are treated as errors
> warning: Makefile.am:59: shell (./build_tools/build_detect_version: 
> non-POSIX variable name
> Makefile.am:59: (probably a GNU make extension)
> Makefile.am: installing './depcomp'
> autoreconf: automake failed with exit status: 1
> 
> Any suggestions?  You can see these build results at
> 
> 	http://ceph.com/gitbuilder.cgi
> 	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-basic/log.cgi?log=92212c722100065922468e4185759be0435877ff
> 
> sage
> 
> 
> On Thu, 31 Jul 2014, Shu, Xinxin wrote:
> 
> > Is your report based on the wip-rocksdb-mark branch?
> > 
> > Cheers,
> > xinxin
> > 
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org 
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> > Sent: Tuesday, July 29, 2014 12:56 AM
> > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > Subject: Re: First attempt at rocksdb monitor store stress testing
> > 
> > Hi Xinxin,
> > 
> > Thanks, I'll give it a try.  I want to figure out what's going on in rocksdb when the test stalls with leveled compaction.  In the meantime, here are the test results with spinning disks and SSDs:
> > 
> > http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.
> > pdf
> > 
> > Mark
> > 
> > On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> > > Hi mark,
> > >
> > > I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to empty:
> > >
> > > rocksdb_log = ""
> > >
> > > Cheers,
> > > xinxin
> > >
> > > -----Original Message-----
> > > From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > > Sent: Saturday, July 26, 2014 12:10 AM
> > > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > > Subject: Re: First attempt at rocksdb monitor store stress testing
> > >
> > > Hi Xinxin,
> > >
> > > I'm trying to enable the rocksdb log file as described in config_opts using:
> > >
> > > rocksdb_log = <path to log file>
> > >
> > > The file gets created but is empty.  Any ideas?
> > >
> > > Mark
> > >
> > > On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> > >> Hi mark,
> > >>
> > >> I am looking forward to your results on SSDs.
> > >> rocksdb generates a CRC of the data to be written; this cannot be switched off (but it can be substituted with xxhash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
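
[Editor's note: in code, the two options mentioned above would be set roughly as below.  This is a fragment, not a standalone program; option names follow the 2014-era RocksDB C++ API (verify_checksums_in_compaction was later removed upstream).]

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options opts;
// Skip CRC verification of blocks read back during compaction.
opts.verify_checksums_in_compaction = false;

rocksdb::ReadOptions ropts;
// Skip CRC verification on normal reads.
ropts.verify_checksums = false;

// Checksums are still generated on write; substituting xxhash for
// crc32c goes through the block-based table options.
rocksdb::BlockBasedTableOptions topts;
topts.checksum = rocksdb::kxxHash;
opts.table_factory.reset(rocksdb::NewBlockBasedTableFactory(topts));
```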
> > >>
> > >> Btw , can you list your rocksdb configuration?
> > >>
> > >> Cheers,
> > >> xinxin
> > >>
> > >> -----Original Message-----
> > >> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > >> Sent: Friday, July 25, 2014 7:45 AM
> > >> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > >> Subject: Re: First attempt at rocksdb monitor store stress testing
> > >>
> > >> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
> > >>
> > >> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
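
[Editor's note: the two tuning directions discussed here map onto rocksdb options roughly as below.  A fragment against the 2014-era API; the sizes are illustrative, not tested values.]

```cpp
#include <rocksdb/options.h>

rocksdb::Options opts;

// Universal compaction, as used in this test run.
opts.compaction_style = rocksdb::kCompactionStyleUniversal;

// Leveled-compaction alternative: keep L0 and L1 roughly the same size
// to reduce stalls.  L0 holds about write_buffer_size *
// level0_file_num_compaction_trigger bytes; L1's target size is
// max_bytes_for_level_base.
opts.write_buffer_size = 64 << 20;                // 64 MB memtables
opts.level0_file_num_compaction_trigger = 4;      // ~256 MB in L0
opts.max_bytes_for_level_base = 256 << 20;        // ~256 MB L1 target
```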
> > >>
> > >> Mark
> > >>
> > >> On 07/24/2014 06:13 AM, Mark Nelson wrote:
> > >>> Hi Xinxin,
> > >>>
> > >>> Thanks! I wonder as well if it might be interesting to expose the 
> > >>> options related to universal compaction?  It looks like rocksdb 
> > >>> provides a lot of interesting knobs you can adjust!
> > >>>
> > >>> Mark
> > >>>
> > >>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
> > >>>> Hi mark,
> > >>>>
> > >>>> I think this may be related to the 'verify_checksums' config option:
> > >>>> when ReadOptions is initialized, this option defaults to true, and
> > >>>> all data read from underlying storage is verified against the
> > >>>> corresponding checksums.  However, this option cannot be configured
> > >>>> in the wip-rocksdb branch.  I will modify the code to make it configurable.
> > >>>>
> > >>>> Cheers,
> > >>>> xinxin
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: ceph-devel-owner@vger.kernel.org 
> > >>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark 
> > >>>> Nelson
> > >>>> Sent: Thursday, July 24, 2014 7:14 AM
> > >>>> To: ceph-devel@vger.kernel.org
> > >>>> Subject: First attempt at rocksdb monitor store stress testing
> > >>>>
> > >>>> Hi Guys,
> > >>>>
> > >>>> So I've been interested lately in leveldb 99th percentile latency 
> > >>>> (and the amount of write amplification we are seeing) with leveldb.
> > >>>> Joao mentioned he has written a tool called mon-store-stress in 
> > >>>> wip-leveldb-misc to try to provide a means to roughly guess at 
> > >>>> what's happening on the mons under heavy load.  I cherry-picked 
> > >>>> it over to wip-rocksdb and after a couple of hacks was able to 
> > >>>> get everything built and running with some basic tests.  There 
> > >>>> was little tuning done and I don't know how realistic this 
> > >>>> workload is, so don't assume this means anything yet, but some initial results are here:
> > >>>>
> > >>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
> > >>>>
> > >>>> Command that was used to run the tests:
> > >>>>
> > >>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
> > >>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
> > >>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
> > >>>> 5000 foo
> > >>>>
> > >>>> The most interesting bit right now is that rocksdb seems to be 
> > >>>> hanging in the middle of the test (left it running for several 
> > >>>> hours).  CPU usage on one core was quite high during the hang.
> > >>>> Profiling using perf with dwarf symbols I see:
> > >>>>
> > >>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
> > >>>>        - unsigned int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
> > >>>>             51.70% rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*, rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*, bool)
> > >>>>             48.30% rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
> > >>>>
> > >>>> Not sure what's going on yet, may need to try to enable 
> > >>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
> > >>>>
> > >>>> Mark
> > >>>>
> > >>>
> > >>
> > >
> > 
> > 
> > 
> 
> 


* RE: First attempt at rocksdb monitor store stress testing
  2014-08-01 22:30                     ` Sage Weil
@ 2014-08-05  5:19                       ` Shu, Xinxin
  0 siblings, 0 replies; 22+ messages in thread
From: Shu, Xinxin @ 2014-08-05  5:19 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mark Nelson, ceph-devel

Hi sage,

I created a pull request https://github.com/ceph/rocksdb/pull/4 to fix the issue; please help review.

Cheers,
xinxin

-----Original Message-----
From: Sage Weil [mailto:sweil@redhat.com] 
Sent: Saturday, August 02, 2014 6:30 AM
To: Shu, Xinxin
Cc: Mark Nelson; ceph-devel@vger.kernel.org
Subject: RE: First attempt at rocksdb monitor store stress testing

Hi xinxin,

It's merged!  We've hit one other snag, though: rocksdb is failing to build on i386.  See

	http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-i386-basic/log.cgi?log=52c2182fe833e8a0206787ecd878bd010cc2e529

Do you mind taking a look?  Probably the int type in the parent class doesn't match.

Thanks!
sage


On Thu, 31 Jul 2014, Shu, Xinxin wrote:

> Hi sage ,
>  
> I created a pull request https://github.com/ceph/rocksdb/pull/3; please help review.
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: Shu, Xinxin
> Sent: Thursday, July 31, 2014 4:42 PM
> To: 'Sage Weil'
> Cc: Mark Nelson; ceph-devel@vger.kernel.org
> Subject: RE: First attempt at rocksdb monitor store stress testing
> 
> Hi sage ,
> 
> This may be because $(shell) is a GNU make feature.  I think there are two solutions:
> 1)  run the script at configure time rather than at build time.
> 2)  $(shell (./build_tools/build_detect_version)) generates util/build_version.cc; the file only contains some version info (git version, compile time).  Since we may not care about this info, we can remove this line from Makefile.am and generate util/build_version.cc ourselves.
> 
> Cheers,
> xinxin
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Thursday, July 31, 2014 10:08 AM
> To: Shu, Xinxin
> Cc: Mark Nelson; ceph-devel@vger.kernel.org
> Subject: RE: First attempt at rocksdb monitor store stress testing
> 
> By the way, I'm getting closer to getting wip-rocksdb in a state where it can be merged, but it is failing to build due to this line:
> 
> 	$(shell (./build_tools/build_detect_version))
> 
> in Makefile.am which results in
> 
> automake: warnings are treated as errors
> warning: Makefile.am:59: shell (./build_tools/build_detect_version: 
> non-POSIX variable name
> Makefile.am:59: (probably a GNU make extension)
> Makefile.am: installing './depcomp'
> autoreconf: automake failed with exit status: 1
> 
> Any suggestions?  You can see these build results at
> 
> 	http://ceph.com/gitbuilder.cgi
> 	
> http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-basi
> c/log.cgi?log=92212c722100065922468e4185759be0435877ff
> 
> sage
> 
> 
> On Thu, 31 Jul 2014, Shu, Xinxin wrote:
> 
> > Is your report based on the wip-rocksdb-mark branch?
> > 
> > Cheers,
> > xinxin
> > 
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org 
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> > Sent: Tuesday, July 29, 2014 12:56 AM
> > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > Subject: Re: First attempt at rocksdb monitor store stress testing
> > 
> > Hi Xinxin,
> > 
> > Thanks, I'll give it a try.  I want to figure out what's going on in rocksdb when the test stalls with leveled compaction.  In the meantime, here are the test results with spinning disks and SSDs:
> > 
> > http://nhm.ceph.com/mon-store-stress/Monitor_Store_Stress_Short_Tests.
> > pdf
> > 
> > Mark
> > 
> > On 07/27/2014 11:45 PM, Shu, Xinxin wrote:
> > > Hi mark,
> > >
> > > I tested this option on my setup and the same issue happened; I will dig into it.  If you want to get the info log, there is a workaround: set this option to empty:
> > >
> > > rocksdb_log = ""
> > >
> > > Cheers,
> > > xinxin
> > >
> > > -----Original Message-----
> > > From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > > Sent: Saturday, July 26, 2014 12:10 AM
> > > To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > > Subject: Re: First attempt at rocksdb monitor store stress testing
> > >
> > > Hi Xinxin,
> > >
> > > I'm trying to enable the rocksdb log file as described in config_opts using:
> > >
> > > rocksdb_log = <path to log file>
> > >
> > > The file gets created but is empty.  Any ideas?
> > >
> > > Mark
> > >
> > > On 07/24/2014 08:28 PM, Shu, Xinxin wrote:
> > >> Hi mark,
> > >>
> > >> I am looking forward to your results on SSDs.
> > >> rocksdb generates a CRC of the data to be written; this cannot be switched off (but it can be substituted with xxhash).  There are two relevant options, Options::verify_checksums_in_compaction and ReadOptions::verify_checksums; if we disable both, I think CPU usage will go down.  If we use universal compaction, that is not friendly to read operations.
> > >>
> > >> Btw , can you list your rocksdb configuration?
> > >>
> > >> Cheers,
> > >> xinxin
> > >>
> > >> -----Original Message-----
> > >> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> > >> Sent: Friday, July 25, 2014 7:45 AM
> > >> To: Shu, Xinxin; ceph-devel@vger.kernel.org
> > >> Subject: Re: First attempt at rocksdb monitor store stress 
> > >> testing
> > >>
> > >> Earlier today I modified the rocksdb options so I could enable universal compaction.  Overall performance is lower, but I don't see the hang/stall in the middle of the test either.  Instead the disk is basically pegged with 100% writes.  I suspect average latency is higher than with leveldb, but the highest latency is about 5-6s, while we were seeing 30s spikes for leveldb with levelled (heh) compaction.
> > >>
> > >> I haven't done much tuning either way yet.  It may be that if we keep level 0 and level 1 roughly the same size we can reduce stalls in the levelled setups.  It will also be interesting to see what happens in these tests on SSDs.
> > >>
> > >> Mark
> > >>
> > >> On 07/24/2014 06:13 AM, Mark Nelson wrote:
> > >>> Hi Xinxin,
> > >>>
> > >>> Thanks! I wonder as well if it might be interesting to expose 
> > >>> the options related to universal compaction?  It looks like 
> > >>> rocksdb provides a lot of interesting knobs you can adjust!
> > >>>
> > >>> Mark
> > >>>
> > >>> On 07/24/2014 12:08 AM, Shu, Xinxin wrote:
> > >>>> Hi mark,
> > >>>>
> > >>>> I think this may be related to the 'verify_checksums' config option:
> > >>>> when ReadOptions is initialized, this option defaults to true, and
> > >>>> all data read from underlying storage is verified against the
> > >>>> corresponding checksums.  However, this option cannot be configured
> > >>>> in the wip-rocksdb branch.  I will modify the code to make it configurable.
> > >>>>
> > >>>> Cheers,
> > >>>> xinxin
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: ceph-devel-owner@vger.kernel.org 
> > >>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark 
> > >>>> Nelson
> > >>>> Sent: Thursday, July 24, 2014 7:14 AM
> > >>>> To: ceph-devel@vger.kernel.org
> > >>>> Subject: First attempt at rocksdb monitor store stress testing
> > >>>>
> > >>>> Hi Guys,
> > >>>>
> > >>>> So I've been interested lately in leveldb 99th percentile 
> > >>>> latency (and the amount of write amplification we are seeing) with leveldb.
> > >>>> Joao mentioned he has written a tool called mon-store-stress in 
> > >>>> wip-leveldb-misc to try to provide a means to roughly guess at 
> > >>>> what's happening on the mons under heavy load.  I cherry-picked 
> > >>>> it over to wip-rocksdb and after a couple of hacks was able to 
> > >>>> get everything built and running with some basic tests.  There 
> > >>>> was little tuning done and I don't know how realistic this 
> > >>>> workload is, so don't assume this means anything yet, but some initial results are here:
> > >>>>
> > >>>> http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf
> > >>>>
> > >>>> Command that was used to run the tests:
> > >>>>
> > >>>> ./ceph-test-mon-store-stress --mon-keyvaluedb <leveldb|rocksdb> 
> > >>>> --write-min-size 50K --write-max-size 2M --percent-write 70 
> > >>>> --percent-read 30 --keep-state --test-seed 1406137270 --stop-at
> > >>>> 5000 foo
> > >>>>
> > >>>> The most interesting bit right now is that rocksdb seems to be 
> > >>>> hanging in the middle of the test (left it running for several 
> > >>>> hours).  CPU usage on one core was quite high during the hang.
> > >>>> Profiling using perf with dwarf symbols I see:
> > >>>>
> > >>>> -  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
> > >>>>        - unsigned int rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>(unsigned int, char const*, unsigned long)
> > >>>>             51.70% rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*, rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*, bool)
> > >>>>             48.30% rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*)
> > >>>>
> > >>>> Not sure what's going on yet, may need to try to enable 
> > >>>> logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)
> > >>>>
> > >>>> Mark
> > >>>>
> > >>>
> > >>
> > >
> > 
> > 
> > 
> 
> 


end of thread, other threads:[~2014-08-05  5:19 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-23 23:14 First attempt at rocksdb monitor store stress testing Mark Nelson
2014-07-24  5:08 ` Shu, Xinxin
2014-07-24 11:13   ` Mark Nelson
2014-07-24 23:45     ` Mark Nelson
2014-07-25  1:28       ` Shu, Xinxin
2014-07-25 12:08         ` Mark Nelson
2014-07-25 16:09         ` Mark Nelson
2014-07-28  4:45           ` Shu, Xinxin
2014-07-28 16:55             ` Mark Nelson
2014-07-31  1:59               ` Shu, Xinxin
2014-07-31 12:41                 ` Mark Nelson
2014-07-31  2:00               ` Shu, Xinxin
2014-07-31  2:08                 ` Sage Weil
2014-07-31  8:41                   ` Shu, Xinxin
2014-07-31  8:58                   ` Shu, Xinxin
2014-07-31 12:47                     ` Mark Nelson
2014-08-01 22:30                     ` Sage Weil
2014-08-05  5:19                       ` Shu, Xinxin
2014-08-01 17:41                 ` Mark Nelson
2014-07-30 17:34             ` Mark Nelson
2014-07-31  1:46               ` Shu, Xinxin
2014-07-31 12:30                 ` Mark Nelson
