* RocksDB Incorrect API Usage
@ 2016-05-31 16:49 Haomai Wang
  2016-05-31 16:56 ` Mark Nelson
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Haomai Wang @ 2016-05-31 16:49 UTC (permalink / raw)
  To: Sage Weil, Mark Nelson; +Cc: ceph-devel

Hi Sage and Mark,

As mentioned in the BlueStore standup, I found that the RocksDB iterator *Seek*
won't use the bloom filter the way *Get* does.

*Get* impl: it checks the filter first
https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369

Iterator *Seek*: it does a binary search; by default we don't specify the
prefix feature (https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
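
To illustrate the difference in access pattern, here is a minimal sketch
against the public RocksDB C++ API (illustration only, not the db_bench
code):

#include <memory>
#include <string>
#include "rocksdb/db.h"

// Point lookup with Get(): can consult the per-SST bloom filters and skip
// files that cannot contain the key.
bool lookup_with_get(rocksdb::DB* db, const std::string& key) {
  std::string value;
  rocksdb::Status s = db->Get(rocksdb::ReadOptions(), key, &value);
  return s.ok();
}

// Point lookup via an iterator: Seek() positions for an ordered scan, so
// without a prefix extractor it binary-searches index/data blocks and the
// bloom filters are not used.
bool lookup_with_seek(rocksdb::DB* db, const std::string& key) {
  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions()));
  it->Seek(key);
  return it->Valid() && it->key() == rocksdb::Slice(key);
}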

So I ran a simple test:

./db_bench -num 10000000  -benchmarks fillbatch
This fills the db first with 1000w (10 million) records.

./db_bench -use_existing_db  -benchmarks readrandomfast
The readrandomfast case uses the *Get* API to retrieve data.

[root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
readrandomfast

LevelDB:    version 4.3
Date:       Wed Jun  1 00:29:16 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Writes per second: 0
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-0/dbbench]
readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
1000100 found, issued 46639 non-exist keys)

===========================
Then I modified readrandomfast to use the Iterator API[0]:

[root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
readrandomfast
LevelDB:    version 4.3
Date:       Wed Jun  1 00:33:03 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Writes per second: 0
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-0/dbbench]
readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
1000100 found, issued 46639 non-exist keys)


45.18 us/op vs 4.57 us/op!

The test is easy to repeat! Please correct me if I'm doing something
foolish that I'm not aware of.

So I propose this PR: https://github.com/ceph/ceph/pull/9411

We can still make further improvements by auditing all iterator usage
and making it better!

[0]:
--- a/db/db_bench.cc
+++ b/db/db_bench.cc
@@ -2923,14 +2923,12 @@ class Benchmark {
         int64_t key_rand = thread->rand.Next() & (pot - 1);
         GenerateKeyFromInt(key_rand, FLAGS_num, &key);
         ++read;
-        auto status = db->Get(options, key, &value);
-        if (status.ok()) {
-          ++found;
-        } else if (!status.IsNotFound()) {
-          fprintf(stderr, "Get returned an error: %s\n",
-                  status.ToString().c_str());
-          abort();
-        }
+        Iterator* iter = db->NewIterator(options);
+      iter->Seek(key);
+      if (iter->Valid() && iter->key().compare(key) == 0) {
+        found++;
+      }
+
         if (key_rand >= FLAGS_num) {
           ++nonexist;
         }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-05-31 16:49 RocksDB Incorrect API Usage Haomai Wang
@ 2016-05-31 16:56 ` Mark Nelson
  2016-05-31 17:17 ` Piotr Dałek
  2016-05-31 18:08 ` Jianjian Huo
  2 siblings, 0 replies; 14+ messages in thread
From: Mark Nelson @ 2016-05-31 16:56 UTC (permalink / raw)
  To: Haomai Wang, Sage Weil; +Cc: ceph-devel

On 05/31/2016 11:49 AM, Haomai Wang wrote:
> Hi Sage and Mark,
>
> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
> won't use bloom filter like *Get*.
>
> *Get* impl: it will look at filter firstly
> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>
> Iterator *Seek*: it will do binary search, by default we don't specify
> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>
> So I use a simple tests:
>
> ./db_bench -num 10000000  -benchmarks fillbatch
> fill the db firstly with 1000w records.
>
> ./db_bench -use_existing_db  -benchmarks readrandomfast
> readrandomfast case will use *Get* API to retrive data
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
> readrandomfast
>
> LevelDB:    version 4.3
> Date:       Wed Jun  1 00:29:16 2016
> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache:   20480 KB
> Keys:       16 bytes each
> Values:     100 bytes each (50 bytes after compression)
> Entries:    1000000
> Prefix:    0 bytes
> Keys per prefix:    0
> RawSize:    110.6 MB (estimated)
> FileSize:   62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
> 1000100 found, issued 46639 non-exist keys)
>
> ===========================
> then I modify readrandomfast to use Iterator API[0]:
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
> readrandomfast
> LevelDB:    version 4.3
> Date:       Wed Jun  1 00:33:03 2016
> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache:   20480 KB
> Keys:       16 bytes each
> Values:     100 bytes each (50 bytes after compression)
> Entries:    1000000
> Prefix:    0 bytes
> Keys per prefix:    0
> RawSize:    110.6 MB (estimated)
> FileSize:   62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
> 1000100 found, issued 46639 non-exist keys)
>
>
> 45.18 us/op vs 4.57us/op!
>
> The test can be repeated and easy to do! Plz correct if I'm doing
> foolish thing I'm not aware..

Excellent catch, Haomai!  I'm not sure I will be able to test before I
leave on holiday, but if I do I will report back.  Do you think upstream 
rocksdb can be improved to make the iterator implementation faster?

>
> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>
> We still can make further improvements by scanning all iterate usage
> to make it better!
>
> [0]:
> --- a/db/db_bench.cc
> +++ b/db/db_bench.cc
> @@ -2923,14 +2923,12 @@ class Benchmark {
>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>          ++read;
> -        auto status = db->Get(options, key, &value);
> -        if (status.ok()) {
> -          ++found;
> -        } else if (!status.IsNotFound()) {
> -          fprintf(stderr, "Get returned an error: %s\n",
> -                  status.ToString().c_str());
> -          abort();
> -        }
> +        Iterator* iter = db->NewIterator(options);
> +      iter->Seek(key);
> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> +        found++;
> +      }
> +
>          if (key_rand >= FLAGS_num) {
>            ++nonexist;
>          }
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-05-31 16:49 RocksDB Incorrect API Usage Haomai Wang
  2016-05-31 16:56 ` Mark Nelson
@ 2016-05-31 17:17 ` Piotr Dałek
  2016-05-31 17:47   ` Haomai Wang
  2016-05-31 18:08 ` Jianjian Huo
  2 siblings, 1 reply; 14+ messages in thread
From: Piotr Dałek @ 2016-05-31 17:17 UTC (permalink / raw)
  To: ceph-devel

On Wed, Jun 01, 2016 at 12:49:53AM +0800, Haomai Wang wrote:
> Hi Sage and Mark,
> 
> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
> won't use bloom filter like *Get*.
> 
> *Get* impl: it will look at filter firstly
> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
> 
> Iterator *Seek*: it will do binary search, by default we don't specify
> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
> 
> [..]
> --- a/db/db_bench.cc
> +++ b/db/db_bench.cc
> @@ -2923,14 +2923,12 @@ class Benchmark {
>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>          ++read;
> -        auto status = db->Get(options, key, &value);
> -        if (status.ok()) {
> -          ++found;
> -        } else if (!status.IsNotFound()) {
> -          fprintf(stderr, "Get returned an error: %s\n",
> -                  status.ToString().c_str());
> -          abort();
> -        }
> +        Iterator* iter = db->NewIterator(options);
> +      iter->Seek(key);
> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> +        found++;
> +      }
> +
>          if (key_rand >= FLAGS_num) {
>            ++nonexist;
>          }

Aren't you missing "delete iter" here?

-- 
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-05-31 17:17 ` Piotr Dałek
@ 2016-05-31 17:47   ` Haomai Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Haomai Wang @ 2016-05-31 17:47 UTC (permalink / raw)
  To: Piotr Dałek; +Cc: ceph-devel

On Wed, Jun 1, 2016 at 1:17 AM, Piotr Dałek <branch@predictor.org.pl> wrote:
> On Wed, Jun 01, 2016 at 12:49:53AM +0800, Haomai Wang wrote:
>> Hi Sage and Mark,
>>
>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>> won't use bloom filter like *Get*.
>>
>> *Get* impl: it will look at filter firstly
>> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>>
>> Iterator *Seek*: it will do binary search, by default we don't specify
>> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>
>> [..]
>> --- a/db/db_bench.cc
>> +++ b/db/db_bench.cc
>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>          ++read;
>> -        auto status = db->Get(options, key, &value);
>> -        if (status.ok()) {
>> -          ++found;
>> -        } else if (!status.IsNotFound()) {
>> -          fprintf(stderr, "Get returned an error: %s\n",
>> -                  status.ToString().c_str());
>> -          abort();
>> -        }
>> +        Iterator* iter = db->NewIterator(options);
>> +      iter->Seek(key);
>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>> +        found++;
>> +      }
>> +
>>          if (key_rand >= FLAGS_num) {
>>            ++nonexist;
>>          }
>
> Aren't you missing "delete iter" here?

Oh, sure. I retested with that fix; now it's 20 us/op. Much better :-). 20 vs 4.

And actually this test set uses sequential int keys and small values,
which makes the gap less pronounced. Comparing binary search against the
bloom filter, I think the latency difference is huge. We shouldn't use an
iterator unless we're sure we're reading a contiguous key set.
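
For reference, a leak-free version of the modified lookup might look like
this (a sketch reusing the db_bench names from the snippet above, not the
exact retested patch):

// Scoped iterator: freed after each lookup, avoiding the per-op iterator
// leak that inflated the earlier 45 us/op number.
std::unique_ptr<rocksdb::Iterator> iter(db->NewIterator(options));
iter->Seek(key);
if (iter->Valid() && iter->key().compare(key) == 0) {
  ++found;
}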

>
> --
> Piotr Dałek
> branch@predictor.org.pl
> http://blog.predictor.org.pl
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RocksDB Incorrect API Usage
  2016-05-31 16:49 RocksDB Incorrect API Usage Haomai Wang
  2016-05-31 16:56 ` Mark Nelson
  2016-05-31 17:17 ` Piotr Dałek
@ 2016-05-31 18:08 ` Jianjian Huo
  2016-05-31 18:12   ` Haomai Wang
  2 siblings, 1 reply; 14+ messages in thread
From: Jianjian Huo @ 2016-05-31 18:08 UTC (permalink / raw)
  To: Haomai Wang, Sage Weil, Mark Nelson; +Cc: ceph-devel

Hi Haomai,

I noticed this as well, and made the same changes to RocksDBStore in this PR last week:
https://github.com/ceph/ceph/pull/9215

One thing that is even worse: seek bypasses the row cache, so kv pairs won't get cached there.
I am working to benchmark the performance impact and will publish the results after I am done this week.
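
For context, the row cache is opt-in and holds individual key/value pairs
on the Get() path; a minimal sketch of enabling it (path and size are
illustrative):

#include "rocksdb/cache.h"
#include "rocksdb/db.h"

rocksdb::Options options;
options.create_if_missing = true;
// Get() consults and fills the row cache; iterator Seek() goes straight to
// the block/table path, which is the bypass described above.
options.row_cache = rocksdb::NewLRUCache(64 << 20);  // 64 MB, illustrative

rocksdb::DB* db = nullptr;
rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rowcache_demo", &db);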

Jianjian

On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com> wrote:
> Hi Sage and Mark,
>
> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
> won't use bloom filter like *Get*.
>
> *Get* impl: it will look at filter firstly
> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>
> Iterator *Seek*: it will do binary search, by default we don't specify
> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>
> So I use a simple tests:
>
> ./db_bench -num 10000000  -benchmarks fillbatch
> fill the db firstly with 1000w records.
>
> ./db_bench -use_existing_db  -benchmarks readrandomfast
> readrandomfast case will use *Get* API to retrive data
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
> readrandomfast
>
> LevelDB:    version 4.3
> Date:       Wed Jun  1 00:29:16 2016
> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache:   20480 KB
> Keys:       16 bytes each
> Values:     100 bytes each (50 bytes after compression)
> Entries:    1000000
> Prefix:    0 bytes
> Keys per prefix:    0
> RawSize:    110.6 MB (estimated)
> FileSize:   62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
> 1000100 found, issued 46639 non-exist keys)
>
> ===========================
> then I modify readrandomfast to use Iterator API[0]:
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
> readrandomfast
> LevelDB:    version 4.3
> Date:       Wed Jun  1 00:33:03 2016
> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache:   20480 KB
> Keys:       16 bytes each
> Values:     100 bytes each (50 bytes after compression)
> Entries:    1000000
> Prefix:    0 bytes
> Keys per prefix:    0
> RawSize:    110.6 MB (estimated)
> FileSize:   62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
> 1000100 found, issued 46639 non-exist keys)
>
>
> 45.18 us/op vs 4.57us/op!
>
> The test can be repeated and easy to do! Plz correct if I'm doing
> foolish thing I'm not aware..
>
> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>
> We still can make further improvements by scanning all iterate usage
> to make it better!
>
> [0]:
> --- a/db/db_bench.cc
> +++ b/db/db_bench.cc
> @@ -2923,14 +2923,12 @@ class Benchmark {
>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>          ++read;
> -        auto status = db->Get(options, key, &value);
> -        if (status.ok()) {
> -          ++found;
> -        } else if (!status.IsNotFound()) {
> -          fprintf(stderr, "Get returned an error: %s\n",
> -                  status.ToString().c_str());
> -          abort();
> -        }
> +        Iterator* iter = db->NewIterator(options);
> +      iter->Seek(key);
> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> +        found++;
> +      }
> +
>          if (key_rand >= FLAGS_num) {
>            ++nonexist;
>          }
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-05-31 18:08 ` Jianjian Huo
@ 2016-05-31 18:12   ` Haomai Wang
  2016-05-31 19:40     ` Jianjian Huo
  0 siblings, 1 reply; 14+ messages in thread
From: Haomai Wang @ 2016-05-31 18:12 UTC (permalink / raw)
  To: Jianjian Huo; +Cc: Sage Weil, Mark Nelson, ceph-devel

On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
> Hi Haomai,
>
> I noticed this as well, and made same changes to RocksDBStore in this PR last week:
> https://github.com/ceph/ceph/pull/9215
>
> One thing which is even worse,  seek will bypass row cache, so kv pairs won't be able to be cached in row cache.
> I am working to benchmark the performance impact, will publish the results after I am done this week.

Oh, cool! I think you can cherry-pick my other leveldb fix.

BTW, have you paid attention to the prefix seek API? I think it will be
more suitable than column families in the Ceph case. If we have a
well-defined prefix rule, we can make most range queries cheaper!
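
A minimal sketch of the prefix seek setup being suggested (the 8-byte
fixed prefix and the key names are illustrative, not Ceph's actual key
scheme):

#include <memory>
#include "rocksdb/db.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/table.h"

rocksdb::Options options;
options.create_if_missing = true;
// Treat the first 8 bytes of every key as its prefix.
options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));

// With a prefix extractor set, the bloom filter also indexes prefixes, so
// Seek() within a prefix can skip non-matching data.
rocksdb::BlockBasedTableOptions table_opts;
table_opts.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false));
options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));

rocksdb::DB* db = nullptr;
rocksdb::DB::Open(options, "/tmp/prefix_demo", &db);

// Range query confined to one prefix: the iterator stays valid only while
// keys share the Seek target's prefix.
rocksdb::ReadOptions ropts;
ropts.prefix_same_as_start = true;
std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ropts));
for (it->Seek("bucket01"); it->Valid(); it->Next()) {
  // every key seen here starts with "bucket01"
}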

>
> Jianjian
>
> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com> wrote:
>> Hi Sage and Mark,
>>
>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>> won't use bloom filter like *Get*.
>>
>> *Get* impl: it will look at filter firstly
>> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>>
>> Iterator *Seek*: it will do binary search, by default we don't specify
>> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>
>> So I use a simple tests:
>>
>> ./db_bench -num 10000000  -benchmarks fillbatch
>> fill the db firstly with 1000w records.
>>
>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>> readrandomfast case will use *Get* API to retrive data
>>
>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>> readrandomfast
>>
>> LevelDB:    version 4.3
>> Date:       Wed Jun  1 00:29:16 2016
>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>> CPUCache:   20480 KB
>> Keys:       16 bytes each
>> Values:     100 bytes each (50 bytes after compression)
>> Entries:    1000000
>> Prefix:    0 bytes
>> Keys per prefix:    0
>> RawSize:    110.6 MB (estimated)
>> FileSize:   62.9 MB (estimated)
>> Writes per second: 0
>> Compression: Snappy
>> Memtablerep: skip_list
>> Perf Level: 0
>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>> ------------------------------------------------
>> DB path: [/tmp/rocksdbtest-0/dbbench]
>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>> 1000100 found, issued 46639 non-exist keys)
>>
>> ===========================
>> then I modify readrandomfast to use Iterator API[0]:
>>
>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>> readrandomfast
>> LevelDB:    version 4.3
>> Date:       Wed Jun  1 00:33:03 2016
>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>> CPUCache:   20480 KB
>> Keys:       16 bytes each
>> Values:     100 bytes each (50 bytes after compression)
>> Entries:    1000000
>> Prefix:    0 bytes
>> Keys per prefix:    0
>> RawSize:    110.6 MB (estimated)
>> FileSize:   62.9 MB (estimated)
>> Writes per second: 0
>> Compression: Snappy
>> Memtablerep: skip_list
>> Perf Level: 0
>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>> ------------------------------------------------
>> DB path: [/tmp/rocksdbtest-0/dbbench]
>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>> 1000100 found, issued 46639 non-exist keys)
>>
>>
>> 45.18 us/op vs 4.57us/op!
>>
>> The test can be repeated and easy to do! Plz correct if I'm doing
>> foolish thing I'm not aware..
>>
>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>
>> We still can make further improvements by scanning all iterate usage
>> to make it better!
>>
>> [0]:
>> --- a/db/db_bench.cc
>> +++ b/db/db_bench.cc
>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>          ++read;
>> -        auto status = db->Get(options, key, &value);
>> -        if (status.ok()) {
>> -          ++found;
>> -        } else if (!status.IsNotFound()) {
>> -          fprintf(stderr, "Get returned an error: %s\n",
>> -                  status.ToString().c_str());
>> -          abort();
>> -        }
>> +        Iterator* iter = db->NewIterator(options);
>> +      iter->Seek(key);
>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>> +        found++;
>> +      }
>> +
>>          if (key_rand >= FLAGS_num) {
>>            ++nonexist;
>>          }
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RocksDB Incorrect API Usage
  2016-05-31 18:12   ` Haomai Wang
@ 2016-05-31 19:40     ` Jianjian Huo
  2016-06-01  2:38       ` Haomai Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jianjian Huo @ 2016-05-31 19:40 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Sage Weil, Mark Nelson, ceph-devel


On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com> wrote:
> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>> Hi Haomai,
>>
>> I noticed this as well, and made same changes to RocksDBStore in this PR last week:
>> https://github.com/ceph/ceph/pull/9215
>>
>> One thing which is even worse,  seek will bypass row cache, so kv pairs won't be able to be cached in row cache.
>> I am working to benchmark the performance impact, will publish the results after I am done this week.
>
> Oh, cool! I think you can cherry-pick my another leveldb fix.

No problem, I will do that. 
>
> BTW, do you pay an attention to prefix seek api? I think it will be
> more suitable than column family in ceph case. If we can have
> well-defined prefix rule, we can make most of range query cheaper!

When keys with the same prefix are stored in their own SST files (the CF case), won't even seeking without a prefix be faster than seeking with a prefix through keys mixed with other prefixes?
I am not sure what optimization prefix seek uses internally for the block-based format, but to me it's hard to beat the case where one prefix's keys are stored separately.
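
For comparison, the column family approach keeps each class of keys in its
own files; a minimal sketch of opening with CFs (the CF names are
illustrative):

#include <string>
#include <vector>
#include "rocksdb/db.h"

rocksdb::Options options;
options.create_if_missing = true;
options.create_missing_column_families = true;

// Each column family gets its own memtables and SST files, so its keys
// never mix with other families' keys on disk.
std::vector<rocksdb::ColumnFamilyDescriptor> cfs = {
    {rocksdb::kDefaultColumnFamilyName, rocksdb::ColumnFamilyOptions()},
    {"onode", rocksdb::ColumnFamilyOptions()},
    {"omap", rocksdb::ColumnFamilyOptions()},
};

std::vector<rocksdb::ColumnFamilyHandle*> handles;
rocksdb::DB* db = nullptr;
rocksdb::Status s =
    rocksdb::DB::Open(options, "/tmp/cf_demo", cfs, &handles, &db);
// Reads/writes then target a handle, e.g.
//   db->Get(rocksdb::ReadOptions(), handles[1], key, &value);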
>
>>
>> Jianjian
>>
>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com> wrote:
>>> Hi Sage and Mark,
>>>
>>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>>> won't use bloom filter like *Get*.
>>>
>>> *Get* impl: it will look at filter firstly
>>> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>>>
>>> Iterator *Seek*: it will do binary search, by default we don't specify
>>> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
>>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>>
>>> So I use a simple tests:
>>>
>>> ./db_bench -num 10000000  -benchmarks fillbatch
>>> fill the db firstly with 1000w records.
>>>
>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>>> readrandomfast case will use *Get* API to retrive data
>>>
>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>>> readrandomfast
>>>
>>> LevelDB:    version 4.3
>>> Date:       Wed Jun  1 00:29:16 2016
>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>> CPUCache:   20480 KB
>>> Keys:       16 bytes each
>>> Values:     100 bytes each (50 bytes after compression)
>>> Entries:    1000000
>>> Prefix:    0 bytes
>>> Keys per prefix:    0
>>> RawSize:    110.6 MB (estimated)
>>> FileSize:   62.9 MB (estimated)
>>> Writes per second: 0
>>> Compression: Snappy
>>> Memtablerep: skip_list
>>> Perf Level: 0
>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>> ------------------------------------------------
>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>>> 1000100 found, issued 46639 non-exist keys)
>>>
>>> ===========================
>>> then I modify readrandomfast to use Iterator API[0]:
>>>
>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>>> readrandomfast
>>> LevelDB:    version 4.3
>>> Date:       Wed Jun  1 00:33:03 2016
>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>> CPUCache:   20480 KB
>>> Keys:       16 bytes each
>>> Values:     100 bytes each (50 bytes after compression)
>>> Entries:    1000000
>>> Prefix:    0 bytes
>>> Keys per prefix:    0
>>> RawSize:    110.6 MB (estimated)
>>> FileSize:   62.9 MB (estimated)
>>> Writes per second: 0
>>> Compression: Snappy
>>> Memtablerep: skip_list
>>> Perf Level: 0
>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>> ------------------------------------------------
>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>>> 1000100 found, issued 46639 non-exist keys)
>>>
>>>
>>> 45.18 us/op vs 4.57us/op!
>>>
>>> The test can be repeated and easy to do! Plz correct if I'm doing
>>> foolish thing I'm not aware..
>>>
>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>>
>>> We still can make further improvements by scanning all iterate usage
>>> to make it better!
>>>
>>> [0]:
>>> --- a/db/db_bench.cc
>>> +++ b/db/db_bench.cc
>>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>>          ++read;
>>> -        auto status = db->Get(options, key, &value);
>>> -        if (status.ok()) {
>>> -          ++found;
>>> -        } else if (!status.IsNotFound()) {
>>> -          fprintf(stderr, "Get returned an error: %s\n",
>>> -                  status.ToString().c_str());
>>> -          abort();
>>> -        }
>>> +        Iterator* iter = db->NewIterator(options);
>>> +      iter->Seek(key);
>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>>> +        found++;
>>> +      }
>>> +
>>>          if (key_rand >= FLAGS_num) {
>>>            ++nonexist;
>>>          }
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-05-31 19:40     ` Jianjian Huo
@ 2016-06-01  2:38       ` Haomai Wang
  2016-06-01  2:52         ` Allen Samuels
  2016-06-01  3:13         ` Jianjian Huo
  0 siblings, 2 replies; 14+ messages in thread
From: Haomai Wang @ 2016-06-01  2:38 UTC (permalink / raw)
  To: Jianjian Huo; +Cc: Sage Weil, Mark Nelson, ceph-devel

On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>
> On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com> wrote:
>> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>> Hi Haomai,
>>>
>>> I noticed this as well, and made same changes to RocksDBStore in this PR last week:
>>> https://github.com/ceph/ceph/pull/9215
>>>
>>> One thing which is even worse,  seek will bypass row cache, so kv pairs won't be able to be cached in row cache.
>>> I am working to benchmark the performance impact, will publish the results after I am done this week.
>>
>> Oh, cool! I think you can cherry-pick my another leveldb fix.
>
> No problem, I will do that.
>>
>> BTW, do you pay an attention to prefix seek api? I think it will be
>> more suitable than column family in ceph case. If we can have
>> well-defined prefix rule, we can make most of range query cheaper!
>
> When keys with same prefix are stored in their own SST files(CF case), even seeking without prefix will be faster than seeking with prefix but mixed with different prefixed keys?

From my current view, a prefix could be more flexible. For example, each
rgw bucket index could use its own prefix so that seeks on each bucket
index object stay separated. Using a CF for that would be too heavy.

> I am not sure what optimization prefix seek will use internally for block based format, but to me, it's hard to beat the case when you only have one prefixed keys stored separately.
>>
>>>
>>> Jianjian
>>>
>>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com> wrote:
>>>> Hi Sage and Mark,
>>>>
>>>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>>>> won't use bloom filter like *Get*.
>>>>
>>>> *Get* impl: it will look at filter firstly
>>>> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>>>>
>>>> Iterator *Seek*: it will do binary search, by default we don't specify
>>>> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
>>>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>>>
>>>> So I use a simple tests:
>>>>
>>>> ./db_bench -num 10000000  -benchmarks fillbatch
>>>> fill the db firstly with 1000w records.
>>>>
>>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>>>> readrandomfast case will use *Get* API to retrive data
>>>>
>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>>>> readrandomfast
>>>>
>>>> LevelDB:    version 4.3
>>>> Date:       Wed Jun  1 00:29:16 2016
>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>> CPUCache:   20480 KB
>>>> Keys:       16 bytes each
>>>> Values:     100 bytes each (50 bytes after compression)
>>>> Entries:    1000000
>>>> Prefix:    0 bytes
>>>> Keys per prefix:    0
>>>> RawSize:    110.6 MB (estimated)
>>>> FileSize:   62.9 MB (estimated)
>>>> Writes per second: 0
>>>> Compression: Snappy
>>>> Memtablerep: skip_list
>>>> Perf Level: 0
>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>> ------------------------------------------------
>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>>>> 1000100 found, issued 46639 non-exist keys)
>>>>
>>>> ===========================
>>>> then I modify readrandomfast to use Iterator API[0]:
>>>>
>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>>>> readrandomfast
>>>> LevelDB:    version 4.3
>>>> Date:       Wed Jun  1 00:33:03 2016
>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>> CPUCache:   20480 KB
>>>> Keys:       16 bytes each
>>>> Values:     100 bytes each (50 bytes after compression)
>>>> Entries:    1000000
>>>> Prefix:    0 bytes
>>>> Keys per prefix:    0
>>>> RawSize:    110.6 MB (estimated)
>>>> FileSize:   62.9 MB (estimated)
>>>> Writes per second: 0
>>>> Compression: Snappy
>>>> Memtablerep: skip_list
>>>> Perf Level: 0
>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>> ------------------------------------------------
>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>>>> 1000100 found, issued 46639 non-exist keys)
>>>>
>>>>
>>>> 45.18 us/op vs 4.57us/op!
>>>>
>>>> The test can be repeated and easy to do! Plz correct if I'm doing
>>>> foolish thing I'm not aware..
>>>>
>>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>>>
>>>> We still can make further improvements by scanning all iterate usage
>>>> to make it better!
>>>>
>>>> [0]:
>>>> --- a/db/db_bench.cc
>>>> +++ b/db/db_bench.cc
>>>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>>>          ++read;
>>>> -        auto status = db->Get(options, key, &value);
>>>> -        if (status.ok()) {
>>>> -          ++found;
>>>> -        } else if (!status.IsNotFound()) {
>>>> -          fprintf(stderr, "Get returned an error: %s\n",
>>>> -                  status.ToString().c_str());
>>>> -          abort();
>>>> -        }
>>>> +        Iterator* iter = db->NewIterator(options);
>>>> +      iter->Seek(key);
>>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>>>> +        found++;
>>>> +      }
>>>> +
>>>>          if (key_rand >= FLAGS_num) {
>>>>            ++nonexist;
>>>>          }
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RocksDB Incorrect API Usage
  2016-06-01  2:38       ` Haomai Wang
@ 2016-06-01  2:52         ` Allen Samuels
  2016-06-01  3:24           ` Jianjian Huo
  2016-06-01  3:13         ` Jianjian Huo
  1 sibling, 1 reply; 14+ messages in thread
From: Allen Samuels @ 2016-06-01  2:52 UTC (permalink / raw)
  To: Haomai Wang, Jianjian Huo; +Cc: Sage Weil, Mark Nelson, ceph-devel

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Haomai Wang
> Sent: Tuesday, May 31, 2016 7:38 PM
> To: Jianjian Huo <jianjian.huo@samsung.com>
> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson <mnelson@redhat.com>;
> ceph-devel@vger.kernel.org
> Subject: Re: RocksDB Incorrect API Usage
> 
> On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo <jianjian.huo@samsung.com>
> wrote:
> >
> > On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com>
> wrote:
> >> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo
> <jianjian.huo@samsung.com> wrote:
> >>> Hi Haomai,
> >>>
> >>> I noticed this as well, and made same changes to RocksDBStore in this PR
> last week:
> >>> https://github.com/ceph/ceph/pull/9215
> >>>
> >>> One thing which is even worse,  seek will bypass row cache, so kv pairs
> won't be able to be cached in row cache.
> >>> I am working to benchmark the performance impact, will publish the
> results after I am done this week.
> >>
> >> Oh, cool! I think you can cherry-pick my another leveldb fix.
> >
> > No problem, I will do that.
> >>
> >> BTW, do you pay an attention to prefix seek api? I think it will be
> >> more suitable than column family in ceph case. If we can have
> >> well-defined prefix rule, we can make most of range query cheaper!
> >
> > When keys with same prefix are stored in their own SST files(CF case),
> even seeking without prefix will be faster than seeking with prefix but mixed
> with different prefixed keys?
> 
> From my current view, prefix could be more flexible. For example, each rgw
> bucket index could use one prefix to make each bucket index object seek
> separated. For CF, it would be too heavry.

This might cause other problems. Would it dramatically increase the number of files that BlueFS needs to manage? If so, that might effectively break that code too (of course it's fixable also :))
> 
> > I am not sure what optimization prefix seek will use internally for block
> based format, but to me, it's hard to beat the case when you only have one
> prefixed keys stored separately.
> >>
> >>>
> >>> Jianjian
> >>>
> >>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com>
> wrote:
> >>>> Hi Sage and Mark,
> >>>>
> >>>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
> >>>> won't use bloom filter like *Get*.
> >>>>
> >>>> *Get* impl: it will look at filter firstly
> >>>>
> https://github.com/facebook/rocksdb/blob/master/table/block_based_t
> >>>> able_reader.cc#L1369
> >>>>
> >>>> Iterator *Seek*: it will do binary search, by default we don't
> >>>> specify prefix
> feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-
> Changes).
> >>>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
> >>>>
> >>>> So I use a simple tests:
> >>>>
> >>>> ./db_bench -num 10000000  -benchmarks fillbatch fill the db firstly
> >>>> with 1000w records.
> >>>>
> >>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
> >>>> readrandomfast case will use *Get* API to retrive data
> >>>>
> >>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
> >>>> -benchmarks readrandomfast
> >>>>
> >>>> LevelDB:    version 4.3
> >>>> Date:       Wed Jun  1 00:29:16 2016
> >>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> >>>> CPUCache:   20480 KB
> >>>> Keys:       16 bytes each
> >>>> Values:     100 bytes each (50 bytes after compression)
> >>>> Entries:    1000000
> >>>> Prefix:    0 bytes
> >>>> Keys per prefix:    0
> >>>> RawSize:    110.6 MB (estimated)
> >>>> FileSize:   62.9 MB (estimated)
> >>>> Writes per second: 0
> >>>> Compression: Snappy
> >>>> Memtablerep: skip_list
> >>>> Perf Level: 0
> >>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> >>>> ------------------------------------------------
> >>>> DB path: [/tmp/rocksdbtest-0/dbbench]
> >>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
> >>>> 1000100 found, issued 46639 non-exist keys)
> >>>>
> >>>> ===========================
> >>>> then I modify readrandomfast to use Iterator API[0]:
> >>>>
> >>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
> >>>> -benchmarks readrandomfast
> >>>> LevelDB:    version 4.3
> >>>> Date:       Wed Jun  1 00:33:03 2016
> >>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> >>>> CPUCache:   20480 KB
> >>>> Keys:       16 bytes each
> >>>> Values:     100 bytes each (50 bytes after compression)
> >>>> Entries:    1000000
> >>>> Prefix:    0 bytes
> >>>> Keys per prefix:    0
> >>>> RawSize:    110.6 MB (estimated)
> >>>> FileSize:   62.9 MB (estimated)
> >>>> Writes per second: 0
> >>>> Compression: Snappy
> >>>> Memtablerep: skip_list
> >>>> Perf Level: 0
> >>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> >>>> ------------------------------------------------
> >>>> DB path: [/tmp/rocksdbtest-0/dbbench]
> >>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
> >>>> 1000100 found, issued 46639 non-exist keys)
> >>>>
> >>>>
> >>>> 45.18 us/op vs 4.57us/op!
> >>>>
> >>>> The test can be repeated and easy to do! Plz correct if I'm doing
> >>>> foolish thing I'm not aware..
> >>>>
> >>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
> >>>>
> >>>> We still can make further improvements by scanning all iterate
> >>>> usage to make it better!
> >>>>
> >>>> [0]:
> >>>> --- a/db/db_bench.cc
> >>>> +++ b/db/db_bench.cc
> >>>> @@ -2923,14 +2923,12 @@ class Benchmark {
> >>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
> >>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
> >>>>          ++read;
> >>>> -        auto status = db->Get(options, key, &value);
> >>>> -        if (status.ok()) {
> >>>> -          ++found;
> >>>> -        } else if (!status.IsNotFound()) {
> >>>> -          fprintf(stderr, "Get returned an error: %s\n",
> >>>> -                  status.ToString().c_str());
> >>>> -          abort();
> >>>> -        }
> >>>> +        Iterator* iter = db->NewIterator(options);
> >>>> +      iter->Seek(key);
> >>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> >>>> +        found++;
> >>>> +      }
> >>>> +
> >>>>          if (key_rand >= FLAGS_num) {
> >>>>            ++nonexist;
> >>>>          }
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe
> >>>> ceph-devel" in the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> in the body of a message to majordomo@vger.kernel.org More
> majordomo
> >> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RocksDB Incorrect API Usage
  2016-06-01  2:38       ` Haomai Wang
  2016-06-01  2:52         ` Allen Samuels
@ 2016-06-01  3:13         ` Jianjian Huo
  1 sibling, 0 replies; 14+ messages in thread
From: Jianjian Huo @ 2016-06-01  3:13 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Sage Weil, Mark Nelson, ceph-devel


On Tue, May 31, 2016 at 7:38 PM, Haomai Wang <haomai@xsky.com> wrote:
> On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>
>> On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com> wrote:
>>> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>>> Hi Haomai,
>>>>
>>>> I noticed this as well, and made same changes to RocksDBStore in this PR last week:
>>>> https://github.com/ceph/ceph/pull/9215
>>>>
>>>> One thing which is even worse,  seek will bypass row cache, so kv pairs won't be able to be cached in row cache.
>>>> I am working to benchmark the performance impact, will publish the results after I am done this week.
>>>
>>> Oh, cool! I think you can cherry-pick my another leveldb fix.
>>
>> No problem, I will do that.
>>>
>>> BTW, do you pay an attention to prefix seek api? I think it will be
>>> more suitable than column family in ceph case. If we can have
>>> well-defined prefix rule, we can make most of range query cheaper!
>>
>> When keys with same prefix are stored in their own SST files(CF case), even seeking without prefix will be faster than seeking with prefix but mixed with different prefixed keys?
>
> From my current view, prefix could be more flexible. For example, each
> rgw bucket index could use one prefix to make each bucket index object
> seek separated. For CF, it would be too heavry.

Sure, no doubt prefix seek will help some cases and CF will help others. Prefix will mostly benefit seek, while CF has a broader impact.
>
>> I am not sure what optimization prefix seek will use internally for block based format, but to me, it's hard to beat the case when you only have one prefixed keys stored separately.
>>>
>>>>
>>>> Jianjian
>>>>
>>>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com> wrote:
>>>>> Hi Sage and Mark,
>>>>>
>>>>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>>>>> won't use bloom filter like *Get*.
>>>>>
>>>>> *Get* impl: it will look at filter firstly
>>>>> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>>>>>
>>>>> Iterator *Seek*: it will do binary search, by default we don't specify
>>>>> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
>>>>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>>>>
>>>>> So I use a simple tests:
>>>>>
>>>>> ./db_bench -num 10000000  -benchmarks fillbatch
>>>>> fill the db firstly with 1000w records.
>>>>>
>>>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>>>>> readrandomfast case will use *Get* API to retrive data
>>>>>
>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>>>>> readrandomfast
>>>>>
>>>>> LevelDB:    version 4.3
>>>>> Date:       Wed Jun  1 00:29:16 2016
>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>>> CPUCache:   20480 KB
>>>>> Keys:       16 bytes each
>>>>> Values:     100 bytes each (50 bytes after compression)
>>>>> Entries:    1000000
>>>>> Prefix:    0 bytes
>>>>> Keys per prefix:    0
>>>>> RawSize:    110.6 MB (estimated)
>>>>> FileSize:   62.9 MB (estimated)
>>>>> Writes per second: 0
>>>>> Compression: Snappy
>>>>> Memtablerep: skip_list
>>>>> Perf Level: 0
>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>>> ------------------------------------------------
>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>>>>> 1000100 found, issued 46639 non-exist keys)
>>>>>
>>>>> ===========================
>>>>> then I modify readrandomfast to use Iterator API[0]:
>>>>>
>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>>>>> readrandomfast
>>>>> LevelDB:    version 4.3
>>>>> Date:       Wed Jun  1 00:33:03 2016
>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>>> CPUCache:   20480 KB
>>>>> Keys:       16 bytes each
>>>>> Values:     100 bytes each (50 bytes after compression)
>>>>> Entries:    1000000
>>>>> Prefix:    0 bytes
>>>>> Keys per prefix:    0
>>>>> RawSize:    110.6 MB (estimated)
>>>>> FileSize:   62.9 MB (estimated)
>>>>> Writes per second: 0
>>>>> Compression: Snappy
>>>>> Memtablerep: skip_list
>>>>> Perf Level: 0
>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>>> ------------------------------------------------
>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>>>>> 1000100 found, issued 46639 non-exist keys)
>>>>>
>>>>>
>>>>> 45.18 us/op vs 4.57us/op!
>>>>>
>>>>> The test can be repeated and easy to do! Plz correct if I'm doing
>>>>> foolish thing I'm not aware..
>>>>>
>>>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>>>>
>>>>> We still can make further improvements by scanning all iterate usage
>>>>> to make it better!
>>>>>
>>>>> [0]:
>>>>> --- a/db/db_bench.cc
>>>>> +++ b/db/db_bench.cc
>>>>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>>>>          ++read;
>>>>> -        auto status = db->Get(options, key, &value);
>>>>> -        if (status.ok()) {
>>>>> -          ++found;
>>>>> -        } else if (!status.IsNotFound()) {
>>>>> -          fprintf(stderr, "Get returned an error: %s\n",
>>>>> -                  status.ToString().c_str());
>>>>> -          abort();
>>>>> -        }
>>>>> +        Iterator* iter = db->NewIterator(options);
>>>>> +      iter->Seek(key);
>>>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>>>>> +        found++;
>>>>> +      }
>>>>> +
>>>>>          if (key_rand >= FLAGS_num) {
>>>>>            ++nonexist;
>>>>>          }
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RocksDB Incorrect API Usage
  2016-06-01  2:52         ` Allen Samuels
@ 2016-06-01  3:24           ` Jianjian Huo
  2016-06-01  3:37             ` Varada Kari
  0 siblings, 1 reply; 14+ messages in thread
From: Jianjian Huo @ 2016-06-01  3:24 UTC (permalink / raw)
  To: Allen Samuels, Haomai Wang; +Cc: Sage Weil, Mark Nelson, ceph-devel

Hi Allen,

On Tue, May 31, 2016 at 7:52 PM, Allen Samuels <Allen.Samuels@sandisk.com> wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of Haomai Wang
>> Sent: Tuesday, May 31, 2016 7:38 PM
>> To: Jianjian Huo <jianjian.huo@samsung.com>
>> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson <mnelson@redhat.com>;
>> ceph-devel@vger.kernel.org
>> Subject: Re: RocksDB Incorrect API Usage
>>
>> On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo <jianjian.huo@samsung.com>
>> wrote:
>> >
>> > On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com>
>> wrote:
>> >> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo
>> <jianjian.huo@samsung.com> wrote:
>> >>> Hi Haomai,
>> >>>
>> >>> I noticed this as well, and made same changes to RocksDBStore in this PR
>> last week:
>> >>> https://github.com/ceph/ceph/pull/9215
>> >>>
>> >>> One thing which is even worse,  seek will bypass row cache, so kv pairs
>> won't be able to be cached in row cache.
>> >>> I am working to benchmark the performance impact, will publish the
>> results after I am done this week.
>> >>
>> >> Oh, cool! I think you can cherry-pick my another leveldb fix.
>> >
>> > No problem, I will do that.
>> >>
>> >> BTW, do you pay an attention to prefix seek api? I think it will be
>> >> more suitable than column family in ceph case. If we can have
>> >> well-defined prefix rule, we can make most of range query cheaper!
>> >
>> > When keys with same prefix are stored in their own SST files(CF case),
>> even seeking without prefix will be faster than seeking with prefix but mixed
>> with different prefixed keys?
>>
>> From my current view, prefix could be more flexible. For example, each rgw
>> bucket index could use one prefix to make each bucket index object seek
>> separated. For CF, it would be too heavry.
>
> This might cause other problems. Would it dramatically increase the number of files that BlueFS needs to manage? If so, that might effectively break that code too (of course it's fixable also :))

Thanks for bringing up this issue. What's the current limit on the number of files for BlueFS, in your experience?
Jianjian
>>
>> > I am not sure what optimization prefix seek will use internally for block
>> based format, but to me, it's hard to beat the case when you only have one
>> prefixed keys stored separately.
>> >>
>> >>>
>> >>> Jianjian
>> >>>
>> >>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com>
>> wrote:
>> >>>> Hi Sage and Mark,
>> >>>>
>> >>>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>> >>>> won't use bloom filter like *Get*.
>> >>>>
>> >>>> *Get* impl: it will look at filter firstly
>> >>>>
>> https://github.com/facebook/rocksdb/blob/master/table/block_based_t
>> >>>> able_reader.cc#L1369
>> >>>>
>> >>>> Iterator *Seek*: it will do binary search, by default we don't
>> >>>> specify prefix
>> feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-
>> Changes).
>> >>>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>> >>>>
>> >>>> So I use a simple tests:
>> >>>>
>> >>>> ./db_bench -num 10000000  -benchmarks fillbatch fill the db firstly
>> >>>> with 1000w records.
>> >>>>
>> >>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>> >>>> readrandomfast case will use *Get* API to retrive data
>> >>>>
>> >>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
>> >>>> -benchmarks readrandomfast
>> >>>>
>> >>>> LevelDB:    version 4.3
>> >>>> Date:       Wed Jun  1 00:29:16 2016
>> >>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>> >>>> CPUCache:   20480 KB
>> >>>> Keys:       16 bytes each
>> >>>> Values:     100 bytes each (50 bytes after compression)
>> >>>> Entries:    1000000
>> >>>> Prefix:    0 bytes
>> >>>> Keys per prefix:    0
>> >>>> RawSize:    110.6 MB (estimated)
>> >>>> FileSize:   62.9 MB (estimated)
>> >>>> Writes per second: 0
>> >>>> Compression: Snappy
>> >>>> Memtablerep: skip_list
>> >>>> Perf Level: 0
>> >>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>> >>>> ------------------------------------------------
>> >>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>> >>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>> >>>> 1000100 found, issued 46639 non-exist keys)
>> >>>>
>> >>>> ===========================
>> >>>> then I modify readrandomfast to use Iterator API[0]:
>> >>>>
>> >>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
>> >>>> -benchmarks readrandomfast
>> >>>> LevelDB:    version 4.3
>> >>>> Date:       Wed Jun  1 00:33:03 2016
>> >>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>> >>>> CPUCache:   20480 KB
>> >>>> Keys:       16 bytes each
>> >>>> Values:     100 bytes each (50 bytes after compression)
>> >>>> Entries:    1000000
>> >>>> Prefix:    0 bytes
>> >>>> Keys per prefix:    0
>> >>>> RawSize:    110.6 MB (estimated)
>> >>>> FileSize:   62.9 MB (estimated)
>> >>>> Writes per second: 0
>> >>>> Compression: Snappy
>> >>>> Memtablerep: skip_list
>> >>>> Perf Level: 0
>> >>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>> >>>> ------------------------------------------------
>> >>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>> >>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>> >>>> 1000100 found, issued 46639 non-exist keys)
>> >>>>
>> >>>>
>> >>>> 45.18 us/op vs 4.57us/op!
>> >>>>
>> >>>> The test can be repeated and easy to do! Plz correct if I'm doing
>> >>>> foolish thing I'm not aware..
>> >>>>
>> >>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>> >>>>
>> >>>> We still can make further improvements by scanning all iterate
>> >>>> usage to make it better!
>> >>>>
>> >>>> [0]:
>> >>>> --- a/db/db_bench.cc
>> >>>> +++ b/db/db_bench.cc
>> >>>> @@ -2923,14 +2923,12 @@ class Benchmark {
>> >>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>> >>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>> >>>>          ++read;
>> >>>> -        auto status = db->Get(options, key, &value);
>> >>>> -        if (status.ok()) {
>> >>>> -          ++found;
>> >>>> -        } else if (!status.IsNotFound()) {
>> >>>> -          fprintf(stderr, "Get returned an error: %s\n",
>> >>>> -                  status.ToString().c_str());
>> >>>> -          abort();
>> >>>> -        }
>> >>>> +        Iterator* iter = db->NewIterator(options);
>> >>>> +      iter->Seek(key);
>> >>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>> >>>> +        found++;
>> >>>> +      }
>> >>>> +
>> >>>>          if (key_rand >= FLAGS_num) {
>> >>>>            ++nonexist;
>> >>>>          }
>> >>>> --
>> >>>> To unsubscribe from this list: send the line "unsubscribe
>> >>>> ceph-devel" in the body of a message to majordomo@vger.kernel.org
>> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> >> in the body of a message to majordomo@vger.kernel.org More
>> majordomo
>> >> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
>> body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-06-01  3:24           ` Jianjian Huo
@ 2016-06-01  3:37             ` Varada Kari
  2016-06-01 13:46               ` Allen Samuels
  0 siblings, 1 reply; 14+ messages in thread
From: Varada Kari @ 2016-06-01  3:37 UTC (permalink / raw)
  To: Jianjian Huo, Allen Samuels, Haomai Wang
  Cc: Sage Weil, Mark Nelson, ceph-devel



On Wednesday 01 June 2016 08:54 AM, Jianjian Huo wrote:
> Hi Allen,
>
> On Tue, May 31, 2016 at 7:52 PM, Allen Samuels <Allen.Samuels@sandisk.com> wrote:
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>>> owner@vger.kernel.org] On Behalf Of Haomai Wang
>>> Sent: Tuesday, May 31, 2016 7:38 PM
>>> To: Jianjian Huo <jianjian.huo@samsung.com>
>>> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson <mnelson@redhat.com>;
>>> ceph-devel@vger.kernel.org
>>> Subject: Re: RocksDB Incorrect API Usage
>>>
>>> On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo <jianjian.huo@samsung.com>
>>> wrote:
>>>> On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com>
>>> wrote:
>>>>> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo
>>> <jianjian.huo@samsung.com> wrote:
>>>>>> Hi Haomai,
>>>>>>
>>>>>> I noticed this as well, and made same changes to RocksDBStore in this PR
>>> last week:
>>>>>> https://github.com/ceph/ceph/pull/9215
>>>>>>
>>>>>> One thing which is even worse,  seek will bypass row cache, so kv pairs
>>> won't be able to be cached in row cache.
>>>>>> I am working to benchmark the performance impact, will publish the
>>> results after I am done this week.
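
As an aside on the row-cache point just quoted, here is a minimal sketch of enabling RocksDB's row cache; the cache size, DB path and key are made up for illustration. The point being flagged is that Get() lookups are served from (and populate) the row cache, while the iterator Seek() path does not benefit from it.

#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Assumption: a 64 MB row cache; size and path are illustrative only.
  options.row_cache = rocksdb::NewLRUCache(64 << 20);

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/row_cache_demo", &db);
  if (!s.ok()) return 1;

  std::string value;
  // Point lookups go through (and populate) the row cache ...
  db->Get(rocksdb::ReadOptions(), "some_key", &value);

  // ... while an iterator Seek() on the same key does not use it,
  // which is the behaviour being pointed out above.
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  it->Seek("some_key");
  delete it;
  delete db;
  return 0;
}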
>>>>> Oh, cool! I think you can cherry-pick my another leveldb fix.
>>>> No problem, I will do that.
>>>>> BTW, do you pay an attention to prefix seek api? I think it will be
>>>>> more suitable than column family in ceph case. If we can have
>>>>> well-defined prefix rule, we can make most of range query cheaper!
>>>> When keys with same prefix are stored in their own SST files(CF case),
>>> even seeking without prefix will be faster than seeking with prefix but mixed
>>> with different prefixed keys?
>>>
>>> From my current view, prefix could be more flexible. For example, each rgw
>>> bucket index could use one prefix to make each bucket index object seek
>>> separated. For CF, it would be too heavry.
>> This might cause other problems. Would it dramatically increase the number of files that BlueFS needs to manage? If so, that might effectively break that code too (of course it's fixable also :))
> Thanks for bringing up this issue. What's current limit of number of files for BlueFS, from your experience?
> Jianjian
There is no limit right now; the fnode ino is a uint64_t.  A directory is a
ref-counted object that contains a file map. As the number of inodes grows,
the file_map grows and consumes more memory.
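
A minimal sketch of that relationship, using plain STL types rather than the real BlueFS structures (FileNode, FileRef and Directory are placeholder names):

#include <cstdint>
#include <map>
#include <memory>
#include <string>

struct FileNode {
  uint64_t ino = 0;    // fnode ino is a uint64_t, so no practical limit
  uint64_t size = 0;
};
using FileRef = std::shared_ptr<FileNode>;

struct Directory {
  // Every tracked file adds one entry here, so memory use grows
  // linearly with the number of files.
  std::map<std::string, FileRef> file_map;
};

The point is simply that the per-directory map carries one entry per tracked file, so there is no hard cap, only growing memory consumption.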

Varada
>>>> I am not sure what optimization prefix seek will use internally for block
>>> based format, but to me, it's hard to beat the case when you only have one
>>> prefixed keys stored separately.
>>>>>> Jianjian
>>>>>>
>>>>>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xsky.com>
>>> wrote:
>>>>>>> Hi Sage and Mark,
>>>>>>>
>>>>>>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>>>>>>> won't use bloom filter like *Get*.
>>>>>>>
>>>>>>> *Get* impl: it will look at filter firstly
>>>>>>>
>>> https://github.com/facebook/rocksdb/blob/master/table/block_based_t
>>>>>>> able_reader.cc#L1369
>>>>>>>
>>>>>>> Iterator *Seek*: it will do binary search, by default we don't
>>>>>>> specify prefix
>>> feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-
>>> Changes).
>>>>>>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>>>>>>
>>>>>>> So I use a simple tests:
>>>>>>>
>>>>>>> ./db_bench -num 10000000  -benchmarks fillbatch fill the db firstly
>>>>>>> with 1000w records.
>>>>>>>
>>>>>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>>>>>>> readrandomfast case will use *Get* API to retrive data
>>>>>>>
>>>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
>>>>>>> -benchmarks readrandomfast
>>>>>>>
>>>>>>> LevelDB:    version 4.3
>>>>>>> Date:       Wed Jun  1 00:29:16 2016
>>>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>>>>> CPUCache:   20480 KB
>>>>>>> Keys:       16 bytes each
>>>>>>> Values:     100 bytes each (50 bytes after compression)
>>>>>>> Entries:    1000000
>>>>>>> Prefix:    0 bytes
>>>>>>> Keys per prefix:    0
>>>>>>> RawSize:    110.6 MB (estimated)
>>>>>>> FileSize:   62.9 MB (estimated)
>>>>>>> Writes per second: 0
>>>>>>> Compression: Snappy
>>>>>>> Memtablerep: skip_list
>>>>>>> Perf Level: 0
>>>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>>>>> ------------------------------------------------
>>>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>>>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>>>>>>> 1000100 found, issued 46639 non-exist keys)
>>>>>>>
>>>>>>> ===========================
>>>>>>> then I modify readrandomfast to use Iterator API[0]:
>>>>>>>
>>>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
>>>>>>> -benchmarks readrandomfast
>>>>>>> LevelDB:    version 4.3
>>>>>>> Date:       Wed Jun  1 00:33:03 2016
>>>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>>>>> CPUCache:   20480 KB
>>>>>>> Keys:       16 bytes each
>>>>>>> Values:     100 bytes each (50 bytes after compression)
>>>>>>> Entries:    1000000
>>>>>>> Prefix:    0 bytes
>>>>>>> Keys per prefix:    0
>>>>>>> RawSize:    110.6 MB (estimated)
>>>>>>> FileSize:   62.9 MB (estimated)
>>>>>>> Writes per second: 0
>>>>>>> Compression: Snappy
>>>>>>> Memtablerep: skip_list
>>>>>>> Perf Level: 0
>>>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>>>>> ------------------------------------------------
>>>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>>>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>>>>>>> 1000100 found, issued 46639 non-exist keys)
>>>>>>>
>>>>>>>
>>>>>>> 45.18 us/op vs 4.57us/op!
>>>>>>>
>>>>>>> The test can be repeated and easy to do! Plz correct if I'm doing
>>>>>>> foolish thing I'm not aware..
>>>>>>>
>>>>>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>>>>>>
>>>>>>> We still can make further improvements by scanning all iterate
>>>>>>> usage to make it better!
>>>>>>>
>>>>>>> [0]:
>>>>>>> --- a/db/db_bench.cc
>>>>>>> +++ b/db/db_bench.cc
>>>>>>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>>>>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>>>>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>>>>>>          ++read;
>>>>>>> -        auto status = db->Get(options, key, &value);
>>>>>>> -        if (status.ok()) {
>>>>>>> -          ++found;
>>>>>>> -        } else if (!status.IsNotFound()) {
>>>>>>> -          fprintf(stderr, "Get returned an error: %s\n",
>>>>>>> -                  status.ToString().c_str());
>>>>>>> -          abort();
>>>>>>> -        }
>>>>>>> +        Iterator* iter = db->NewIterator(options);
>>>>>>> +      iter->Seek(key);
>>>>>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>>>>>>> +        found++;
>>>>>>> +      }
>>>>>>> +
>>>>>>>          if (key_rand >= FLAGS_num) {
>>>>>>>            ++nonexist;
>>>>>>>          }
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>> ceph-devel" in the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>> majordomo
>>>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
>>> body of a message to majordomo@vger.kernel.org More majordomo info at
>>> http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: RocksDB Incorrect API Usage
  2016-06-01  3:37             ` Varada Kari
@ 2016-06-01 13:46               ` Allen Samuels
  2016-06-01 14:24                 ` Varada Kari
  0 siblings, 1 reply; 14+ messages in thread
From: Allen Samuels @ 2016-06-01 13:46 UTC (permalink / raw)
  To: Varada Kari, Jianjian Huo, Haomai Wang; +Cc: Sage Weil, Mark Nelson, ceph-devel

> -----Original Message-----
> From: Varada Kari
> Sent: Tuesday, May 31, 2016 8:38 PM
> To: Jianjian Huo <jianjian.huo@samsung.com>; Allen Samuels
> <Allen.Samuels@sandisk.com>; Haomai Wang <haomai@xsky.com>
> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson <mnelson@redhat.com>;
> ceph-devel@vger.kernel.org
> Subject: Re: RocksDB Incorrect API Usage
> 
> 
> 
> On Wednesday 01 June 2016 08:54 AM, Jianjian Huo wrote:
> > Hi Allen,
> >
> > On Tue, May 31, 2016 at 7:52 PM, Allen Samuels
> <Allen.Samuels@sandisk.com> wrote:
> >>> -----Original Message-----
> >>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> >>> owner@vger.kernel.org] On Behalf Of Haomai Wang
> >>> Sent: Tuesday, May 31, 2016 7:38 PM
> >>> To: Jianjian Huo <jianjian.huo@samsung.com>
> >>> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson
> <mnelson@redhat.com>;
> >>> ceph-devel@vger.kernel.org
> >>> Subject: Re: RocksDB Incorrect API Usage
> >>>
> >>> On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo
> >>> <jianjian.huo@samsung.com>
> >>> wrote:
> >>>> On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com>
> >>> wrote:
> >>>>> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo
> >>> <jianjian.huo@samsung.com> wrote:
> >>>>>> Hi Haomai,
> >>>>>>
> >>>>>> I noticed this as well, and made same changes to RocksDBStore in
> >>>>>> this PR
> >>> last week:
> >>>>>> https://github.com/ceph/ceph/pull/9215
> >>>>>>
> >>>>>> One thing which is even worse,  seek will bypass row cache, so kv
> >>>>>> pairs
> >>> won't be able to be cached in row cache.
> >>>>>> I am working to benchmark the performance impact, will publish
> >>>>>> the
> >>> results after I am done this week.
> >>>>> Oh, cool! I think you can cherry-pick my another leveldb fix.
> >>>> No problem, I will do that.
> >>>>> BTW, do you pay an attention to prefix seek api? I think it will
> >>>>> be more suitable than column family in ceph case. If we can have
> >>>>> well-defined prefix rule, we can make most of range query cheaper!
> >>>> When keys with same prefix are stored in their own SST files(CF
> >>>> case),
> >>> even seeking without prefix will be faster than seeking with prefix
> >>> but mixed with different prefixed keys?
> >>>
> >>> From my current view, prefix could be more flexible. For example,
> >>> each rgw bucket index could use one prefix to make each bucket index
> >>> object seek separated. For CF, it would be too heavry.
> >> This might cause other problems. Would it dramatically increase the
> >> number of files that BlueFS needs to manage? If so, that might
> >> effectively break that code too (of course it's fixable also :))
> > Thanks for bringing up this issue. What's current limit of number of files for
> BlueFS, from your experience?
> > Jianjian
> There is no limit right now; the fnode ino is a uint64_t.  A directory is a ref-counted
> object that contains a file map. As the number of inodes grows, the file_map grows
> and consumes more memory.
> 
> Varada

Technically correct, but my concern is one of performance for BlueFS journal compaction.
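
A rough sketch of that concern, under the assumption that compacting the BlueFS journal means re-emitting one metadata record per live file; the structure and function names below are placeholders, not the real BlueFS code.

#include <cstdint>
#include <map>
#include <vector>

struct FileNode { uint64_t ino; uint64_t size; };

// Hypothetical compaction pass: the rewritten journal carries one record
// per live file, so its size and the time to write it are proportional to
// the number of files being tracked.
std::vector<FileNode> compact_journal(const std::map<uint64_t, FileNode>& live_files) {
  std::vector<FileNode> new_journal;
  new_journal.reserve(live_files.size());
  for (const auto& entry : live_files) {
    new_journal.push_back(entry.second);   // one "file update" record each
  }
  return new_journal;
}

Under that assumption, a scheme that multiplies the number of files (per-prefix or per-CF SSTs) also multiplies the compaction work.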
 
> >>>> I am not sure what optimization prefix seek will use internally for
> >>>> block
> >>> based format, but to me, it's hard to beat the case when you only
> >>> have one prefixed keys stored separately.
> >>>>>> Jianjian
> >>>>>>
> >>>>>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang
> <haomai@xsky.com>
> >>> wrote:
> >>>>>>> Hi Sage and Mark,
> >>>>>>>
> >>>>>>> As mentioned in BlueStore standup, I found rocksdb iterator
> >>>>>>> *Seek* won't use bloom filter like *Get*.
> >>>>>>>
> >>>>>>> *Get* impl: it will look at filter firstly
> >>>>>>>
> >>>
> https://github.com/facebook/rocksdb/blob/master/table/block_based_t
> >>>>>>> able_reader.cc#L1369
> >>>>>>>
> >>>>>>> Iterator *Seek*: it will do binary search, by default we don't
> >>>>>>> specify prefix
> >>> feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-
> >>> Changes).
> >>>>>>>
> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L
> >>>>>>> 94
> >>>>>>>
> >>>>>>> So I use a simple tests:
> >>>>>>>
> >>>>>>> ./db_bench -num 10000000  -benchmarks fillbatch fill the db
> >>>>>>> firstly with 1000w records.
> >>>>>>>
> >>>>>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
> >>>>>>> readrandomfast case will use *Get* API to retrive data
> >>>>>>>
> >>>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
> >>>>>>> -benchmarks readrandomfast
> >>>>>>>
> >>>>>>> LevelDB:    version 4.3
> >>>>>>> Date:       Wed Jun  1 00:29:16 2016
> >>>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> >>>>>>> CPUCache:   20480 KB
> >>>>>>> Keys:       16 bytes each
> >>>>>>> Values:     100 bytes each (50 bytes after compression)
> >>>>>>> Entries:    1000000
> >>>>>>> Prefix:    0 bytes
> >>>>>>> Keys per prefix:    0
> >>>>>>> RawSize:    110.6 MB (estimated)
> >>>>>>> FileSize:   62.9 MB (estimated)
> >>>>>>> Writes per second: 0
> >>>>>>> Compression: Snappy
> >>>>>>> Memtablerep: skip_list
> >>>>>>> Perf Level: 0
> >>>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> >>>>>>> ------------------------------------------------
> >>>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
> >>>>>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
> >>>>>>> 1000100 found, issued 46639 non-exist keys)
> >>>>>>>
> >>>>>>> ===========================
> >>>>>>> then I modify readrandomfast to use Iterator API[0]:
> >>>>>>>
> >>>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
> >>>>>>> -benchmarks readrandomfast
> >>>>>>> LevelDB:    version 4.3
> >>>>>>> Date:       Wed Jun  1 00:33:03 2016
> >>>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> >>>>>>> CPUCache:   20480 KB
> >>>>>>> Keys:       16 bytes each
> >>>>>>> Values:     100 bytes each (50 bytes after compression)
> >>>>>>> Entries:    1000000
> >>>>>>> Prefix:    0 bytes
> >>>>>>> Keys per prefix:    0
> >>>>>>> RawSize:    110.6 MB (estimated)
> >>>>>>> FileSize:   62.9 MB (estimated)
> >>>>>>> Writes per second: 0
> >>>>>>> Compression: Snappy
> >>>>>>> Memtablerep: skip_list
> >>>>>>> Perf Level: 0
> >>>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> >>>>>>> ------------------------------------------------
> >>>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
> >>>>>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
> >>>>>>> 1000100 found, issued 46639 non-exist keys)
> >>>>>>>
> >>>>>>>
> >>>>>>> 45.18 us/op vs 4.57us/op!
> >>>>>>>
> >>>>>>> The test can be repeated and easy to do! Plz correct if I'm
> >>>>>>> doing foolish thing I'm not aware..
> >>>>>>>
> >>>>>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
> >>>>>>>
> >>>>>>> We still can make further improvements by scanning all iterate
> >>>>>>> usage to make it better!
> >>>>>>>
> >>>>>>> [0]:
> >>>>>>> --- a/db/db_bench.cc
> >>>>>>> +++ b/db/db_bench.cc
> >>>>>>> @@ -2923,14 +2923,12 @@ class Benchmark {
> >>>>>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
> >>>>>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
> >>>>>>>          ++read;
> >>>>>>> -        auto status = db->Get(options, key, &value);
> >>>>>>> -        if (status.ok()) {
> >>>>>>> -          ++found;
> >>>>>>> -        } else if (!status.IsNotFound()) {
> >>>>>>> -          fprintf(stderr, "Get returned an error: %s\n",
> >>>>>>> -                  status.ToString().c_str());
> >>>>>>> -          abort();
> >>>>>>> -        }
> >>>>>>> +        Iterator* iter = db->NewIterator(options);
> >>>>>>> +      iter->Seek(key);
> >>>>>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> >>>>>>> +        found++;
> >>>>>>> +      }
> >>>>>>> +
> >>>>>>>          if (key_rand >= FLAGS_num) {
> >>>>>>>            ++nonexist;
> >>>>>>>          }
> >>>>>>> --
> >>>>>>> To unsubscribe from this list: send the line "unsubscribe
> >>>>>>> ceph-devel" in the body of a message to
> >>>>>>> majordomo@vger.kernel.org More majordomo info at
> >>>>>>> http://vger.kernel.org/majordomo-info.html
> >>>>> --
> >>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>>>> in the body of a message to majordomo@vger.kernel.org More
> >>> majordomo
> >>>>> info at  http://vger.kernel.org/majordomo-info.html
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe
> >>> ceph-devel" in the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: RocksDB Incorrect API Usage
  2016-06-01 13:46               ` Allen Samuels
@ 2016-06-01 14:24                 ` Varada Kari
  0 siblings, 0 replies; 14+ messages in thread
From: Varada Kari @ 2016-06-01 14:24 UTC (permalink / raw)
  To: Allen Samuels, Jianjian Huo, Haomai Wang
  Cc: Sage Weil, Mark Nelson, ceph-devel



On Wednesday 01 June 2016 07:16 PM, Allen Samuels wrote:
>> -----Original Message-----
>> From: Varada Kari
>> Sent: Tuesday, May 31, 2016 8:38 PM
>> To: Jianjian Huo <jianjian.huo@samsung.com>; Allen Samuels
>> <Allen.Samuels@sandisk.com>; Haomai Wang <haomai@xsky.com>
>> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson <mnelson@redhat.com>;
>> ceph-devel@vger.kernel.org
>> Subject: Re: RocksDB Incorrect API Usage
>>
>>
>>
>> On Wednesday 01 June 2016 08:54 AM, Jianjian Huo wrote:
>>> Hi Allen,
>>>
>>> On Tue, May 31, 2016 at 7:52 PM, Allen Samuels
>> <Allen.Samuels@sandisk.com> wrote:
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>>>>> owner@vger.kernel.org] On Behalf Of Haomai Wang
>>>>> Sent: Tuesday, May 31, 2016 7:38 PM
>>>>> To: Jianjian Huo <jianjian.huo@samsung.com>
>>>>> Cc: Sage Weil <sweil@redhat.com>; Mark Nelson
>> <mnelson@redhat.com>;
>>>>> ceph-devel@vger.kernel.org
>>>>> Subject: Re: RocksDB Incorrect API Usage
>>>>>
>>>>> On Wed, Jun 1, 2016 at 3:40 AM, Jianjian Huo
>>>>> <jianjian.huo@samsung.com>
>>>>> wrote:
>>>>>> On Tue, May 31, 2016 at 11:12 AM, Haomai Wang <haomai@xsky.com>
>>>>> wrote:
>>>>>>> On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo
>>>>> <jianjian.huo@samsung.com> wrote:
>>>>>>>> Hi Haomai,
>>>>>>>>
>>>>>>>> I noticed this as well, and made same changes to RocksDBStore in
>>>>>>>> this PR
>>>>> last week:
>>>>>>>> https://github.com/ceph/ceph/pull/9215
>>>>>>>>
>>>>>>>> One thing which is even worse,  seek will bypass row cache, so kv
>>>>>>>> pairs
>>>>> won't be able to be cached in row cache.
>>>>>>>> I am working to benchmark the performance impact, will publish
>>>>>>>> the
>>>>> results after I am done this week.
>>>>>>> Oh, cool! I think you can cherry-pick my another leveldb fix.
>>>>>> No problem, I will do that.
>>>>>>> BTW, do you pay an attention to prefix seek api? I think it will
>>>>>>> be more suitable than column family in ceph case. If we can have
>>>>>>> well-defined prefix rule, we can make most of range query cheaper!
>>>>>> When keys with same prefix are stored in their own SST files(CF
>>>>>> case),
>>>>> even seeking without prefix will be faster than seeking with prefix
>>>>> but mixed with different prefixed keys?
>>>>>
>>>>> From my current view, prefix could be more flexible. For example,
>>>>> each rgw bucket index could use one prefix to make each bucket index
>>>>> object seek separated. For CF, it would be too heavry.
>>>> This might cause other problems. Would it dramatically increase the
>>>> number of files that BlueFS needs to manage? If so, that might
>>>> effectively break that code too (of course it's fixable also :))
>>> Thanks for bringing up this issue. What's current limit of number of files for
>> BlueFS, from your experience?
>>> Jianjian
>> There is no limit right now; the fnode ino is a uint64_t.  A directory is a ref-counted
>> object that contains a file map. As the number of inodes grows, the file_map grows
>> and consumes more memory.
>>
>> Varada
> Technically correct, but my concern is one of performance for BlueFS journal compaction.
>
Yes. I am working on optimizing the latency of compaction.

Varada
>>>>>> I am not sure what optimization prefix seek will use internally for
>>>>>> block
>>>>> based format, but to me, it's hard to beat the case when you only
>>>>> have one prefixed keys stored separately.
>>>>>>>> Jianjian
>>>>>>>>
>>>>>>>> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang
>> <haomai@xsky.com>
>>>>> wrote:
>>>>>>>>> Hi Sage and Mark,
>>>>>>>>>
>>>>>>>>> As mentioned in BlueStore standup, I found rocksdb iterator
>>>>>>>>> *Seek* won't use bloom filter like *Get*.
>>>>>>>>>
>>>>>>>>> *Get* impl: it will look at filter firstly
>>>>>>>>>
>> https://github.com/facebook/rocksdb/blob/master/table/block_based_t
>>>>>>>>> able_reader.cc#L1369
>>>>>>>>>
>>>>>>>>> Iterator *Seek*: it will do binary search, by default we don't
>>>>>>>>> specify prefix
>>>>> feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-
>>>>> Changes).
>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L
>>>>>>>>> 94
>>>>>>>>>
>>>>>>>>> So I use a simple tests:
>>>>>>>>>
>>>>>>>>> ./db_bench -num 10000000  -benchmarks fillbatch fill the db
>>>>>>>>> firstly with 1000w records.
>>>>>>>>>
>>>>>>>>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>>>>>>>>> readrandomfast case will use *Get* API to retrive data
>>>>>>>>>
>>>>>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
>>>>>>>>> -benchmarks readrandomfast
>>>>>>>>>
>>>>>>>>> LevelDB:    version 4.3
>>>>>>>>> Date:       Wed Jun  1 00:29:16 2016
>>>>>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>>>>>>> CPUCache:   20480 KB
>>>>>>>>> Keys:       16 bytes each
>>>>>>>>> Values:     100 bytes each (50 bytes after compression)
>>>>>>>>> Entries:    1000000
>>>>>>>>> Prefix:    0 bytes
>>>>>>>>> Keys per prefix:    0
>>>>>>>>> RawSize:    110.6 MB (estimated)
>>>>>>>>> FileSize:   62.9 MB (estimated)
>>>>>>>>> Writes per second: 0
>>>>>>>>> Compression: Snappy
>>>>>>>>> Memtablerep: skip_list
>>>>>>>>> Perf Level: 0
>>>>>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>>>>>>> ------------------------------------------------
>>>>>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>>>>>>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>>>>>>>>> 1000100 found, issued 46639 non-exist keys)
>>>>>>>>>
>>>>>>>>> ===========================
>>>>>>>>> then I modify readrandomfast to use Iterator API[0]:
>>>>>>>>>
>>>>>>>>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db
>>>>>>>>> -benchmarks readrandomfast
>>>>>>>>> LevelDB:    version 4.3
>>>>>>>>> Date:       Wed Jun  1 00:33:03 2016
>>>>>>>>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>>>>>>>>> CPUCache:   20480 KB
>>>>>>>>> Keys:       16 bytes each
>>>>>>>>> Values:     100 bytes each (50 bytes after compression)
>>>>>>>>> Entries:    1000000
>>>>>>>>> Prefix:    0 bytes
>>>>>>>>> Keys per prefix:    0
>>>>>>>>> RawSize:    110.6 MB (estimated)
>>>>>>>>> FileSize:   62.9 MB (estimated)
>>>>>>>>> Writes per second: 0
>>>>>>>>> Compression: Snappy
>>>>>>>>> Memtablerep: skip_list
>>>>>>>>> Perf Level: 0
>>>>>>>>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>>>>>>>>> ------------------------------------------------
>>>>>>>>> DB path: [/tmp/rocksdbtest-0/dbbench]
>>>>>>>>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>>>>>>>>> 1000100 found, issued 46639 non-exist keys)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 45.18 us/op vs 4.57us/op!
>>>>>>>>>
>>>>>>>>> The test can be repeated and easy to do! Plz correct if I'm
>>>>>>>>> doing foolish thing I'm not aware..
>>>>>>>>>
>>>>>>>>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>>>>>>>>
>>>>>>>>> We still can make further improvements by scanning all iterate
>>>>>>>>> usage to make it better!
>>>>>>>>>
>>>>>>>>> [0]:
>>>>>>>>> --- a/db/db_bench.cc
>>>>>>>>> +++ b/db/db_bench.cc
>>>>>>>>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>>>>>>>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>>>>>>>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>>>>>>>>          ++read;
>>>>>>>>> -        auto status = db->Get(options, key, &value);
>>>>>>>>> -        if (status.ok()) {
>>>>>>>>> -          ++found;
>>>>>>>>> -        } else if (!status.IsNotFound()) {
>>>>>>>>> -          fprintf(stderr, "Get returned an error: %s\n",
>>>>>>>>> -                  status.ToString().c_str());
>>>>>>>>> -          abort();
>>>>>>>>> -        }
>>>>>>>>> +        Iterator* iter = db->NewIterator(options);
>>>>>>>>> +      iter->Seek(key);
>>>>>>>>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>>>>>>>>> +        found++;
>>>>>>>>> +      }
>>>>>>>>> +
>>>>>>>>>          if (key_rand >= FLAGS_num) {
>>>>>>>>>            ++nonexist;
>>>>>>>>>          }
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>> ceph-devel" in the body of a message to
>>>>>>>>> majordomo@vger.kernel.org More majordomo info at
>>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo
>>>>>>> info at  http://vger.kernel.org/majordomo-info.html
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> ceph-devel" in the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-06-01 14:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31 16:49 RocksDB Incorrect API Usage Haomai Wang
2016-05-31 16:56 ` Mark Nelson
2016-05-31 17:17 ` Piotr Dałek
2016-05-31 17:47   ` Haomai Wang
2016-05-31 18:08 ` Jianjian Huo
2016-05-31 18:12   ` Haomai Wang
2016-05-31 19:40     ` Jianjian Huo
2016-06-01  2:38       ` Haomai Wang
2016-06-01  2:52         ` Allen Samuels
2016-06-01  3:24           ` Jianjian Huo
2016-06-01  3:37             ` Varada Kari
2016-06-01 13:46               ` Allen Samuels
2016-06-01 14:24                 ` Varada Kari
2016-06-01  3:13         ` Jianjian Huo
