* parallel transaction submit
@ 2016-08-25  7:45 Tang, Haodong
  2016-08-25  7:55 ` Haomai Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Tang, Haodong @ 2016-08-25  7:45 UTC (permalink / raw)
  To: sweil, varada.kari, ceph-devel

Hi Sage, Varada
 
We noticed you are working on parallel transaction submission; we also worked out a prototype that looks similar. Here is the link to the implementation: https://github.com/ceph/ceph/pull/10856
 
Background: 
From the perf counters we added, we found that a lot of time is spent in kv_queue; in other words, a single transaction-submitting thread cannot keep up with the transactions coming from the OSD.
 
Implementation: 
The key idea is to use multiple threads and assign each TransContext to one of the processing threads. To parallelize transaction submission, each thread gets its own kv_lock and kv_cond.
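As a rough illustration of this approach (not the exact code in the PR; the class and function names here are made up), a per-thread submit queue might look like this:

// Hypothetical sketch: one submit queue per kv thread, each with its own
// kv_lock and kv_cond, so transaction submission can proceed in parallel.
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct TransContext;   // BlueStore's per-transaction state (opaque here)

struct KVShard {
  std::mutex kv_lock;                   // per-thread kv_lock
  std::condition_variable kv_cond;      // per-thread kv_cond
  std::deque<TransContext*> kv_queue;   // transactions pending for this thread
  bool stop = false;
};

class ParallelKVSubmitter {
  std::vector<KVShard> shards;
  std::vector<std::thread> workers;
  std::atomic<uint64_t> next{0};

 public:
  explicit ParallelKVSubmitter(size_t nthreads) : shards(nthreads) {
    for (size_t i = 0; i < nthreads; ++i)
      workers.emplace_back([this, i] { worker(i); });
  }

  ~ParallelKVSubmitter() {
    for (auto& s : shards) {
      std::lock_guard<std::mutex> l(s.kv_lock);
      s.stop = true;
      s.kv_cond.notify_one();
    }
    for (auto& t : workers)
      t.join();
  }

  // Assign each TransContext to one of the processing threads (round robin).
  void queue_transaction(TransContext* txc) {
    KVShard& s = shards[next++ % shards.size()];
    std::lock_guard<std::mutex> l(s.kv_lock);
    s.kv_queue.push_back(txc);
    s.kv_cond.notify_one();
  }

 private:
  void worker(size_t i) {
    KVShard& s = shards[i];
    std::unique_lock<std::mutex> l(s.kv_lock);
    while (!s.stop) {
      if (s.kv_queue.empty()) {
        s.kv_cond.wait(l);
        continue;
      }
      TransContext* txc = s.kv_queue.front();
      s.kv_queue.pop_front();
      l.unlock();
      submit_to_kv_store(txc);    // backend-specific (RocksDB or MemDB)
      l.lock();
    }
  }

  void submit_to_kv_store(TransContext* txc) {
    // Placeholder: build and submit the kv transaction for txc here.
    (void)txc;
  }
};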
 
Performance evaluation: 
Test ENV:
	4 x servers, 4 x clients, 16 x Intel S3700 as block devices, and 4 x Intel P3600 as RocksDB/WAL devices.
Performance: 
We also ran several quick tests to verify the performance benefit. The results showed that parallel transaction submission brings about a 10% performance improvement with MemDB, but little improvement with RocksDB.
 
In addition, even without parallel transaction submission, simply switching to MemDB gives a small performance boost.
 
Test summary:
QD Scaling Test - 4k Random Write:
                                            QD = 1   QD = 16   QD = 32   QD = 64   QD = 128
With rocksdb (IOPS)                            682    173000    190000    203000     204000
With memdb (IOPS)                              704    180000    194000    206000     218000
With rocksdb+multiple_kv_thread (IOPS)           /    164243    167037    180961     201752
With memdb+multiple_kv_thread (IOPS)             /    176000    200000    221000     227000
 

It seems that a single transaction-submitting thread becomes a bottleneck when using MemDB.


* Re: parallel transaction submit
  2016-08-25  7:45 parallel transaction submit Tang, Haodong
@ 2016-08-25  7:55 ` Haomai Wang
  2016-08-25  8:47   ` Varada Kari
  2016-08-25  8:48   ` Tang, Haodong
  0 siblings, 2 replies; 6+ messages in thread
From: Haomai Wang @ 2016-08-25  7:55 UTC (permalink / raw)
  To: Tang, Haodong; +Cc: sweil, varada.kari, ceph-devel

Looks like very little improvement. The RocksDB result meets my expectation,
because RocksDB internally takes a lock for concurrent sync writes. But the
MemDB improvement is a little confusing.

On Thu, Aug 25, 2016 at 3:45 PM, Tang, Haodong <haodong.tang@intel.com> wrote:
> [...]


* Re: parallel transaction submit
  2016-08-25  7:55 ` Haomai Wang
@ 2016-08-25  8:47   ` Varada Kari
  2016-08-25 14:11     ` Sage Weil
  2016-08-25  8:48   ` Tang, Haodong
  1 sibling, 1 reply; 6+ messages in thread
From: Varada Kari @ 2016-08-25  8:47 UTC (permalink / raw)
  To: Haomai Wang, Tang, Haodong; +Cc: sweil, ceph-devel

Hi,

Increasing the number of kv_sync_threads does not give much of a performance
gain. In the current threading model, the shard worker submits the IO to the
block device; the completions are handled by the aio_callback thread (of which
there is one), which hands off to the kv_sync thread, which batches the
requests and submits them to RocksDB. Because kv_sync batches the requests
before submitting them, we may observe more time spent in the kv_sync_thread
routine, and I haven't observed much of an improvement by adding more threads
here.

But increasing the number of aio callback threads (this still needs some
refinement in how we poll for request completions) and finishing the write
completion in the same thread context did give some performance improvement.
I don't have numbers to say how much, but that is better than having multiple
kv_sync threads, which adds one more queue and lock. You can refer to
https://github.com/varadakari/ceph/commits/wip-parallel-aiocb (ignore the
first commit, which was trying to do a sync transaction in the same thread
context as the sharded worker to measure the latency).

I was also exploring a way to have an aio callback thread matched/reserved at
the time of IO submission, so that we don't need to call io_getevents(); it
would be a kind of async callback to a specified thread, letting us avoid the
waiting logic in io_getevents() and process the request in the same thread
context. You can refer to
http://manpages.ubuntu.com/manpages/wily/man3/io_set_callback.3.html. I don't
have working code for this yet. FWIW, it is worth experimenting with to see
whether it reduces any latency.
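
For reference, a minimal and purely illustrative libaio sketch of the
io_set_callback() pattern could look like the following. Note that
io_set_callback() only records the callback in the iocb; something still has
to reap the events, here via io_queue_run(), which wraps io_getevents(). The
file name and sizes are made up.

// Illustrative only, not Ceph code: register a completion callback with
// io_set_callback() and dispatch it from io_queue_run().
#include <libaio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

static volatile bool g_done = false;

static void write_done(io_context_t ctx, struct iocb* iocb, long res, long res2) {
  // Runs in whichever thread drives io_queue_run()/io_getevents().
  std::printf("aio write completed: res=%ld res2=%ld\n", res, res2);
  std::free(iocb->u.c.buf);
  delete iocb;
  g_done = true;
}

int main() {
  io_context_t ctx = nullptr;
  if (io_queue_init(128, &ctx) < 0)
    return 1;

  int fd = ::open("/tmp/aio-test.bin", O_RDWR | O_CREAT | O_DIRECT, 0644);
  if (fd < 0)
    return 1;

  void* buf = nullptr;
  posix_memalign(&buf, 4096, 4096);        // O_DIRECT needs aligned buffers
  std::memset(buf, 0xab, 4096);

  struct iocb* cb = new iocb;
  io_prep_pwrite(cb, fd, buf, 4096, 0);    // 4 KB write at offset 0
  io_set_callback(cb, write_done);         // remember the completion callback

  struct iocb* list[1] = { cb };
  if (io_submit(ctx, 1, list) != 1)
    return 1;

  // Reap completions; io_queue_run() invokes the registered callbacks.
  // (Busy-polling here just for the demo; real code would block in
  // io_getevents() or poll at strategic points.)
  while (!g_done)
    io_queue_run(ctx);

  ::close(fd);
  io_destroy(ctx);
  return 0;
}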

Varada

On Thursday 25 August 2016 01:25 PM, Haomai Wang wrote:
> Looks like very little improvement. The RocksDB result meets my expectation,
> because RocksDB internally takes a lock for concurrent sync writes. But the
> MemDB improvement is a little confusing.
>
> On Thu, Aug 25, 2016 at 3:45 PM, Tang, Haodong <haodong.tang@intel.com> wrote:
>> [...]


* RE: parallel transaction submit
  2016-08-25  7:55 ` Haomai Wang
  2016-08-25  8:47   ` Varada Kari
@ 2016-08-25  8:48   ` Tang, Haodong
  1 sibling, 0 replies; 6+ messages in thread
From: Tang, Haodong @ 2016-08-25  8:48 UTC (permalink / raw)
  To: Haomai Wang; +Cc: sweil, varada.kari, ceph-devel

Yeah, from the perf counters, we do reduce the time spent waiting in kv_queue, but the RocksDB submission time increases. It seems better to do batch submission for RocksDB than parallel submission. Besides, parallel submission also makes bdev->flush calls more frequent.

From the test results, there is a small performance improvement with MemDB, and compared with RocksDB the throughput is more stable.
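
As a rough sketch of that batching pattern (illustrative only, not the actual BlueStore code; PendingTxc and kv_sync_once are invented names): drain everything currently queued and submit it to RocksDB as one synchronous write, so a single WAL sync (and a single bdev flush) covers many transactions.

// Hypothetical sketch: batch all queued transactions into one RocksDB write.
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct PendingTxc {
  std::vector<std::pair<std::string, std::string>> kv;  // staged key/value pairs
  std::function<void()> on_commit;                      // completion callback
};

void kv_sync_once(rocksdb::DB* db, std::deque<PendingTxc*>& kv_queue) {
  if (kv_queue.empty())
    return;

  // Drain everything queued so far into one combined batch.
  std::deque<PendingTxc*> committing;
  committing.swap(kv_queue);

  rocksdb::WriteBatch combined;
  for (PendingTxc* txc : committing)
    for (const auto& p : txc->kv)
      combined.Put(p.first, p.second);

  // One synchronous write: the WAL is synced once for the whole batch.
  rocksdb::WriteOptions opts;
  opts.sync = true;
  rocksdb::Status s = db->Write(opts, &combined);
  assert(s.ok());

  // Ack every transaction covered by that single sync.
  for (PendingTxc* txc : committing) {
    txc->on_commit();
    delete txc;
  }
}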

-----Original Message-----
From: Haomai Wang [mailto:haomai@xsky.com] 
Sent: Thursday, August 25, 2016 3:55 PM
To: Tang, Haodong <haodong.tang@intel.com>
Cc: sweil@redhat.com; varada.kari@sandisk.com; ceph-devel@vger.kernel.org
Subject: Re: parallel transaction submit

Looks like very little improvement. The RocksDB result meets my expectation, because RocksDB internally takes a lock for concurrent sync writes. But the MemDB improvement is a little confusing.

On Thu, Aug 25, 2016 at 3:45 PM, Tang, Haodong <haodong.tang@intel.com> wrote:
> [...]


* Re: parallel transaction submit
  2016-08-25  8:47   ` Varada Kari
@ 2016-08-25 14:11     ` Sage Weil
  2016-08-25 14:26       ` Varada Kari
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2016-08-25 14:11 UTC (permalink / raw)
  To: Varada Kari; +Cc: Haomai Wang, Tang, Haodong, ceph-devel

On Thu, 25 Aug 2016, Varada Kari wrote:
> Hi,
> 
> Increasing the number of kv_sync_threads does not give much of a performance
> gain. In the current threading model, the shard worker submits the IO to the
> block device; the completions are handled by the aio_callback thread (of which
> there is one), which hands off to the kv_sync thread, which batches the
> requests and submits them to RocksDB. Because kv_sync batches the requests
> before submitting them, we may observe more time spent in the kv_sync_thread
> routine, and I haven't observed much of an improvement by adding more threads
> here.
> 
> But increasing the number of aio callback threads (this still needs some
> refinement in how we poll for request completions) and finishing the write
> completion in the same thread context did give some performance improvement.
> I don't have numbers to say how much, but that is better than having multiple
> kv_sync threads, which adds one more queue and lock. You can refer to
> https://github.com/varadakari/ceph/commits/wip-parallel-aiocb (ignore the
> first commit, which was trying to do a sync transaction in the same thread
> context as the sharded worker to measure the latency).

Yeah, I think this is right.  I see two avenues of attack:

- Try to eliminate the handoff to _kv_sync_thread by having the 
transaction submitted to rocksdb in the calling thread.  This will 
require a bit of refactoring but I think it's possible. We don't actually 
want to block, though, so it'll be an async submission, and we'll still 
need kv_sync_thread just telling rocksdb to commit in a loop and 
triggering callbacks.  A recent PR sharded the completion finishers so I'm 
guessing the final step would be some affinity thing that pins the 
finishers to the same cores as the submitters?

- Shard the io completion (before we submit the kv transaction).  Not sure 
if we want a thread per shard, or polls at opportunistic/strategic points 
in code.  The goal would be keeping the processing local to the 
core/socket (vs the current strategy of a single thread waiting/polling 
for completions and doing the next phase of work).
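
A rough sketch of what the first avenue could look like (purely illustrative, not the actual refactor: the submitting thread does an unsynced RocksDB write, and a background sync thread just syncs the WAL in a loop and triggers the callbacks; AsyncKVCommitter is an invented name):

// Illustrative sketch only: async submission from the calling thread plus a
// background loop that syncs the RocksDB WAL and fires completion callbacks.
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

class AsyncKVCommitter {
  rocksdb::DB* db;
  std::mutex lock;
  std::condition_variable cond;
  std::deque<std::function<void()>> pending;  // callbacks awaiting a WAL sync
  bool stop = false;
  std::thread syncer;

 public:
  explicit AsyncKVCommitter(rocksdb::DB* d) : db(d), syncer([this] { run(); }) {}

  ~AsyncKVCommitter() {
    {
      std::lock_guard<std::mutex> l(lock);
      stop = true;                 // (pending callbacks are dropped in this sketch)
      cond.notify_one();
    }
    syncer.join();
  }

  // Called from the submitting thread: write without waiting for an fsync,
  // then queue the completion for the background sync loop.
  void submit(rocksdb::WriteBatch& batch, std::function<void()> on_commit) {
    rocksdb::WriteOptions opts;
    opts.sync = false;             // async submission; WAL write, no fsync
    db->Write(opts, &batch);
    std::lock_guard<std::mutex> l(lock);
    pending.push_back(std::move(on_commit));
    cond.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(lock);
    while (!stop) {
      if (pending.empty()) {
        cond.wait(l);
        continue;
      }
      std::deque<std::function<void()>> done;
      done.swap(pending);
      l.unlock();
      db->SyncWAL();               // one sync covers everything submitted so far
      for (auto& cb : done)
        cb();                      // trigger the completion callbacks
      l.lock();
    }
  }
};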

> I was also exploring a way to have an aio callback thread matched/reserved at
> the time of IO submission, so that we don't need to call io_getevents(); it
> would be a kind of async callback to a specified thread, letting us avoid the
> waiting logic in io_getevents() and process the request in the same thread
> context. You can refer to
> http://manpages.ubuntu.com/manpages/wily/man3/io_set_callback.3.html. I don't
> have working code for this yet. FWIW, it is worth experimenting with to see
> whether it reduces any latency.

I don't think this will help--it just means you're using a layer of the 
library that's calling getevents for you and calling your callback.

Thanks!
sage




* Re: parallel transaction submit
  2016-08-25 14:11     ` Sage Weil
@ 2016-08-25 14:26       ` Varada Kari
  0 siblings, 0 replies; 6+ messages in thread
From: Varada Kari @ 2016-08-25 14:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: Haomai Wang, Tang, Haodong, ceph-devel

On Thursday 25 August 2016 07:41 PM, Sage Weil wrote:
> On Thu, 25 Aug 2016, Varada Kari wrote:
>> [...]
> Yeah, I think this is right.  I see two avenues of attack:
>
> - Try to eliminate the handoff to _kv_sync_thread by having the 
> transaction submitted to rocksdb in the calling thread.  This will 
> require a bit of refactoring but I think it's possible. We don't actually 
> want to block, though, so it'll be an async submission, and we'll still 
> need kv_sync_thread just telling rocksdb to commit in a loop and 
> triggering callbacks.  A recent PR sharded the completion finishers so I'm 
> guessing the final step would be some affinity thing that pins the 
> finishers to the same cores as the submitters?
I kind of copied what kv_sync_thread does and was able to run multiple
callbacks at the same time. If we can complete the write ack in the same
aio_cb thread, without handing off to a finisher thread, we eliminate one
thread switch, and we can have the same number of threads as shards. We can
use the same logic to handle the finishers in the callback threads (not sure
if we can process the request by reading the osr here).
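
A tiny sketch of that idea (hypothetical; the real code would still have to respect per-osr ordering): run the completion directly in the aio callback thread instead of queueing it to a separate finisher.

// Hypothetical sketch: complete the write ack in the aio callback thread
// itself, saving one handoff to a finisher thread (one fewer thread switch).
#include <functional>

struct AioCompletion {
  std::function<void()> on_commit;   // e.g. sends the commit ack to the client
};

// Finisher-handoff style (extra context switch):
//   void aio_cb(AioCompletion* c) { finisher.queue(c->on_commit); }

// Same-thread style: the callback thread runs the completion directly.
void aio_cb(AioCompletion* c) {
  c->on_commit();                    // ack in the aio callback thread context
  delete c;
}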

Varada


