* [PATCH] rbd: add queuing delay
From: Christian Brunner @ 2010-06-20 20:44 UTC
  To: ceph-devel

Hi Yehuda,

while running tests with qemu-io I've been seeing a lot of messages
like the following when running a large writev request (several
hundred MB in a single call):

10.06.20 22:10:07.337108 b67dcb70 client4136.objecter  pg 3.437e on [0] is laggy: 33
10.06.20 22:10:07.337708 b67dcb70 client4136.objecter  pg 3.2553 on [0] is laggy: 19
[...]

Everything is working fine, though. I think that the large number of
queued requests is the cause of this behaviour and I would propose
delaying further requests (see attached patch).

What do you think about it?

Another question: Is there a way to figure out max_osd through librados?

Christian

---
 block/rbd.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 74589cb..241b0c6 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -47,6 +47,14 @@
 
 #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
 
+/*
+ * For best performance MAX_RADOS_REQS should be at least as large as the
+ * number of osds. It may be larger, but if set too high you may see lag.
+ *
+ * XXX: automatically set to 2*max_osd ???
+ */
+#define MAX_RADOS_REQS 16
+
 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
     QEMUBH *bh;
@@ -507,6 +515,11 @@ static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs,
         rcb->segsize = segsize;
         rcb->buf = buf;
 
+        /* delay rados aio requests when the queue is getting too large */
+        while ((segnr - last_segnr + acb->aiocnt) > MAX_RADOS_REQS) {
+            usleep(100);
+        }
+
         if (write) {
             rados_aio_create_completion(rcb, NULL,
                                         (rados_callback_t) rbd_finish_aiocb,
-- 
1.7.0.4



* Re: [PATCH] rbd: add queuing delay
From: Yehuda Sadeh Weinraub @ 2010-06-21 23:52 UTC
  To: Christian Brunner; +Cc: ceph-devel

On Sun, Jun 20, 2010 at 1:44 PM, Christian Brunner <chb@muc.de> wrote:
> Hi Yehuda,
>
> while running tests with qemu-io I've been seeing a lot of messages
> like the following when running a large writev request (several
> hundred MB in a single call):
>
> 10.06.20 22:10:07.337108 b67dcb70 client4136.objecter  pg 3.437e on [0] is laggy: 33
> 10.06.20 22:10:07.337708 b67dcb70 client4136.objecter  pg 3.2553 on [0] is laggy: 19
> [...]
>
> Everything is working fine, though. I think that the large number of
> queued requests is the cause of this behaviour and I would propose
> delaying further requests (see attached patch).
>
> What do you think about it?

It seems that the osd is lagging behind. The usleep might work for you,
as it avoids the pressure, but it's also somewhat random and will
probably hurt performance on other setups. I'd rather see a
configurable solution that lets you specify a total in-flight byte
limit or some other resizable window scheme.

>
> Another question: Is there a way to figure out max_osd through librados?

Not currently.

Thanks,
Yehuda


* Re: [PATCH] rbd: add queuing delay
From: Christian Brunner @ 2010-06-22 20:27 UTC
  To: ceph-devel

On Mon, Jun 21, 2010 at 23:52 UTC, Yehuda Sadeh Weinraub wrote:
> > while running tests with qemu-io I've been seeing a lot of messages
> > like the following when running a large writev request (several
> > hundred MB in a single call):
> >
> > 10.06.20 22:10:07.337108 b67dcb70 client4136.objecter  pg 3.437e on [0] is laggy: 33
> > 10.06.20 22:10:07.337708 b67dcb70 client4136.objecter  pg 3.2553 on [0] is laggy: 19
> > [...]
> >
> > Everything is working fine, though. I think that the large number of
> > queued requests is the cause of this behaviour and I would propose
> > delaying further requests (see attached patch).
> >
> > What do you think about it?
> 
> It seems that the osd is lagging behind. The usleep might work for you,
> as it avoids the pressure, but it's also somewhat random and will
> probably hurt performance on other setups. I'd rather see a
> configurable solution that lets you specify a total in-flight byte
> limit or some other resizable window scheme.

I'm not sure if I understand what "lagging behind" means. If by
in-flight bytes you mean the sum of all requests in the queue, a
solution could look like this (although it isn't configurable yet).

Christian

---
 block/rbd.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 10daf20..f87e84c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -50,6 +50,7 @@ int eventfd(unsigned int initval, int flags);
  */
 
 #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
+#define MAX_QUEUE_SIZE 33554432 /* 32 MB */
 
 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
@@ -79,6 +80,7 @@ typedef struct BDRVRBDState {
     uint64_t size;
     uint64_t objsize;
     int qemu_aio_count;
+    uint64_t queuesize;
 } BDRVRBDState;
 
 typedef struct rbd_obj_header_ondisk RbdHeader1;
@@ -334,6 +336,7 @@ static int rbd_open(BlockDriverState *bs, const char *filename, int flags)
     le64_to_cpus((uint64_t *) & header->image_size);
     s->size = header->image_size;
     s->objsize = 1 << header->options.order;
+    s->queuesize = 0;
 
     s->efd = eventfd(0, 0);
     if (s->efd < 0) {
@@ -443,6 +446,7 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
     int i;
 
     acb->aiocnt--;
+    acb->s->queuesize -= rcb->segsize;
     r = rados_aio_get_return_value(c);
     rados_aio_release(c);
     if (acb->write) {
@@ -560,6 +564,12 @@ static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs,
         rcb->segsize = segsize;
         rcb->buf = buf;
 
+        while (s->queuesize > MAX_QUEUE_SIZE) {
+            usleep(100);
+        }
+
+        s->queuesize += segsize;
+
         if (write) {
             rados_aio_create_completion(rcb, NULL,
                                         (rados_callback_t) rbd_finish_aiocb,
-- 
1.7.0.4



* Re: [PATCH] rbd: add queuing delay
From: Yehuda Sadeh Weinraub @ 2010-06-22 21:03 UTC
  To: Christian Brunner; +Cc: ceph-devel

On Tue, Jun 22, 2010 at 1:27 PM, Christian Brunner <chb@muc.de> wrote:
> On Mon, Jun 21, 2010 at 23:52 UTC, Yehuda Sadeh Weinraub wrote:
>> > while running tests with qemu-io I've been seeing a lot of messages
>> > like the following when running a large writev request (several
>> > hundred MB in a single call):
>> >
>> > 10.06.20 22:10:07.337108 b67dcb70 client4136.objecter  pg 3.437e on [0] is laggy: 33
>> > 10.06.20 22:10:07.337708 b67dcb70 client4136.objecter  pg 3.2553 on [0] is laggy: 19
>> > [...]
>> >
>> > Everything is working fine, though. I think that the large number of
>> > queued requests is the cause of this behaviour and I would propose
>> > delaying further requests (see attached patch).
>> >
>> > What do you think about it?
>>
>> It seems that the osd is lagging behind. The usleep might work for you,
>> as it avoids the pressure, but it's also somewhat random and will
>> probably hurt performance on other setups. I'd rather see a
>> configurable solution that lets you specify a total in-flight byte
>> limit or some other resizable window scheme.
>
> I'm not sure if I understand what "lagging behind" means. If by
> in-flight bytes you mean the sum of all requests in the queue, a
> solution could look like this (although it isn't configurable yet).
>
The problem is that the sleep will leave the osd underutilized in
certain configurations. What you probably need here is some mechanism
that keeps feeding the osd with pending data whenever old data has
been cleared. E.g., make use of the async callbacks to wake up the
sender whenever the amount of pending outgoing data has fallen below
some threshold.
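
To make that concrete, here is a minimal sketch of such a window built
on a pthread condition variable (illustrative only -- the struct and
names are hypothetical, and the real driver would want to integrate
with qemu's aio/eventfd machinery rather than block a thread):

#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    uint64_t        inflight;     /* bytes currently queued to rados */
    uint64_t        max_inflight; /* configurable window, e.g. 32 MB */
} InflightWindow;

/* before each rados aio read/write: block until there is room.
 * note: a single request larger than max_inflight would need to be
 * special-cased to avoid blocking forever. */
static void window_acquire(InflightWindow *w, uint64_t bytes)
{
    pthread_mutex_lock(&w->lock);
    while (w->inflight + bytes > w->max_inflight) {
        pthread_cond_wait(&w->cond, &w->lock);
    }
    w->inflight += bytes;
    pthread_mutex_unlock(&w->lock);
}

/* from the rados completion callback: release bytes, wake the sender */
static void window_release(InflightWindow *w, uint64_t bytes)
{
    pthread_mutex_lock(&w->lock);
    w->inflight -= bytes;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

The completion callback (rbd_finish_aiocb in the patch) would call
window_release(), so the sender resumes as soon as enough data has
drained instead of polling on a fixed 100us interval.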

Yehuda


* Re: [PATCH] rbd: add queuing delay
From: Christian Brunner @ 2010-06-27 21:05 UTC
  To: ceph-devel

> >> > 10.06.20 22:10:07.337108 b67dcb70 client4136.objecter  pg 3.437e on [0] is laggy: 33
> >> > 10.06.20 22:10:07.337708 b67dcb70 client4136.objecter  pg 3.2553 on [0] is laggy: 19
> >> > [...]
> >> >
> >> > Everything is working fine, though. I think that the large number
> >> > of queued requests is the cause of this behaviour and I would
> >> > propose delaying further requests (see attached patch).
> >>
> >> It seems that the osd is lagging behind. The usleep might work for you,
> >> as it avoids the pressure, but it's also somewhat random and will
> >> probably hurt performance on other setups. I'd rather see a
> >> configurable solution that lets you specify a total in-flight byte
> >> limit or some other resizable window scheme.
> >
> > I'm not sure if I understand what "lagging behind" means. If by
> > in-flight bytes you mean the sum of all requests in the queue, a
> > solution could look like this (although it isn't configurable yet).
> >
> The problem is that the sleep will leave the osd underutilized in
> certain configurations. What you probably need here is some mechanism
> that keeps feeding the osd with pending data whenever old data has
> been cleared. E.g., make use of the async callbacks to wake up the
> sender whenever the amount of pending outgoing data has fallen below
> some threshold.

Do you mean replacing the sleep with something like a futex? 

When thinking about this a little bit more, I'm not sure if doing this
in the client is the right way to handle it. The client has no way to
judge whether the delay is caused by generally high load or by a single
slow osd. Maybe it would be better if librados could notify the client
to slow down?
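
For illustration, such a notification could be as simple as a
registered callback (purely hypothetical -- no such interface exists
in librados today):

/* Hypothetical backpressure hook; NOT an existing librados API.
 * The library would invoke it when its internal queue crosses a
 * high or low water mark, so the client can pause or resume
 * submitting aio requests. */
typedef void (*rados_throttle_cb_t)(void *arg, int slow_down);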

Christian
