* [PATCH] blkback: Fix block I/O latency issue
@ 2011-05-02  7:04 Vincent, Pradeep
  2011-05-02  8:13 ` Jan Beulich
  2011-05-28 20:12 ` [RE-PATCH] " Daniel Stodden
  0 siblings, 2 replies; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-02  7:04 UTC (permalink / raw)
  To: xen-devel



In the blkback driver, after I/O requests are submitted to the Dom-0 block I/O subsystem, blkback effectively goes to 'sleep' without letting blkfront know about it (req_event isn't set appropriately). Hence blkfront doesn't notify blkback when it submits a new I/O, which delays the dispatch of the new I/O to the Dom-0 block I/O subsystem: the new I/O is dispatched only once one of the previous I/Os completes.

As a result of this issue, block I/O latency is degraded for some workloads on Xen guests using the blkfront-blkback stack.

The following change addresses this issue:


Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>

diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
--- a/drivers/xen/blkback/blkback.c
+++ b/drivers/xen/blkback/blkback.c
@@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
 		cond_resched();
 	}
 
+	/* If blkback might go to sleep (i.e. more_to_do == 0) then we better
+	   let blkfront know about it (by setting req_event appropriately) so that
+	   blkfront will bother to wake us up (via interrupt) when it submits a
+	   new I/O */
+        if (!more_to_do)
+                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
 	return more_to_do;
 }






[-- Attachment #2: blkback-bugfix-reqevent-assignment.patch --]
[-- Type: application/octet-stream, Size: 662 bytes --]

Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>

diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
--- a/drivers/xen/blkback/blkback.c
+++ b/drivers/xen/blkback/blkback.c
@@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
 		cond_resched();
 	}
 
+	/* If blkback might go to sleep (i.e. more_to_do == 0) then we better
+	   let blkfront know about it (by setting req_event appropriately) so that
+	   blkfront will bother to wake us up (via interrupt) when it submits a 
+	   new I/O */
+        if (!more_to_do)
+                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
 	return more_to_do;
 }
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-02  7:04 [PATCH] blkback: Fix block I/O latency issue Vincent, Pradeep
@ 2011-05-02  8:13 ` Jan Beulich
  2011-05-03  1:10   ` Vincent, Pradeep
  2011-05-28 20:12 ` [RE-PATCH] " Daniel Stodden
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2011-05-02  8:13 UTC (permalink / raw)
  To: Pradeep Vincent; +Cc: xen-devel

>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com> wrote:
> In blkback driver, after I/O requests are submitted to Dom-0 block I/O 
> subsystem, blkback goes to 'sleep' effectively without letting blkfront know 
> about it (req_event isn't set appropriately). Hence blkfront doesn't notify 
> blkback when it submits a new I/O thus delaying the 'dispatch' of the new I/O 
> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as one of the 
> previous I/Os completes.
> 
> As a result of this issue, the block I/O latency performance is degraded for 
> some workloads on Xen guests using blkfront-blkback stack.
> 
> The following change addresses this issue:
> 
> 
> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> 
> diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
> --- a/drivers/xen/blkback/blkback.c
> +++ b/drivers/xen/blkback/blkback.c
> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
>   cond_resched();
>   }
> 
> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> +   let blkfront know about it (by setting req_event appropriately) so that
> +   blkfront will bother to wake us up (via interrupt) when it submits a
> +   new I/O */
> +        if (!more_to_do)
> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);

To me this contradicts the comment preceding the use of
RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
(there it's supposedly used to avoid unnecessary notification,
here you say it's used to force notification). Albeit I agree that
the change looks consistent with the comments in io/ring.h.

Even if correct, you're not holding blkif->blk_ring_lock here, and
hence I think you'll need to explain how this is not a problem.

From a formal perspective, you also want to correct usage of tabs,
and (assuming this is intended for the 2.6.18 tree) you'd also need
to indicate so for Keir to pick this up and apply it to that tree (and
it might then also be a good idea to submit an equivalent patch for
the pv-ops trees).

Jan

>   return more_to_do;
>  }


* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-02  8:13 ` Jan Beulich
@ 2011-05-03  1:10   ` Vincent, Pradeep
  2011-05-03 14:55     ` Konrad Rzeszutek Wilk
  2011-05-03 17:52     ` Daniel Stodden
  0 siblings, 2 replies; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-03  1:10 UTC (permalink / raw)
  To: Jan Beulich, xen-devel, Jeremy Fitzhardinge


Thanks Jan.

Re: avoid unnecessary notification

If this was a deliberate design choice, then the duration of the delay is
at the mercy of the pending I/O latencies and I/O patterns, and the delay
is simply too long in some cases. E.g., a write I/O stuck behind a read
I/O could see more than double the latency on a Xen guest compared to a
bare-metal host. Avoiding notifications this way results in significant
latency degradation perceived by many applications.

If this is about allowing the I/O scheduler to coalesce more I/Os, then I
bet the I/O scheduler's own 'wait and coalesce' logic is a better
substitute for the delays introduced by blkback.

I totally agree IRQ coalescing or delay is useful for both blkback and
netback but we need a logic that doesn't impact I/O latencies
significantly. Also, I don't think netback has this type of notification
avoidance logic (at least in 2.6.18 code base).


Re: Other points

Good call. Changed the patch to include tabs.

I wasn't very sure about blk_ring_lock usage and I should have clarified
it before sending out the patch.

Assuming blk_ring_lock was meant to protect shared ring manipulations
within blkback, is there a reason the 'blk_rings->common.req_cons'
manipulation in do_block_io_op is not protected? The reasons for the
differences between the locking logic in do_block_io_op and make_response
weren't terribly obvious, although the failure mode for the race condition
may very well be benign.

Anyway, I am attaching a patch with appropriate changes.

Jeremy, can you apply this patch to the pvops Dom-0 tree
(http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git)? Should I
submit another patch for the 2.6.18 Dom-0 tree?


Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>

diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
--- a/drivers/xen/blkback/blkback.c
+++ b/drivers/xen/blkback/blkback.c
@@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
 	pending_req_t *pending_req;
 	RING_IDX rc, rp;
 	int more_to_do = 0;
+	unsigned long     flags;
 
 	rc = blk_rings->common.req_cons;
 	rp = blk_rings->common.sring->req_prod;
@@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
 		cond_resched();
 	}
 
+	/* If blkback might go to sleep (i.e. more_to_do == 0) then we better
+	   let blkfront know about it (by setting req_event appropriately) so that
+	   blkfront will bother to wake us up (via interrupt) when it submits a
+	   new I/O */
+	if (!more_to_do){
+		spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
+		spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	}
 	return more_to_do;
 }
 







On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com> wrote:
>> In blkback driver, after I/O requests are submitted to Dom-0 block I/O
>> subsystem, blkback goes to 'sleep' effectively without letting blkfront
>>know 
>> about it (req_event isn't set appropriately). Hence blkfront doesn't
>>notify 
>> blkback when it submits a new I/O thus delaying the 'dispatch' of the
>>new I/O 
>> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as one
>>of the 
>> previous I/Os completes.
>> 
>> As a result of this issue, the block I/O latency performance is
>>degraded for 
>> some workloads on Xen guests using blkfront-blkback stack.
>> 
>> The following change addresses this issue:
>> 
>> 
>> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
>> 
>> diff --git a/drivers/xen/blkback/blkback.c
>>b/drivers/xen/blkback/blkback.c
>> --- a/drivers/xen/blkback/blkback.c
>> +++ b/drivers/xen/blkback/blkback.c
>> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
>>   cond_resched();
>>   }
>> 
>> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
>> +   let blkfront know about it (by setting req_event appropriately) so
>>that
>> +   blkfront will bother to wake us up (via interrupt) when it submits a
>> +   new I/O */
>> +        if (!more_to_do)
>> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
>>more_to_do);
>
>To me this contradicts the comment preceding the use of
>RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
>(there it's supposedly used to avoid unnecessary notification,
>here you say it's used to force notification). Albeit I agree that
>the change looks consistent with the comments in io/ring.h.
>
>Even if correct, you're not holding blkif->blk_ring_lock here, and
>hence I think you'll need to explain how this is not a problem.
>
>From a formal perspective, you also want to correct usage of tabs,
>and (assuming this is intended for the 2.6.18 tree) you'd also need
>to indicate so for Keir to pick this up and apply it to that tree (and
>it might then also be a good idea to submit an equivalent patch for
>the pv-ops trees).
>
>Jan
>
>>   return more_to_do;
>>  }
>
>
>


[-- Attachment #2: blkback-bugfix-reqevent-assignment.patch --]
[-- Type: application/octet-stream, Size: 993 bytes --]

Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>

diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
--- a/drivers/xen/blkback/blkback.c
+++ b/drivers/xen/blkback/blkback.c
@@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
 	pending_req_t *pending_req;
 	RING_IDX rc, rp;
 	int more_to_do = 0;
+	unsigned long     flags;
 
 	rc = blk_rings->common.req_cons;
 	rp = blk_rings->common.sring->req_prod;
@@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
 		cond_resched();
 	}
 
+	/* If blkback might go to sleep (i.e. more_to_do == 0) then we better
+	   let blkfront know about it (by setting req_event appropriately) so that
+	   blkfront will bother to wake us up (via interrupt) when it submits a 
+	   new I/O */
+	if (!more_to_do){
+		spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
+		spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	}
 	return more_to_do;
 }
 



* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-03  1:10   ` Vincent, Pradeep
@ 2011-05-03 14:55     ` Konrad Rzeszutek Wilk
  2011-05-03 17:16       ` Vincent, Pradeep
  2011-05-03 17:52     ` Daniel Stodden
  1 sibling, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-03 14:55 UTC (permalink / raw)
  To: Vincent, Pradeep; +Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich

On Mon, May 02, 2011 at 06:10:22PM -0700, Vincent, Pradeep wrote:
> Thanks Jan.
> 
> Re: avoid unnecessary notification
> 
> If this was a deliberate design choice then the duration of the delay is
> at the mercy of the pending I/O latencies & I/O patterns and the delay is
> simply too long in some cases. E.g. A write I/O stuck behind a read I/O
> could see more than double the latency on a Xen guest compared to a
> baremetal host. Avoiding notifications this way results in significant
> latency degradation perceived by many applications.

Are you sure this is not the fault of the I/O scheduler? I had similar
issues with the CFQ scheduler upstream and found out that I needed to add
REQ_SYNC on write requests.
> 
> If this is about allowing I/O scheduler to coalesce more I/Os, then I bet
> I/O scheduler's 'wait and coalesce' logic is a great substitute for the
> delays introduced by blkback.
> 
> I totally agree IRQ coalescing or delay is useful for both blkback and
> netback but we need a logic that doesn't impact I/O latencies
> significantly. Also, I don't think netback has this type of notification
> avoidance logic (at least in 2.6.18 code base).
> 
> 
> Re: Other points
> 
> Good call. Changed the patch to include tabs.
> 
> I wasn't very sure about blk_ring_lock usage and I should have clarified
> it before sending out the patch.
> 
> Assuming blk_ring_lock was meant to protect shared ring manipulations
> within blkback, is there a reason 'blk_rings->common.req_cons'
> manipulation in do_block_io_op is not protected ? The reasons for the
> differences between locking logic in do_block_io_op and make_response
> weren't terribly obvious although the failure mode for the race condition
> may very well be benign.
> 
> Anyway, I am attaching a patch with appropriate changes.
> 
> Jeremey, Can you apply this patch to pvops Dom-0
> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git). Should I
> submit another patch for 2.6.18 Dom-0 ?
> 
> 
> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> 
> diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
> --- a/drivers/xen/blkback/blkback.c
> +++ b/drivers/xen/blkback/blkback.c
> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
>   pending_req_t *pending_req;
>   RING_IDX rc, rp;
>   int more_to_do = 0;
> + unsigned long     flags;
>  
>   rc = blk_rings->common.req_cons;
>   rp = blk_rings->common.sring->req_prod;
> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
>    cond_resched();
>   }
>  
> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> +    let blkfront know about it (by setting req_event appropriately) so
> that
> +    blkfront will bother to wake us up (via interrupt) when it submits a
> +    new I/O */
> + if (!more_to_do){
> +  spin_lock_irqsave(&blkif->blk_ring_lock, flags);
> +  RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> +  spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> + }
>   return more_to_do;
>  }
>  
> 
> 
> 
> 
> 
> 
> 
> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:
> 
> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com> wrote:
> >> In blkback driver, after I/O requests are submitted to Dom-0 block I/O
> >> subsystem, blkback goes to 'sleep' effectively without letting blkfront
> >>know 
> >> about it (req_event isn't set appropriately). Hence blkfront doesn't
> >>notify 
> >> blkback when it submits a new I/O thus delaying the 'dispatch' of the
> >>new I/O 
> >> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as one
> >>of the 
> >> previous I/Os completes.
> >> 
> >> As a result of this issue, the block I/O latency performance is
> >>degraded for 
> >> some workloads on Xen guests using blkfront-blkback stack.
> >> 
> >> The following change addresses this issue:
> >> 
> >> 
> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> >> 
> >> diff --git a/drivers/xen/blkback/blkback.c
> >>b/drivers/xen/blkback/blkback.c
> >> --- a/drivers/xen/blkback/blkback.c
> >> +++ b/drivers/xen/blkback/blkback.c
> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
> >>   cond_resched();
> >>   }
> >> 
> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> >> +   let blkfront know about it (by setting req_event appropriately) so
> >>that
> >> +   blkfront will bother to wake us up (via interrupt) when it submits a
> >> +   new I/O */
> >> +        if (!more_to_do)
> >> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
> >>more_to_do);
> >
> >To me this contradicts the comment preceding the use of
> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
> >(there it's supposedly used to avoid unnecessary notification,
> >here you say it's used to force notification). Albeit I agree that
> >the change looks consistent with the comments in io/ring.h.
> >
> >Even if correct, you're not holding blkif->blk_ring_lock here, and
> >hence I think you'll need to explain how this is not a problem.
> >
> >From a formal perspective, you also want to correct usage of tabs,
> >and (assuming this is intended for the 2.6.18 tree) you'd also need
> >to indicate so for Keir to pick this up and apply it to that tree (and
> >it might then also be a good idea to submit an equivalent patch for
> >the pv-ops trees).
> >
> >Jan
> >
> >>   return more_to_do;
> >>  }
> >
> >
> >
> 




* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-03 14:55     ` Konrad Rzeszutek Wilk
@ 2011-05-03 17:16       ` Vincent, Pradeep
  2011-05-03 17:51         ` Daniel Stodden
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-03 17:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich

Yes, I am positive that what I am seeing isn't an 'I/O scheduler issue due
to REQ_SYNC'. Trace data from blkback showed that blkback was simply not
submitting the second I/O to the I/O scheduler. The type of I/O (read vs.
write) doesn't matter.

Recreation Steps: 

1. Generate I/O requests so that two I/Os are pending at any given time.
The I/O submissions shouldn't be synchronized; e.g., use two threads,
each submitting small random direct I/Os.
2. Verify that the guest sends out two I/Os at a given time. 'iostat'
avgqu-sz will be '2'
3. Now check iostat in Dom-0 for the corresponding block device. Avgqu-sz
will be '1'
4. 'await' comparison in DomU vs Dom0 will show a fairly big difference.

And I confirmed that the patch I submitted fixes this issue.

- Pradeep Vincent


On 5/3/11 7:55 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

>On Mon, May 02, 2011 at 06:10:22PM -0700, Vincent, Pradeep wrote:
>> Thanks Jan.
>> 
>> Re: avoid unnecessary notification
>> 
>> If this was a deliberate design choice then the duration of the delay is
>> at the mercy of the pending I/O latencies & I/O patterns and the delay
>>is
>> simply too long in some cases. E.g. A write I/O stuck behind a read I/O
>> could see more than double the latency on a Xen guest compared to a
>> baremetal host. Avoiding notifications this way results in significant
>> latency degradation perceived by many applications.
>
>You sure this is not the fault of the IO scheduler? I had similar issues
>with the CFQ scheduler upstream and found out that I needed to add
>REQ_SYNC on write requests.
>> 
>> If this is about allowing I/O scheduler to coalesce more I/Os, then I
>>bet
>> I/O scheduler's 'wait and coalesce' logic is a great substitute for the
>> delays introduced by blkback.
>> 
>> I totally agree IRQ coalescing or delay is useful for both blkback and
>> netback but we need a logic that doesn't impact I/O latencies
>> significantly. Also, I don't think netback has this type of notification
>> avoidance logic (at least in 2.6.18 code base).
>> 
>> 
>> Re: Other points
>> 
>> Good call. Changed the patch to include tabs.
>> 
>> I wasn't very sure about blk_ring_lock usage and I should have clarified
>> it before sending out the patch.
>> 
>> Assuming blk_ring_lock was meant to protect shared ring manipulations
>> within blkback, is there a reason 'blk_rings->common.req_cons'
>> manipulation in do_block_io_op is not protected ? The reasons for the
>> differences between locking logic in do_block_io_op and make_response
>> weren't terribly obvious although the failure mode for the race
>>condition
>> may very well be benign.
>> 
>> Anyway, I am attaching a patch with appropriate changes.
>> 
>> Jeremey, Can you apply this patch to pvops Dom-0
>> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git). Should
>>I
>> submit another patch for 2.6.18 Dom-0 ?
>> 
>> 
>> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
>> 
>> diff --git a/drivers/xen/blkback/blkback.c
>>b/drivers/xen/blkback/blkback.c
>> --- a/drivers/xen/blkback/blkback.c
>> +++ b/drivers/xen/blkback/blkback.c
>> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
>>   pending_req_t *pending_req;
>>   RING_IDX rc, rp;
>>   int more_to_do = 0;
>> + unsigned long     flags;
>>  
>>   rc = blk_rings->common.req_cons;
>>   rp = blk_rings->common.sring->req_prod;
>> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
>>    cond_resched();
>>   }
>>  
>> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
>> +    let blkfront know about it (by setting req_event appropriately) so
>> that
>> +    blkfront will bother to wake us up (via interrupt) when it submits
>>a
>> +    new I/O */
>> + if (!more_to_do){
>> +  spin_lock_irqsave(&blkif->blk_ring_lock, flags);
>> +  RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>> +  spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
>> + }
>>   return more_to_do;
>>  }
>>  
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:
>> 
>> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com>
>>wrote:
>> >> In blkback driver, after I/O requests are submitted to Dom-0 block
>>I/O
>> >> subsystem, blkback goes to 'sleep' effectively without letting
>>blkfront
>> >>know 
>> >> about it (req_event isn't set appropriately). Hence blkfront doesn't
>> >>notify 
>> >> blkback when it submits a new I/O thus delaying the 'dispatch' of the
>> >>new I/O 
>> >> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as
>>one
>> >>of the 
>> >> previous I/Os completes.
>> >> 
>> >> As a result of this issue, the block I/O latency performance is
>> >>degraded for 
>> >> some workloads on Xen guests using blkfront-blkback stack.
>> >> 
>> >> The following change addresses this issue:
>> >> 
>> >> 
>> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
>> >> 
>> >> diff --git a/drivers/xen/blkback/blkback.c
>> >>b/drivers/xen/blkback/blkback.c
>> >> --- a/drivers/xen/blkback/blkback.c
>> >> +++ b/drivers/xen/blkback/blkback.c
>> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
>> >>   cond_resched();
>> >>   }
>> >> 
>> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we
>>better
>> >> +   let blkfront know about it (by setting req_event appropriately)
>>so
>> >>that
>> >> +   blkfront will bother to wake us up (via interrupt) when it
>>submits a
>> >> +   new I/O */
>> >> +        if (!more_to_do)
>> >> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
>> >>more_to_do);
>> >
>> >To me this contradicts the comment preceding the use of
>> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
>> >(there it's supposedly used to avoid unnecessary notification,
>> >here you say it's used to force notification). Albeit I agree that
>> >the change looks consistent with the comments in io/ring.h.
>> >
>> >Even if correct, you're not holding blkif->blk_ring_lock here, and
>> >hence I think you'll need to explain how this is not a problem.
>> >
>> >From a formal perspective, you also want to correct usage of tabs,
>> >and (assuming this is intended for the 2.6.18 tree) you'd also need
>> >to indicate so for Keir to pick this up and apply it to that tree (and
>> >it might then also be a good idea to submit an equivalent patch for
>> >the pv-ops trees).
>> >
>> >Jan
>> >
>> >>   return more_to_do;
>> >>  }
>> >
>> >
>> >
>> 
>
>
>


* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-03 17:16       ` Vincent, Pradeep
@ 2011-05-03 17:51         ` Daniel Stodden
  2011-05-03 23:41           ` Vincent, Pradeep
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Stodden @ 2011-05-03 17:51 UTC (permalink / raw)
  To: Vincent, Pradeep
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Tue, 2011-05-03 at 13:16 -0400, Vincent, Pradeep wrote:
> Yes - I am positive what I am seeing isn't 'I/O scheduler issue due to
> REQ_SYNC'. Trace data from blkback showed that blkback was simply not
> submitting the 2nd I/O to the I/O scheduler. Type of I/O (read vs write)
> doesn't matter. 

Block trace? 

Just to make sure we're debugging the right thing -- what I/O scheduler
are you running?

Daniel

> Recreation Steps: 
> 
> 1. Generate I/O requests so that two I/Os are pending at any given time.
> The I/O submissions shouldn't be synchronized. Potentially use two threads
> for I/O submissions each submitting a small size random direct I/O.
> 2. Verify that the guest sends out two I/Os at a given time. 'iostat'
> avgqu-sz will be '2'
> 3. Now check iostat in Dom-0 for the corresponding block device. Avgqu-sz
> will be '1'
> 4. 'await' comparison in DomU vs Dom0 will show a fairly big difference.
> 
> And I confirmed that the patch I submitted fixes this issue.
> 
> - Pradeep Vincent
> 
> 
> On 5/3/11 7:55 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> 
> >On Mon, May 02, 2011 at 06:10:22PM -0700, Vincent, Pradeep wrote:
> >> Thanks Jan.
> >> 
> >> Re: avoid unnecessary notification
> >> 
> >> If this was a deliberate design choice then the duration of the delay is
> >> at the mercy of the pending I/O latencies & I/O patterns and the delay
> >>is
> >> simply too long in some cases. E.g. A write I/O stuck behind a read I/O
> >> could see more than double the latency on a Xen guest compared to a
> >> baremetal host. Avoiding notifications this way results in significant
> >> latency degradation perceived by many applications.
> >
> >You sure this is not the fault of the IO scheduler? I had similar issues
> >with the CFQ scheduler upstream and found out that I needed to add
> >REQ_SYNC on write requests.
> >> 
> >> If this is about allowing I/O scheduler to coalesce more I/Os, then I
> >>bet
> >> I/O scheduler's 'wait and coalesce' logic is a great substitute for the
> >> delays introduced by blkback.
> >> 
> >> I totally agree IRQ coalescing or delay is useful for both blkback and
> >> netback but we need a logic that doesn't impact I/O latencies
> >> significantly. Also, I don't think netback has this type of notification
> >> avoidance logic (at least in 2.6.18 code base).
> >> 
> >> 
> >> Re: Other points
> >> 
> >> Good call. Changed the patch to include tabs.
> >> 
> >> I wasn't very sure about blk_ring_lock usage and I should have clarified
> >> it before sending out the patch.
> >> 
> >> Assuming blk_ring_lock was meant to protect shared ring manipulations
> >> within blkback, is there a reason 'blk_rings->common.req_cons'
> >> manipulation in do_block_io_op is not protected ? The reasons for the
> >> differences between locking logic in do_block_io_op and make_response
> >> weren't terribly obvious although the failure mode for the race
> >>condition
> >> may very well be benign.
> >> 
> >> Anyway, I am attaching a patch with appropriate changes.
> >> 
> >> Jeremey, Can you apply this patch to pvops Dom-0
> >> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git). Should
> >>I
> >> submit another patch for 2.6.18 Dom-0 ?
> >> 
> >> 
> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> >> 
> >> diff --git a/drivers/xen/blkback/blkback.c
> >>b/drivers/xen/blkback/blkback.c
> >> --- a/drivers/xen/blkback/blkback.c
> >> +++ b/drivers/xen/blkback/blkback.c
> >> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
> >>   pending_req_t *pending_req;
> >>   RING_IDX rc, rp;
> >>   int more_to_do = 0;
> >> + unsigned long     flags;
> >>  
> >>   rc = blk_rings->common.req_cons;
> >>   rp = blk_rings->common.sring->req_prod;
> >> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
> >>    cond_resched();
> >>   }
> >>  
> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> >> +    let blkfront know about it (by setting req_event appropriately) so
> >> that
> >> +    blkfront will bother to wake us up (via interrupt) when it submits
> >>a
> >> +    new I/O */
> >> + if (!more_to_do){
> >> +  spin_lock_irqsave(&blkif->blk_ring_lock, flags);
> >> +  RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> +  spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> >> + }
> >>   return more_to_do;
> >>  }
> >>  
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:
> >> 
> >> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com>
> >>wrote:
> >> >> In blkback driver, after I/O requests are submitted to Dom-0 block
> >>I/O
> >> >> subsystem, blkback goes to 'sleep' effectively without letting
> >>blkfront
> >> >>know 
> >> >> about it (req_event isn't set appropriately). Hence blkfront doesn't
> >> >>notify 
> >> >> blkback when it submits a new I/O thus delaying the 'dispatch' of the
> >> >>new I/O 
> >> >> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as
> >>one
> >> >>of the 
> >> >> previous I/Os completes.
> >> >> 
> >> >> As a result of this issue, the block I/O latency performance is
> >> >>degraded for 
> >> >> some workloads on Xen guests using blkfront-blkback stack.
> >> >> 
> >> >> The following change addresses this issue:
> >> >> 
> >> >> 
> >> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> >> >> 
> >> >> diff --git a/drivers/xen/blkback/blkback.c
> >> >>b/drivers/xen/blkback/blkback.c
> >> >> --- a/drivers/xen/blkback/blkback.c
> >> >> +++ b/drivers/xen/blkback/blkback.c
> >> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
> >> >>   cond_resched();
> >> >>   }
> >> >> 
> >> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we
> >>better
> >> >> +   let blkfront know about it (by setting req_event appropriately)
> >>so
> >> >>that
> >> >> +   blkfront will bother to wake us up (via interrupt) when it
> >>submits a
> >> >> +   new I/O */
> >> >> +        if (!more_to_do)
> >> >> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
> >> >>more_to_do);
> >> >
> >> >To me this contradicts the comment preceding the use of
> >> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
> >> >(there it's supposedly used to avoid unnecessary notification,
> >> >here you say it's used to force notification). Albeit I agree that
> >> >the change looks consistent with the comments in io/ring.h.
> >> >
> >> >Even if correct, you're not holding blkif->blk_ring_lock here, and
> >> >hence I think you'll need to explain how this is not a problem.
> >> >
> >> >From a formal perspective, you also want to correct usage of tabs,
> >> >and (assuming this is intended for the 2.6.18 tree) you'd also need
> >> >to indicate so for Keir to pick this up and apply it to that tree (and
> >> >it might then also be a good idea to submit an equivalent patch for
> >> >the pv-ops trees).
> >> >
> >> >Jan
> >> >
> >> >>   return more_to_do;
> >> >>  }
> >> >
> >> >
> >> >
> >> 
> >
> >
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xensource.com
> >> http://lists.xensource.com/xen-devel
> >
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-03  1:10   ` Vincent, Pradeep
  2011-05-03 14:55     ` Konrad Rzeszutek Wilk
@ 2011-05-03 17:52     ` Daniel Stodden
  2011-05-04  1:54       ` Vincent, Pradeep
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Stodden @ 2011-05-03 17:52 UTC (permalink / raw)
  To: Vincent, Pradeep; +Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich

On Mon, 2011-05-02 at 21:10 -0400, Vincent, Pradeep wrote:
> Thanks Jan.
> 
> Re: avoid unnecessary notification
> 
> If this was a deliberate design choice then the duration of the delay is
> at the mercy of the pending I/O latencies & I/O patterns and the delay is
> simply too long in some cases. E.g. A write I/O stuck behind a read I/O
> could see more than double the latency on a Xen guest compared to a
> baremetal host. Avoiding notifications this way results in significant
> latency degradation perceived by many applications.

I'm trying to follow - let me know if I misread you - but I think you're
misunderstanding this stuff. 

The notification avoidance these macros implement does not promote
deliberate latency. This stuff is not dropping events or deferring guest
requests.

It only avoids a gratuitous notification sent by the remote end in
cases where the local one didn't go to sleep yet, and therefore can
guarantee that it's going to process the message ASAP, right after
finishing what's still pending from the previous kick. 

It's only a mechanism to avoid excess interrupt signaling. Think about a
situation where you ask the guy at the front door to take his thumb off
the buzzer while you're already running down the hallway.

R/W reordering is a matter dealt with by I/O schedulers. 

Any case of write I/O behind the read you describe is supposed to be
queued back-to-back. It should never get stuck. A backend can obviously
reserve the right to override guest submit order, but blkback doesn't do
this, it's just pushing everything down the disk queue as soon as it
sees it.

So, that'd be the basic idea. Now, we've got that extra stuff in there
mixing that up between request and response processing, and it's
admittedly somewhat hard to read.

If you found a bug in there, well, yoho. Normally the slightest mistake
on the event processing front rather leads to deadlocks, and we
currently don't see any.

Iff you're right -- I guess the better fix would look different. If this
stuff is actually broken, maybe we can rather simplify things again, not
add more extra checks on top. :)

Daniel

> If this is about allowing I/O scheduler to coalesce more I/Os, then I bet
> I/O scheduler's 'wait and coalesce' logic is a great substitute for the
> delays introduced by blkback.
> 
> I totally agree IRQ coalescing or delay is useful for both blkback and
> netback but we need a logic that doesn't impact I/O latencies
> significantly. Also, I don't think netback has this type of notification
> avoidance logic (at least in 2.6.18 code base).
> 
> 
> Re: Other points
> 
> Good call. Changed the patch to include tabs.
> 
> I wasn't very sure about blk_ring_lock usage and I should have clarified
> it before sending out the patch.
> 
> Assuming blk_ring_lock was meant to protect shared ring manipulations
> within blkback, is there a reason 'blk_rings->common.req_cons'
> manipulation in do_block_io_op is not protected ? The reasons for the
> differences between locking logic in do_block_io_op and make_response
> weren't terribly obvious although the failure mode for the race condition
> may very well be benign.
> 
> Anyway, I am attaching a patch with appropriate changes.
> 
> Jeremy, can you apply this patch to pvops Dom-0
> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git)? Should I
> submit another patch for 2.6.18 Dom-0?
> 
> 
> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> 
> diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
> --- a/drivers/xen/blkback/blkback.c
> +++ b/drivers/xen/blkback/blkback.c
> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
>   pending_req_t *pending_req;
>   RING_IDX rc, rp;
>   int more_to_do = 0;
> + unsigned long     flags;
>  
>   rc = blk_rings->common.req_cons;
>   rp = blk_rings->common.sring->req_prod;
> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
>    cond_resched();
>   }
>  
> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> +    let blkfront know about it (by setting req_event appropriately) so
> that
> +    blkfront will bother to wake us up (via interrupt) when it submits a
> +    new I/O */
> + if (!more_to_do){
> +  spin_lock_irqsave(&blkif->blk_ring_lock, flags);
> +  RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> +  spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> + }
>   return more_to_do;
>  }
>  
> 
> 
> 
> 
> 
> 
> 
> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:
> 
> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com> wrote:
> >> In blkback driver, after I/O requests are submitted to Dom-0 block I/O
> >> subsystem, blkback goes to 'sleep' effectively without letting blkfront
> >>know 
> >> about it (req_event isn't set appropriately). Hence blkfront doesn't
> >>notify 
> >> blkback when it submits a new I/O thus delaying the 'dispatch' of the
> >>new I/O 
> >> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as one
> >>of the 
> >> previous I/Os completes.
> >> 
> >> As a result of this issue, the block I/O latency performance is
> >>degraded for 
> >> some workloads on Xen guests using blkfront-blkback stack.
> >> 
> >> The following change addresses this issue:
> >> 
> >> 
> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> >> 
> >> diff --git a/drivers/xen/blkback/blkback.c
> >>b/drivers/xen/blkback/blkback.c
> >> --- a/drivers/xen/blkback/blkback.c
> >> +++ b/drivers/xen/blkback/blkback.c
> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
> >>   cond_resched();
> >>   }
> >> 
> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> >> +   let blkfront know about it (by setting req_event appropriately) so
> >>that
> >> +   blkfront will bother to wake us up (via interrupt) when it submits a
> >> +   new I/O */
> >> +        if (!more_to_do)
> >> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
> >>more_to_do);
> >
> >To me this contradicts the comment preceding the use of
> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
> >(there it's supposedly used to avoid unnecessary notification,
> >here you say it's used to force notification). Albeit I agree that
> >the change looks consistent with the comments in io/ring.h.
> >
> >Even if correct, you're not holding blkif->blk_ring_lock here, and
> >hence I think you'll need to explain how this is not a problem.
> >
> >From a formal perspective, you also want to correct usage of tabs,
> >and (assuming this is intended for the 2.6.18 tree) you'd also need
> >to indicate so for Keir to pick this up and apply it to that tree (and
> >it might then also be a good idea to submit an equivalent patch for
> >the pv-ops trees).
> >
> >Jan
> >
> >>   return more_to_do;
> >>  }
> >
> >
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH] blkback: Fix block I/O latency issue
  2011-05-03 17:51         ` Daniel Stodden
@ 2011-05-03 23:41           ` Vincent, Pradeep
  0 siblings, 0 replies; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-03 23:41 UTC (permalink / raw)
  To: Daniel Stodden
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 8000 bytes --]

Hey Daniel, 

CFQ primarily but I/O scheduler doesn't come into the picture here.

>> Block trace? 

Nope.. Systemtap

- Pradeep Vincent

-----Original Message-----
From: Daniel Stodden [mailto:daniel.stodden@citrix.com] 
Sent: Tuesday, May 03, 2011 10:52 AM
To: Vincent, Pradeep
Cc: Konrad Rzeszutek Wilk; Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Jan Beulich
Subject: Re: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue

On Tue, 2011-05-03 at 13:16 -0400, Vincent, Pradeep wrote:
> Yes - I am positive what I am seeing isn't 'I/O scheduler issue due to
> REQ_SYNC'. Trace data from blkback showed that blkback was simply not
> submitting the 2nd I/O to the I/O scheduler. Type of I/O (read vs write)
> doesn't matter. 

Block trace? 

Just to make sure we're debugging the right thing -- what I/O scheduler
are you running?

Daniel

> Recreation Steps: 
> 
> 1. Generate I/O requests so that two I/Os are pending at any given time.
> The I/O submissions shouldn't be synchronized. Potentially use two threads
> for I/O submissions each submitting a small size random direct I/O.
> 2. Verify that the guest sends out two I/Os at a given time. 'iostat'
> avgqu-sz will be '2'
> 3. Now check iostat in Dom-0 for the corresponding block device. Avgqu-sz
> will be '1'
> 4. 'await' comparison in DomU vs Dom0 will show a fairly big difference.
> 
> And I confirmed that the patch I submitted fixes this issue.
> 
> - Pradeep Vincent
> 
> 
> On 5/3/11 7:55 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> 
> >On Mon, May 02, 2011 at 06:10:22PM -0700, Vincent, Pradeep wrote:
> >> Thanks Jan.
> >> 
> >> Re: avoid unnecessary notification
> >> 
> >> If this was a deliberate design choice then the duration of the delay is
> >> at the mercy of the pending I/O latencies & I/O patterns and the delay
> >>is
> >> simply too long in some cases. E.g. A write I/O stuck behind a read I/O
> >> could see more than double the latency on a Xen guest compared to a
> >> baremetal host. Avoiding notifications this way results in significant
> >> latency degradation perceived by many applications.
> >
> >You sure this is not the fault of the IO scheduler? I had similar issues
> >with the CFQ scheduler upstream and found out that I needed to add
> >REQ_SYNC on write requests.
> >> 
> >> If this is about allowing I/O scheduler to coalesce more I/Os, then I
> >>bet
> >> I/O scheduler's 'wait and coalesce' logic is a great substitute for the
> >> delays introduced by blkback.
> >> 
> >> I totally agree IRQ coalescing or delay is useful for both blkback and
> >> netback but we need a logic that doesn't impact I/O latencies
> >> significantly. Also, I don't think netback has this type of notification
> >> avoidance logic (at least in 2.6.18 code base).
> >> 
> >> 
> >> Re: Other points
> >> 
> >> Good call. Changed the patch to include tabs.
> >> 
> >> I wasn't very sure about blk_ring_lock usage and I should have clarified
> >> it before sending out the patch.
> >> 
> >> Assuming blk_ring_lock was meant to protect shared ring manipulations
> >> within blkback, is there a reason 'blk_rings->common.req_cons'
> >> manipulation in do_block_io_op is not protected ? The reasons for the
> >> differences between locking logic in do_block_io_op and make_response
> >> weren't terribly obvious although the failure mode for the race
> >>condition
> >> may very well be benign.
> >> 
> >> Anyway, I am attaching a patch with appropriate changes.
> >> 
> >> Jeremey, Can you apply this patch to pvops Dom-0
> >> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git). Should
> >>I
> >> submit another patch for 2.6.18 Dom-0 ?
> >> 
> >> 
> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> >> 
> >> diff --git a/drivers/xen/blkback/blkback.c
> >>b/drivers/xen/blkback/blkback.c
> >> --- a/drivers/xen/blkback/blkback.c
> >> +++ b/drivers/xen/blkback/blkback.c
> >> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
> >>   pending_req_t *pending_req;
> >>   RING_IDX rc, rp;
> >>   int more_to_do = 0;
> >> + unsigned long     flags;
> >>  
> >>   rc = blk_rings->common.req_cons;
> >>   rp = blk_rings->common.sring->req_prod;
> >> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
> >>    cond_resched();
> >>   }
> >>  
> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> >> +    let blkfront know about it (by setting req_event appropriately) so
> >> that
> >> +    blkfront will bother to wake us up (via interrupt) when it submits
> >>a
> >> +    new I/O */
> >> + if (!more_to_do){
> >> +  spin_lock_irqsave(&blkif->blk_ring_lock, flags);
> >> +  RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> +  spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> >> + }
> >>   return more_to_do;
> >>  }
> >>  
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:
> >> 
> >> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com>
> >>wrote:
> >> >> In blkback driver, after I/O requests are submitted to Dom-0 block
> >>I/O
> >> >> subsystem, blkback goes to 'sleep' effectively without letting
> >>blkfront
> >> >>know 
> >> >> about it (req_event isn't set appropriately). Hence blkfront doesn't
> >> >>notify 
> >> >> blkback when it submits a new I/O thus delaying the 'dispatch' of the
> >> >>new I/O 
> >> >> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as
> >>one
> >> >>of the 
> >> >> previous I/Os completes.
> >> >> 
> >> >> As a result of this issue, the block I/O latency performance is
> >> >>degraded for 
> >> >> some workloads on Xen guests using blkfront-blkback stack.
> >> >> 
> >> >> The following change addresses this issue:
> >> >> 
> >> >> 
> >> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> >> >> 
> >> >> diff --git a/drivers/xen/blkback/blkback.c
> >> >>b/drivers/xen/blkback/blkback.c
> >> >> --- a/drivers/xen/blkback/blkback.c
> >> >> +++ b/drivers/xen/blkback/blkback.c
> >> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
> >> >>   cond_resched();
> >> >>   }
> >> >> 
> >> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we
> >>better
> >> >> +   let blkfront know about it (by setting req_event appropriately)
> >>so
> >> >>that
> >> >> +   blkfront will bother to wake us up (via interrupt) when it
> >>submits a
> >> >> +   new I/O */
> >> >> +        if (!more_to_do)
> >> >> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
> >> >>more_to_do);
> >> >
> >> >To me this contradicts the comment preceding the use of
> >> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
> >> >(there it's supposedly used to avoid unnecessary notification,
> >> >here you say it's used to force notification). Albeit I agree that
> >> >the change looks consistent with the comments in io/ring.h.
> >> >
> >> >Even if correct, you're not holding blkif->blk_ring_lock here, and
> >> >hence I think you'll need to explain how this is not a problem.
> >> >
> >> >From a formal perspective, you also want to correct usage of tabs,
> >> >and (assuming this is intended for the 2.6.18 tree) you'd also need
> >> >to indicate so for Keir to pick this up and apply it to that tree (and
> >> >it might then also be a good idea to submit an equivalent patch for
> >> >the pv-ops trees).
> >> >
> >> >Jan
> >> >
> >> >>   return more_to_do;
> >> >>  }
> >> >
> >> >
> >> >
> >> 
> >
> >
> >
> 
> 




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-03 17:52     ` Daniel Stodden
@ 2011-05-04  1:54       ` Vincent, Pradeep
  2011-05-09 20:24         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-04  1:54 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich

Hey Daniel,
 
Thanks for your comments.
 
>> The notification avoidance these macros implement does not promote
>> deliberate latency. This stuff is not dropping events or deferring guest
>> requests.
>>
>> It only avoids a gratuitous notification sent by the remote end in
>> cases where the local one didn't go to sleep yet, and therefore can
>> guarantee that it's going to process the message ASAP, right after
>> finishing what's still pending from the previous kick.
 
If the design goal was to simply avoid unnecessary interrupts but not
delay I/Os, then the blkback code has a bug.

If the design goal was to delay the I/Os in order to reduce the interrupt
rate, then I am arguing that the design introduces way too much latency,
which affects many applications.

Either way, this issue needs to be addressed.


Perhaps a timeline example will help shed some light on this issue. Let's
say IO-1 and IO-2 are submitted with a gap of 200 usecs. Let's assume
interrupt latency is 10 usecs and the disk drive takes ~10,000 usecs to
process each I/O.

t1: IO-1 arrives at blkfront. RING_PUSH_REQUESTS_AND_CHECK_NOTIFY is
called which updates 'sring->req_prod' and uses 'sring->req_event' to
determine if an interrupt must be generated. In this case, blkfront
generates the interrupt.

t1+10 usecs: Interrupt is received by blkback. do_block_io_op is
eventually invoked which dispatches the I/O after incrementing
'common.req_cons'. Note that 'req_event' is NOT updated. There are no more
I/Os to be processed and hence blkback thread goes to sleep.

t1+200 usecs: IO-2 arrives at blkfront.
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY is called which updates
'sring->req_prod' and uses 'sring->req_event' to determine if an interrupt
must be generated. Unfortunately, 'req_event' was NOT updated in the
previous step and as a result blkfront decides not to send an interrupt.
As a result blkback doesn't wake up immediately to process the I/O that
has been added to the shared ring by blkfront.

t1+10000 usecs: IO-1 completes. 'make_response' is invoked which signals
the completion of IO-1 to blkfront. Now it goes through the following code
and decides there is 'more_to_do'.

        if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
                /*
                 * Tail check for pending requests. Allows frontend to avoid
                 * notifications if requests are already in flight (lower
                 * overheads and promotes batching).
                 */
                RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
        }

Hence the blkback thread is woken up, which then invokes 'do_block_io_op',
and 'do_block_io_op' dispatches IO-2.

t1+20000 usecs: IO-2 completes.


From the guest's point of view, IO-1 took ~10,000 usecs to complete, which
is fine. But IO-2 took 19,800 usecs, which is obviously very bad.

Now once the patch is applied,


t1+10 usecs : Interrupt is received by blkback. do_block_io_op is
eventually invoked which dispatches the I/O after incrementing
'common.req_cons'. RING_FINAL_CHECK_FOR_REQUESTS is invoked which updates
'req_event'. There are no more I/Os to be processed and hence blkback
thread goes to sleep.

t1+200 usecs: IO-2 arrives at blkfront.
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY is called which updates
'sring->req_prod' and uses 'sring->req_event' to determine if an interrupt
must be generated. Since req_event was updated in the previous step,
blkfront decides to generate an interrupt

t1+210 usecs: Interrupt is received by blkback. do_block_io_op is
eventually invoked which dispatches IO-2 after incrementing
'common.req_cons'. RING_FINAL_CHECK_FOR_REQUESTS is invoked which updates
'req_event'. There are no more I/Os to be processed and hence blkback
thread goes to sleep.

t1+10000 usecs: IO-1 completes.

t1+10210 usecs: IO-2 completes.

Both I/Os take ~10,000 usecs to complete and the application lives happily
ever after.


Does that make sense?

>> Normally the slightest mistake
>> on the event processing front rather leads to deadlocks, and we
>> currently don't see any.


Yeah - I had the same thought initially. In this case, the fact that
make_response kicks off any pending I/Os turns potential deadlocks into
latency issues.


>> Iff you're right -- I guess the better fix would look different. If this
>> stuff is actually broken, maybe we can rather simplify things again, not
>> add more extra checks on top. :)

I'd love to hear better ways of fixing this issue. Any proposals?


Thanks,

- Pradeep Vincent

 





On 5/3/11 10:52 AM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:

>On Mon, 2011-05-02 at 21:10 -0400, Vincent, Pradeep wrote:
>> Thanks Jan.
>> 
>> Re: avoid unnecessary notification
>> 
>> If this was a deliberate design choice then the duration of the delay is
>> at the mercy of the pending I/O latencies & I/O patterns and the delay
>>is
>> simply too long in some cases. E.g. A write I/O stuck behind a read I/O
>> could see more than double the latency on a Xen guest compared to a
>> baremetal host. Avoiding notifications this way results in significant
>> latency degradation perceived by many applications.
>
>I'm trying to follow - let me know if I misread you - but I think you're
>misunderstanding this stuff.
>
>The notification avoidance these macros implement does not promote
>deliberate latency. This stuff is not dropping events or deferring guest
>requests.
>
>It only avoids a gratuitious notification sent by the remote end in
>cases where the local one didn't go to sleep yet, and therefore can
>guarantee that it's going to process the message ASAP, right after
>finishing what's still pending from the previous kick.
>
>It's only a mechanism to avoid excess interrupt signaling. Think about a
>situation where you ask the guy at the front door to take his thumb off
>the buzzer while you're already running down the hallway.
>
>R/W reordering is a matter dealt with by I/O schedulers.
>
>Any case of write I/O behind the read you describe is supposed to be
>queued back-to-back. It should never get stuck. A backend can obviously
>reserve the right to override guest submit order, but blkback doesn't do
>this, it's just pushing everything down the disk queue as soon as it
>sees it.
>
>So, that'd be the basic idea. Now, we've got that extra stuff in there
>mixing that up between request and response processing, and it's
>admittedly somewhat hard to read.
>
>If you found a bug in there, well, yoho. Normally the slightest mistake
>on the event processing front rather leads to deadlocks, and we
>currently don't see any.
>
>Iff you're right -- I guess the better fix would look different. If this
>stuff is actually broken, maybe we can rather simplify things again, not
>add more extra checks on top. :)
>
>Daniel
>
>> If this is about allowing I/O scheduler to coalesce more I/Os, then I
>>bet
>> I/O scheduler's 'wait and coalesce' logic is a great substitute for the
>> delays introduced by blkback.
>> 
>> I totally agree IRQ coalescing or delay is useful for both blkback and
>> netback but we need a logic that doesn't impact I/O latencies
>> significantly. Also, I don't think netback has this type of notification
>> avoidance logic (at least in 2.6.18 code base).
>> 
>> 
>> Re: Other points
>> 
>> Good call. Changed the patch to include tabs.
>> 
>> I wasn't very sure about blk_ring_lock usage and I should have clarified
>> it before sending out the patch.
>> 
>> Assuming blk_ring_lock was meant to protect shared ring manipulations
>> within blkback, is there a reason 'blk_rings->common.req_cons'
>> manipulation in do_block_io_op is not protected ? The reasons for the
>> differences between locking logic in do_block_io_op and make_response
>> weren't terribly obvious although the failure mode for the race
>>condition
>> may very well be benign.
>> 
>> Anyway, I am attaching a patch with appropriate changes.
>> 
>> Jeremey, Can you apply this patch to pvops Dom-0
>> (http://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git). Should
>>I
>> submit another patch for 2.6.18 Dom-0 ?
>> 
>> 
>> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
>> 
>> diff --git a/drivers/xen/blkback/blkback.c
>>b/drivers/xen/blkback/blkback.c
>> --- a/drivers/xen/blkback/blkback.c
>> +++ b/drivers/xen/blkback/blkback.c
>> @@ -315,6 +315,7 @@ static int do_block_io_op(blkif_t *blkif)
>>   pending_req_t *pending_req;
>>   RING_IDX rc, rp;
>>   int more_to_do = 0;
>> + unsigned long     flags;
>>  
>>   rc = blk_rings->common.req_cons;
>>   rp = blk_rings->common.sring->req_prod;
>> @@ -383,6 +384,15 @@ static int do_block_io_op(blkif_t *blkif)
>>    cond_resched();
>>   }
>>  
>> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
>> +    let blkfront know about it (by setting req_event appropriately) so
>> that
>> +    blkfront will bother to wake us up (via interrupt) when it submits
>>a
>> +    new I/O */
>> + if (!more_to_do){
>> +  spin_lock_irqsave(&blkif->blk_ring_lock, flags);
>> +  RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>> +  spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
>> + }
>>   return more_to_do;
>>  }
>>  
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 5/2/11 1:13 AM, "Jan Beulich" <JBeulich@novell.com> wrote:
>> 
>> >>>> On 02.05.11 at 09:04, "Vincent, Pradeep" <pradeepv@amazon.com>
>>wrote:
>> >> In blkback driver, after I/O requests are submitted to Dom-0 block
>>I/O
>> >> subsystem, blkback goes to 'sleep' effectively without letting
>>blkfront
>> >>know 
>> >> about it (req_event isn't set appropriately). Hence blkfront doesn't
>> >>notify 
>> >> blkback when it submits a new I/O thus delaying the 'dispatch' of the
>> >>new I/O 
>> >> to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as
>>one
>> >>of the 
>> >> previous I/Os completes.
>> >> 
>> >> As a result of this issue, the block I/O latency performance is
>> >>degraded for 
>> >> some workloads on Xen guests using blkfront-blkback stack.
>> >> 
>> >> The following change addresses this issue:
>> >> 
>> >> 
>> >> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
>> >> 
>> >> diff --git a/drivers/xen/blkback/blkback.c
>> >>b/drivers/xen/blkback/blkback.c
>> >> --- a/drivers/xen/blkback/blkback.c
>> >> +++ b/drivers/xen/blkback/blkback.c
>> >> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
>> >>   cond_resched();
>> >>   }
>> >> 
>> >> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we
>>better
>> >> +   let blkfront know about it (by setting req_event appropriately)
>>so
>> >>that
>> >> +   blkfront will bother to wake us up (via interrupt) when it
>>submits a
>> >> +   new I/O */
>> >> +        if (!more_to_do)
>> >> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common,
>> >>more_to_do);
>> >
>> >To me this contradicts the comment preceding the use of
>> >RING_FINAL_CHECK_FOR_REQUESTS() in make_response()
>> >(there it's supposedly used to avoid unnecessary notification,
>> >here you say it's used to force notification). Albeit I agree that
>> >the change looks consistent with the comments in io/ring.h.
>> >
>> >Even if correct, you're not holding blkif->blk_ring_lock here, and
>> >hence I think you'll need to explain how this is not a problem.
>> >
>> >From a formal perspective, you also want to correct usage of tabs,
>> >and (assuming this is intended for the 2.6.18 tree) you'd also need
>> >to indicate so for Keir to pick this up and apply it to that tree (and
>> >it might then also be a good idea to submit an equivalent patch for
>> >the pv-ops trees).
>> >
>> >Jan
>> >
>> >>   return more_to_do;
>> >>  }
>> >
>> >
>> >
>> 
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-04  1:54       ` Vincent, Pradeep
@ 2011-05-09 20:24         ` Konrad Rzeszutek Wilk
  2011-05-13  0:40           ` Vincent, Pradeep
  0 siblings, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-09 20:24 UTC (permalink / raw)
  To: Vincent, Pradeep
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

On Tue, May 03, 2011 at 06:54:38PM -0700, Vincent, Pradeep wrote:
> Hey Daniel,
>  
> Thanks for your comments.
>  
> >> The notification avoidance these macros implement does not promote
> >> deliberate latency. This stuff is not dropping events or deferring
> >> guest requests.
> >>
> >> It only avoids a gratuitous notification sent by the remote end in
> >> cases where the local one didn't go to sleep yet, and therefore can
> >> guarantee that it's going to process the message ASAP, right after
> >> finishing what's still pending from the previous kick.
>  
> If the design goal was to simply avoid unnecessary interrupts but not
> delay I/Os, then the blkback code has a bug.
> 
> If the design goal was to delay the I/Os in order to reduce the interrupt
> rate, then I am arguing that the design introduces way too much latency,
> which affects many applications.
> 
> Either way, this issue needs to be addressed.


I agree we need to fix this. What I am curious about is:
 - what are the workloads under which this patch has a negative effect.
 - I presume you have tested this in production - what were the numbers
   when it came to high-bandwidth workloads (so imagine four or six threads
   putting as much I/O as possible)? Did the level of IRQs go way up
   compared to not running with this patch?


I am wondering if it might be worth looking into something NAPI-like in the
block layer (so polling, basically). The concern I have is that this
patch would trigger an interrupt storm for small-sized requests which might
be happening at a high rate (say, 512-byte random writes).

But perhaps the way for this to work is to have ratelimiting code in it
so that there is no chance of interrupt storms.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-09 20:24         ` Konrad Rzeszutek Wilk
@ 2011-05-13  0:40           ` Vincent, Pradeep
  2011-05-13  2:51             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-13  0:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

Thanks Konrad.

>> I presume you have tested this in production

Yes. Absolutely. 

>> what were the numbers when it came to high-bandwidth numbers

Under high I/O workload, where the blkfront would fill up the queue as
blkback works the queue, the I/O latency problem in question doesn't
manifest itself and as a result this patch doesn't make much of a
difference in terms of interrupt rate. My benchmarks didn't show any
significant effect.

The above rationale combined with relatively high disk I/O latencies
(compared to IRQ latency) generally minimizes excessive interrupt rate.
Also, blkfront interrupt generation mechanism works exactly the same way
as the patched blkback. Netfront and netback generate interrupts the same
way as well. 

Under 'moderate' I/O workload, the interrupt rate does go up, but the
I/O latency benefit clearly outweighs the cost of the extra interrupts
(which isn't much for moderate I/O load anyway).


Overall, the advantages of this patch (I/O latency improvement) outweigh
any potential fringe negative effects by a large margin, and the fact
that netfront, netback and blkfront already share the same interrupt
generation design should give us a lot of confidence.

That said, I do think a comprehensive interrupt throttling mechanism for
netback, blkback and other backend I/O drivers would be useful and
should be pursued as a separate initiative. Such a mechanism would be
particularly useful for the netfront-netback stack, which is more
susceptible to interrupt storms than blkfront-blkback. An 'IRQ
coalescing' type mechanism that could induce delays on the order of tens
of microseconds (certainly not milliseconds) to minimize the interrupt
generation rate would be useful (similar to what NICs do).

Thanks,

- Pradeep Vincent




On 5/9/11 1:24 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

>On Tue, May 03, 2011 at 06:54:38PM -0700, Vincent, Pradeep wrote:
>> Hey Daniel,
>>  
>> Thanks for your comments.
>>  
>> >> The notification avoidance these macros implement does not promote
>> >> deliberate latency. This stuff is not dropping events or deferring
>> >> guest requests.
>> 
>> >> It only avoids a gratuitous notification sent by the remote end in
>> >> cases where the local one didn't go to sleep yet, and therefore can
>> >> guarantee that it's going to process the message ASAP, right after
>> >> finishing what's still pending from the previous kick.
>>  
>> If the design goal was to simply avoid unnecessary interrupts but not
>> delay I/Os, then the blkback code has a bug.
>> 
>> If the design goal was to delay the I/Os in order to reduce the
>> interrupt rate, then I am arguing that the design introduces way too
>> much latency and affects many applications.
>> 
>> Either way, this issue needs to be addressed.
>
>
>I agree we need to fix this. What I am curious is:
> - what are the workloads under which this patch has a negative effect?
> - I presume you have tested this in production - what were the numbers
>   when it came to high bandwidth numbers (so imagine four or six threads
>   pushing as much I/O as possible)? Did the level of IRQs go way up
>   compared to not running with this patch?
>
>
>I am wondering if it might be worth looking into something NAPI-like in
>the block layer (so polling, basically). The concern I have is that this
>patch would trigger an interrupt storm for small-sized requests that
>might be arriving at a high rate (say, 512-byte random writes).
>
>But perhaps the way for this to work is to have rate-limiting code in it
>so that there is no chance of interrupt storms.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-13  0:40           ` Vincent, Pradeep
@ 2011-05-13  2:51             ` Konrad Rzeszutek Wilk
  2011-05-16 15:22               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-13  2:51 UTC (permalink / raw)
  To: Vincent, Pradeep
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

> >>what were the numbers when it came to high bandwidth numbers
> 
> Under high I/O workload, where the blkfront would fill up the queue as
> blkback works the queue, the I/O latency problem in question doesn't
> manifest itself and as a result this patch doesn't make much of a
> difference in terms of interrupt rate. My benchmarks didn't show any
> significant effect.

I have to rerun my benchmarks. Under high load (so 64Kb, four threads
writing as much as they can to an iSCSI disk), the IRQ rate for each
blkif went from 2-3/sec to ~5K/sec. But I did not do a good job of
capturing the submission latency to see if the I/Os get the response
back as fast (or the same) as without your patch.

And the iSCSI disk on the target side was a RAM disk, so the latency was
quite small, which is not fair to your problem.

Do you have a program to measure the latency for the workload you
had encountered? I would like to run those numbers myself.

> 
> The above rationale, combined with relatively high disk I/O latencies
> (compared to IRQ latency), generally keeps the interrupt rate from
> becoming excessive. Also, blkfront's interrupt generation mechanism
> works exactly the same way as the patched blkback. Netfront and netback
> generate interrupts the same way as well.
> 
> Under 'moderate' I/O workload, the interrupt rate does go up, but the
> I/O latency benefit clearly outweighs the cost of the extra interrupts
> (which aren't many under moderate I/O load anyway).
> 
> 
> Overall, the advantages of this patch (I/O latency improvement)
> outweigh any potential fringe negative effects by a large margin, and
> the fact that netfront, netback and blkfront already share the same
> interrupt generation design should give us a lot of confidence.

If we can come up with a patch that decreases the I/O latency _and_ does
not cause a huge spike in interrupt generation, that would be super.

I am particularly worried about the high interrupt generation when there
is a high I/O load. But that seems to be tied directly to the "disk" I
have. I am curious how this works with drives that are SSDs, for
example.

> 
> That said, I do think a comprehensive interrupt throttling mechanism for
> netback, blkback and other backend I/O drivers would be useful and should
> be pursued as a separate initiative. Such a mechanism would be

Yeah, my fear is that you will disappear once this patch is in and I
will have to go forth and do this (and I don't know when I would get to
it). I would prefer to have your help here, or (better) have you write
the patch for both problems right now :-)

> particularly useful for the netfront-netback stack, which is more
> susceptible to interrupt storms than blkfront-blkback. An 'IRQ
> coalescing' type mechanism that could induce delays on the order of
> tens of microseconds (certainly not milliseconds) to minimize the
> interrupt generation rate would be useful (similar to what NICs do).

Good point. Awaiting your patch for the netfront-netback code :-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-13  2:51             ` Konrad Rzeszutek Wilk
@ 2011-05-16 15:22               ` Konrad Rzeszutek Wilk
  2011-05-20  6:12                 ` Vincent, Pradeep
  0 siblings, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-16 15:22 UTC (permalink / raw)
  To: Vincent, Pradeep
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

[-- Attachment #1: Type: text/plain, Size: 2879 bytes --]

On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > >>what were the numbers when it came to high bandwidth numbers
> > 
> > Under high I/O workload, where the blkfront would fill up the queue as
> > blkback works the queue, the I/O latency problem in question doesn't
> > manifest itself and as a result this patch doesn't make much of a
> > difference in terms of interrupt rate. My benchmarks didn't show any
> > significant effect.
> 
> I have to rerun my benchmarks. Under high load (so 64Kb, four threads
> writting as much as they can to a iSCSI disk), the IRQ rate for each
> blkif went from 2-3/sec to ~5K/sec. But I did not do a good
> job on capturing the submission latency to see if the I/Os get the
> response back as fast (or the same) as without your patch.
> 
> And the iSCSI disk on the target side was an RAMdisk, so latency
> was quite small which is not fair to your problem.
> 
> Do you have a program to measure the latency for the workload you
> had encountered? I would like to run those numbers myself.

Ran some more benchmarks over this week. This time I tried to run it on:

 - iSCSI target (1GB, and on the "other side" it wakes up every 1msec, so the
   latency is set to 1msec).
 - scsi_debug delay=0 (no delay, as fast as possible. Comes out to about
   4 microseconds completion with a queue depth of one with 32K I/Os).
 - local SATAI 80GB ST3808110AS. Still running as it is quite slow.

With only one PV guest doing a round (three times) of two threads
randomly writing I/Os with a queue depth of 256. Then a different round
of four threads writing/reading (80/20) 512 bytes up to 64K randomly
over the disk.

I used the attached patch against #master (git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git)
to gauge how well we are doing (and what the interrupt generation rate is).

These workloads I think would be considered 'high I/O' and I was expecting
your patch to not have any influence on the numbers.

But to my surprise, in the case where the I/O latency is high, the
interrupt generation rate was quite small. But where the I/O latency was
very, very small (4 microseconds), the interrupt generation was on
average about 20K/s. And this is with a queue depth of 256 with four
threads. I was expecting the opposite. Hence I am quite curious to see
your use case.

What do you consider a middle I/O and low I/O cases? Do you use 'fio' for your
testing?

With the high I/O load, the numbers came out to give us about a 1%
benefit with your patch. However, I am worried (maybe unnecessarily?)
about the 20K interrupt generation rate when the iometer tests kicked in
(this was only when using the unrealistic 'scsi_debug' drive).

The picture of this using iSCSI target:
http://darnok.org/xen/amazon/iscsi_target/iometer-bw.png

And when done on top of local RAMdisk:
http://darnok.org/xen/amazon/scsi_debug/iometer-bw.png


[-- Attachment #2: amazon-debug.patch --]
[-- Type: text/x-diff, Size: 3282 bytes --]

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index dba55e3..83c24ed 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -60,8 +60,11 @@ static int xen_blkif_reqs = 64;
 module_param_named(reqs, xen_blkif_reqs, int, 0);
 MODULE_PARM_DESC(reqs, "Number of blkback requests to allocate");
 
+static int xen_kick_front = 1;
+module_param(xen_kick_front, int, 0644);
+
 /* Run-time switchable: /sys/module/blkback/parameters/ */
-static unsigned int log_stats;
+static unsigned int log_stats = 1;
 module_param(log_stats, int, 0644);
 
 /*
@@ -255,10 +258,21 @@ static void print_stats(struct xen_blkif *blkif)
 	pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d\n",
 		 current->comm, blkif->st_oo_req,
 		 blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req);
+
+	if (blkif->st_reqs_avail) {
+		pr_info("xen-blkback (%s):  bk %4d fk %4d | avail %4d finished %4d\n",
+			current->comm, blkif->st_back_kick, blkif->st_front_kick,
+		 	blkif->st_reqs_avail, blkif->st_reqs_finished);
+	}
+
 	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	blkif->st_rd_req = 0;
 	blkif->st_wr_req = 0;
 	blkif->st_oo_req = 0;
+	blkif->st_back_kick = 0;
+	blkif->st_front_kick = 0;
+	blkif->st_reqs_avail = 0;
+	blkif->st_reqs_finished = 0;
 }
 
 int xen_blkif_schedule(void *arg)
@@ -459,6 +473,7 @@ static int do_block_io_op(struct xen_blkif *blkif)
 	struct pending_req *pending_req;
 	RING_IDX rc, rp;
 	int more_to_do = 0;
+	unsigned long flags;
 
 	rc = blk_rings->common.req_cons;
 	rp = blk_rings->common.sring->req_prod;
@@ -505,7 +520,13 @@ static int do_block_io_op(struct xen_blkif *blkif)
 		/* Yield point for this unbounded loop. */
 		cond_resched();
 	}
-
+	if (!more_to_do && xen_kick_front) {
+		spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
+		if (more_to_do)
+			blkif->st_reqs_avail ++;
+		spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	}
 	return more_to_do;
 }
 
@@ -727,6 +748,7 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
 	if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
+		blkif->st_reqs_finished ++;
 		/*
 		 * Tail check for pending requests. Allows frontend to avoid
 		 * notifications if requests are already in flight (lower
@@ -740,10 +762,14 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 
 	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
 
-	if (more_to_do)
+	if (more_to_do) {
+		blkif->st_back_kick++;
 		blkif_notify_work(blkif);
-	if (notify)
+	}
+	if (notify) {
+		blkif->st_front_kick ++;
 		notify_remote_via_irq(blkif->irq);
+	}
 }
 
 static int __init xen_blkif_init(void)
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 9e40b28..ccb72e2 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -161,6 +161,10 @@ struct xen_blkif {
 	int			st_f_req;
 	int			st_rd_sect;
 	int			st_wr_sect;
+	int 			st_reqs_finished;
+	int			st_reqs_avail;
+	int			st_front_kick;
+	int			st_back_kick;
 
 	wait_queue_head_t	waiting_to_free;
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-16 15:22               ` Konrad Rzeszutek Wilk
@ 2011-05-20  6:12                 ` Vincent, Pradeep
  2011-05-24 16:02                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-20  6:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

Hey Konrad, 

Thanks for running the tests. Very useful data.

Re: Experiment to show latency improvement

I never ran anything on a ramdisk.

You should be able to see the latency benefit with the 'orion' tool, but
I am sure other tools can be used as well. For a volume backed by a
single disk drive, keep the number of small random I/Os outstanding at 2
(I think the "num_small" parameter in orion should do the job) with a
50-50 mix of writes and reads. Measure the latency reported by the guest
and by Dom-0 and compare them. For LVM volumes that present multiple
drives as a single LUN (inside the guest), the latency improvement will
be highest when the number of I/Os outstanding is 2X the number of
spindles. This is the 'moderate I/O' scenario I was describing, and you
should see significant improvement in latencies.


If you allow the page cache to perform sequential I/O using dd or
another sequential non-direct I/O generation tool, you should find that
the interrupt rate doesn't go up under high I/O load. Thinking about
this, I think the burstiness of I/O submission as seen by the driver is
also a key factor, particularly in the absence of the I/O coalescing
waits introduced by the I/O scheduler. Page cache draining is
notoriously bursty.

>>queue depth of 256.

What 'queue depth' is this? If I am not wrong, blkfront-blkback is
restricted to ~32 max pending I/Os due to the limit of one page being
used for mailbox entries - no?

>>But to my surprise the case where the I/O latency is high, the interrupt
>>generation was quite small

If this patch results in an extra interrupt, it will very likely result
in a reduction of latency for the next I/O. If the increase in interrupt
generation is not high, then the number of I/Os whose latencies this
patch has improved is low. It looks like your workload belonged to this
category. Perhaps that's why you didn't see much of an improvement in
overall performance? I think this is close to the high I/O workload
scenario I described.

>>But where the I/O latency was very very small (4 microseconds) the
>>interrupt generation was on average about 20K/s.

This is not a scenario I tested, but the results aren't surprising. This
isn't the high I/O load I was describing though (I didn't test a
ramdisk); an SSD is probably the closest real-world workload.
An increase of 20K/sec means this patch very likely improved the latency
of 20K I/Os per sec, although the absolute value of the latency
improvements would be smaller in this case. A 20K/sec interrupt rate
(50 usec delay between interrupts) is something I would be comfortable
with if it directly translates to latency improvements for the users.
The graphs seem to indicate a 5% increase in throughput for this case -
am I reading the graphs right?

Overall, very useful tests indeed, and I haven't seen anything too
concerning or unexpected, except that I don't think you have seen the
50+% latency benefit that the patch got me in my moderate I/O
benchmark :-) Feel free to ping me offline if you aren't able to see the
latency impact using the 'moderate I/O' methodology described above.

About IRQ coalescing: Stepping back a bit, there are a few different use
cases for which an IRQ coalescing mechanism would be useful:

1. Latency sensitive workloads: Wait time of 10s of usecs. Particularly
useful for SSDs.
2. Interrupt rate conscious workloads/environments: Wait time of 200+
usecs, which will essentially cap the theoretical interrupt rate at 5K.
3. Excessive CPU consumption mitigation: This is similar to (2) but
includes the case of malicious guests. Perhaps not a big concern unless
you have lots of drives attached to each guest.

I suspect the implementations for (1) and (2) would be different (spin
vs sleep, perhaps). (3) can't be implemented by manipulating
'req_event', since a guest has the ability to abuse the irq channel
independent of what 'blkback' tries to tell 'blkfront' via 'req_event'
manipulation.

(3) could be implemented in the hypervisor as a generic irq throttler
that could be leveraged for all irqs heading to Dom-0 from DomUs,
including blkback/netback. Such a mechanism could potentially solve (1)
and/or (2) as well. Thoughts?

One crude way to address (3) for the 'many disk drive' scenario is to
pin all/most blkback interrupts for an instance to the same CPU core in
Dom-0 and throttle down the thread wakeups (wake_up(&blkif->wq) in
blkif_notify_work) that usually result in IPIs. Not an elegant solution,
but it might be a good crutch.

Another angle to (1) and (2) is whether these irq coalesce settings should
be controllable by the guest, perhaps within limits set by the
administrator. 

Thoughts? Suggestions?

Konrad, I would love to help out if you are already working on something
around irq coalescing. And when I have irq coalescing functionality that
can be consumed by the community, I will certainly submit it.

Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
just because there is no irq coalescing functionality yet, particularly
since the downside is very minimal on the blkfront-blkback stack. My 2
cents..

Thanks much Konrad,

- Pradeep Vincent





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-20  6:12                 ` Vincent, Pradeep
@ 2011-05-24 16:02                   ` Konrad Rzeszutek Wilk
  2011-05-24 22:40                     ` Vincent, Pradeep
  0 siblings, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-24 16:02 UTC (permalink / raw)
  To: Vincent, Pradeep
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

On Thu, May 19, 2011 at 11:12:25PM -0700, Vincent, Pradeep wrote:
> Hey Konrad, 
> 
> Thanks for running the tests. Very useful data.
> 
> Re: Experiment to show latency improvement
> 
> I never ran anything on ramdisk.
> 
> You should be able to see the latency benefit with 'orion' tool but I am

Link?

> sure other tools can be used as well. For a volume backed by a single disk
> drive, keep the number of small random I/O outstanding to 2 (I think
> "num_small" parameter in orion should do the job) with a 50-50 mix of
> write and read. Measure the latency reported by the guest and Dom-0 &
> compare them. For LVM volumes that present multiple drives as a single LUN
> (inside the guest), the latency improvement will be the highest when the
> number of I/O outstanding is 2X the number of spindles. This is the
> 'moderate I/O' scenario I was describing and you should see significant
> improvement in latencies.

Ok.
> 
> 
> If you allow page cache to perform sequential I/O using dd or other
> sequential non-direct I/O generation tool, you should find that the
> interrupt rate doesn't go up for high I/O load. Thinking about this, I
> think burstiness of I/O submission as seen by the driver is also a key
> player particularly in the absence of I/O coalescing waits introduced by
> I/O scheduler. Page cache draining is notoriously bursty.

Sure, .. though most of the tests I've been doing have been bypassing
the page cache.
> 
> >>queue depth of 256.
> 
> What 'queue depth' is this ? If I am not wrong, blkfront-blkback is

The 'request_queue' one, i.e. the block layer API queue.

> restricted to ~32 max pending I/Os due to the limit of one page being used
> for mailbox entries - no ?

This is the frontend's block API queue I was thinking about. In regard
to the ring buffer... I'm not exactly sure of the right number (I would
have to compute it), but I believe it is much bigger.
The ring buffer entries are for 'requests', wherein each request can
contain up to 11 pages of data (nr segments).

> 
> >>But to my surprise the case where the I/O latency is high, the interrupt
> >>generation was quite small
> 
> If this patch results in an extra interrupt, it will very likely result in
> reduction of latency for the next I/O. If the interrupt generation
> increase is not high, then the number of I/Os whose latencies this patch
> has improved is low. Looks like your workload belonged to this category.
> Perhaps that's why you didn't much of an improvement in overall
> performance ? I think this is close to the high I/O workload scenario I
> described.
Ok
> 
> >>But where the I/O latency was very very small (4 microseconds) the
> >>interrupt generation was on average about 20K/s.
> 
> This is not a scenario I tested but the results aren't surprising.  This
> isn't the high I/O load I was describing though (I didn't test ramdisk).
> SSD is probably the closest real world workload.
> An increase of 20K/sec means this patch very likely improved latency of
> 20K I/Os per sec although the absolute value of latency improvements would
> be smaller in this case. 20K/sec interrupt rate (50usec delay between
> interrupt) is something I would be comfortable with if they directly
> translate to latency improvement for the users. The graphs seem to
> indicate a 5% increase in throughput for this case - Am I reading the

I came up with 1%. But those are a bit unrealistic - and I ordered
an SSD to do some proper testing.

> graphs right ? 
> 
> Overall, Very useful tests indeed and I haven't seen anything too
> concerning or unexpected except that I don't think you have seen the 50+%
> latency benefit that the patch got me in my moderate I/O benchmark :-)

Let me redo the tests again.

> Feel free to ping me offline if you aren't able to see the latency impact
> using the 'moderate I/O' methodology described above.
> 
> About IRQ coalescing: Stepping back a bit, there are few different use
> cases that irq coalescing mechanism would be useful for
> 
> 1. Latency sensitive workload: Wait time of 10s of usecs. Particularly
> useful for SSDs. 
> 2. Interrupt rate conscious workload/environment: Wait time of 200+ usecs
> which will essentially cap the theoretical interrupt rate to 5K.
> 3. Excessive CPU consumption Mitigation: This is similar to (2) but
> includes the case of malicious guests. Perhaps not a big concern unless
> you have lots of drives attached to each guest.
> 
> I suspect the implementation for (1) and (2) would be different (spin vs
> sleep perhaps). (3) can't be implemented by manipulation of 'req_event'
> since a guest has the ability to abuse irq channel independent of what
> 'blkback' tries to tell 'blkfront' via 'req_event' manipulation.
> 
> (3) could be implemented in the hypervisor as a generic irq throttler that
> could be leveraged for all irqs heading to Dom-0 from DomUs including
> blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
> as well. Thoughts ?

The hypervisor does have some IRQ storm avoidance mechanism, though the
threshold is 100K/sec and it only applies to physical IRQs.
> 
> One crude way to address (3) for 'many disk drive' scenario is to pin
> all/most blkback interrupts for an instance to the same CPU core in Dom-0
> and throttle down the thread wake up (wake_up(&blkif->wq) in
> blkif_notify_work) that usually results in IPIs. Not an elegant solution
> but might be a good crutch.
> 
> Another angle to (1) and (2) is whether these irq coalesce settings should
> be controllable by the guest, perhaps within limits set by the
> administrator. 
> 
> Thoughts ? Suggestions ?
> 
> Konrad, Love to help out if you are already working on something around
> irq coalescing. Or when I have irq coalescing functionality that can be

Not yet. Hence hinting for you to do it :-)

> consumed by community I will certainly submit them.
> 
> Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
> just because there is no irq coalescing functionality. Particularly since
> the downside is very minimal on blkfront-blkback stack. My 2 cents..
> 
> Thanks much Konrad,
> 
> - Pradeep Vincent

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH] blkback: Fix block I/O latency issue
  2011-05-24 16:02                   ` Konrad Rzeszutek Wilk
@ 2011-05-24 22:40                     ` Vincent, Pradeep
  0 siblings, 0 replies; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-24 22:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, xen-devel, Jan Beulich, Daniel Stodden

Response inline..

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] 
Sent: Tuesday, May 24, 2011 9:03 AM
To: Vincent, Pradeep
Cc: Daniel Stodden <Daniel@acsinet11.oracle.com>; Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Jan Beulich
Subject: Re: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue

On Thu, May 19, 2011 at 11:12:25PM -0700, Vincent, Pradeep wrote:
> Hey Konrad, 
> 
> Thanks for running the tests. Very useful data.
> 
> Re: Experiment to show latency improvement
> 
> I never ran anything on ramdisk.
> 
> You should be able to see the latency benefit with the 'orion' tool but I am

>Link?

PV: http://www.oracle.com/technetwork/topics/index-089595.html

> sure other tools can be used as well. For a volume backed by a single disk
> drive, keep the number of small random I/O outstanding to 2 (I think
> "num_small" parameter in orion should do the job) with a 50-50 mix of
> write and read. Measure the latency reported by the guest and Dom-0 &
> compare them. For LVM volumes that present multiple drives as a single LUN
> (inside the guest), the latency improvement will be the highest when the
> number of I/O outstanding is 2X the number of spindles. This is the
> 'moderate I/O' scenario I was describing and you should see significant
> improvement in latencies.
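[Konrad asks about fio further on in the thread; a roughly equivalent fio job for this single-spindle 'moderate I/O' case might look like the sketch below. The device path, block size, and runtime are assumptions for illustration, not values from the thread.]

```ini
; moderate-I/O latency test: ~2 outstanding small random I/Os per
; spindle, 50/50 read/write mix, run from inside the guest
[moderate-io]
filename=/dev/xvdb        ; assumed guest block device
ioengine=libaio
direct=1                  ; bypass the page cache
rw=randrw
rwmixread=50
bs=8k                     ; assumed small-I/O size
iodepth=2                 ; 2 outstanding I/Os for a single-disk volume
time_based=1
runtime=60
```

Compare the completion latencies fio reports in the guest against the same job run in Dom-0, as described above.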

Ok.
> 
> 
> If you allow page cache to perform sequential I/O using dd or other
> sequential non-direct I/O generation tool, you should find that the
> interrupt rate doesn't go up for high I/O load. Thinking about this, I
> think burstiness of I/O submission as seen by the driver is also a key
> player particularly in the absence of I/O coalescing waits introduced by
> I/O scheduler. Page cache draining is notoriously bursty.

Sure, .. though most of the tests I've been doing have been bypassing
the page cache.
> 
> >>queue depth of 256.
> 
> What 'queue depth' is this ? If I am not wrong, blkfront-blkback is

The 'request_queue' one. This is the block API one.

PV: Got it.

> restricted to ~32 max pending I/Os due to the limit of one page being used
> for mailbox entries - no ?

>This is the frontend's block API queue I was thinking about. In regards
>to the ring buffer .. um, not exactly sure the right number (would have to
>compute it), but it is much bigger I believe.
>The ring buffer entries are for 'requests', wherein each request can contain
>up to 11 pages of data (nr segments).

PV: I just did a back-of-the-envelope calculation for the size of blkif_request that gave me ~78 bytes, primarily dominated by 6 bytes per segment for 11 segments per request. This would result in a max pending I/O count of 32. This matches my recollection from a long time back, but I am not sure if I missed something. Of course, like you said, each I/O req can have 44K of data, but small sized random I/O can't take advantage of it. (If I am not wrong, netback takes a slightly different approach where each slot is essentially a 4K page and multiple slots are used for larger sized packets.)
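[This arithmetic can be reproduced; the sketch below is hedged: the struct sizes are reconstructions from the 64-bit blkif ABI (blkif.h/ring.h), not figures quoted in this thread. With alignment padding the request comes out closer to ~112 bytes than ~78, but the power-of-two rounding used by the ring macros lands on the same 32 slots either way.]

```python
# Hedged back-of-envelope for the blkfront/blkback ring capacity.
# All sizes below are assumptions based on the 64-bit blkif interface.
PAGE_SIZE = 4096
SRING_HDR = 64            # req_prod/req_event/rsp_prod/rsp_event + pad
SEGMENT   = 8             # grant ref (4) + first_sect/last_sect + pad
NR_SEGS   = 11            # BLKIF_MAX_SEGMENTS_PER_REQUEST
REQUEST   = 1 + 1 + 2 + 4 + 8 + 8 + NR_SEGS * SEGMENT  # ~112 bytes

slots = (PAGE_SIZE - SRING_HDR) // REQUEST      # 36 raw slots
ring_size = 1 << (slots.bit_length() - 1)       # rounded down to a power
                                                # of two, as the ring
                                                # macros do
max_data = ring_size * NR_SEGS * 4096           # payload ceiling per ring

print(ring_size)          # 32 pending requests
print(max_data // 1024)   # 1408 KiB in flight (32 * 44 KiB)
```

So small random I/O is indeed capped at 32 outstanding requests per ring, while large sequential I/O can keep ~1.4 MB in flight.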

> 
> >>But to my surprise the case where the I/O latency is high, the interrupt
> >>generation was quite small
> 
> If this patch results in an extra interrupt, it will very likely result in
> reduction of latency for the next I/O. If the interrupt generation
> increase is not high, then the number of I/Os whose latencies this patch
> has improved is low. Looks like your workload belonged to this category.
> Perhaps that's why you didn't see much of an improvement in overall
> performance? I think this is close to the high I/O workload scenario I
> described.
Ok
> 
> >>But where the I/O latency was very very small (4 microseconds) the
> >>interrupt generation was on average about 20K/s.
> 
> This is not a scenario I tested but the results aren't surprising.  This
> isn't the high I/O load I was describing though (I didn't test ramdisk).
> SSD is probably the closest real world workload.
> An increase of 20K/sec means this patch very likely improved latency of
> 20K I/Os per sec although the absolute value of latency improvements would
> be smaller in this case. 20K/sec interrupt rate (50usec delay between
> interrupt) is something I would be comfortable with if they directly
> translate to latency improvement for the users. The graphs seem to
> indicate a 5% increase in throughput for this case - Am I reading the

>I came up with 1%. But those are a bit unrealistic - and I ordered
>an SSD to do some proper testing.

PV: Terrific.

> graphs right ? 
> 
> Overall, Very useful tests indeed and I haven't seen anything too
> concerning or unexpected except that I don't think you have seen the 50+%
> latency benefit that the patch got me in my moderate I/O benchmark :-)

Let me redo the tests again.

PV: Thanks much. Let me know if you need more info on test setup.

> Feel free to ping me offline if you aren't able to see the latency impact
> using the 'moderate I/O' methodology described above.
> 
> About IRQ coalescing: Stepping back a bit, there are few different use
> cases that irq coalescing mechanism would be useful for
> 
> 1. Latency sensitive workload: Wait time of 10s of usecs. Particularly
> useful for SSDs. 
> 2. Interrupt rate conscious workload/environment: Wait time of 200+ usecs
> which will essentially cap the theoretical interrupt rate to 5K.
> 3. Excessive CPU consumption Mitigation: This is similar to (2) but
> includes the case of malicious guests. Perhaps not a big concern unless
> you have lots of drives attached to each guest.
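[The cap in case (2) is just the reciprocal of the coalescing wait, and the same arithmetic links the 20K/s rate Konrad measured to a ~50 usec inter-interrupt gap; a trivial sketch:]

```python
def max_irq_rate(coalesce_wait_s):
    # With at least coalesce_wait_s between interrupts, the sustained
    # interrupt rate is capped at the reciprocal of the wait.
    return 1.0 / coalesce_wait_s

print(max_irq_rate(200e-6))  # ~5K/s: the cap in case (2) above
print(1.0 / 20_000)          # ~50 usec between interrupts at the
                             # 20K/s rate observed earlier in the thread
```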
> 
> I suspect the implementation for (1) and (2) would be different (spin vs
> sleep perhaps). (3) can't be implemented by manipulation of 'req_event'
> since a guest has the ability to abuse irq channel independent of what
> 'blkback' tries to tell 'blkfront' via 'req_event' manipulation.
> 
> (3) could be implemented in the hypervisor as a generic irq throttler that
> could be leveraged for all irqs heading to Dom-0 from DomUs including
> blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
> as well. Thoughts ?

The hypervisor does have some irq storm avoidance mechanism, though the
number is 100K/sec and it only applies to physical IRQs.

PV: I will take a closer look to see what hypervisor already does here.

> 
> One crude way to address (3) for 'many disk drive' scenario is to pin
> all/most blkback interrupts for an instance to the same CPU core in Dom-0
> and throttle down the thread wake up (wake_up(&blkif->wq) in
> blkif_notify_work) that usually results in IPIs. Not an elegant solution
> but might be a good crutch.
> 
> Another angle to (1) and (2) is whether these irq coalesce settings should
> be controllable by the guest, perhaps within limits set by the
> administrator. 
> 
> Thoughts ? Suggestions ?
> 
> Konrad, Love to help out if you are already working on something around
> irq coalescing. Or when I have irq coalescing functionality that can be

Not yet. Hence hinting for you to do it :-)

> consumed by community I will certainly submit them.
> 
> Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
> just because there is no irq coalescing functionality. Particularly since
> the downside is very minimal on blkfront-blkback stack. My 2 cents..
> 
> Thanks much Konrad,
> 
> - Pradeep Vincent
> 
> 
> 
> 
> On 5/16/11 8:22 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> 
> >On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote:
> >> > >>what were the numbers when it came to high bandwidth numbers
> >> > 
> >> > Under high I/O workload, where the blkfront would fill up the queue as
> >> > blkback works the queue, the I/O latency problem in question doesn't
> >> > manifest itself and as a result this patch doesn't make much of a
> >> > difference in terms of interrupt rate. My benchmarks didn't show any
> >> > significant effect.
> >> 
> >> I have to rerun my benchmarks. Under high load (so 64Kb, four threads
> >> writing as much as they can to an iSCSI disk), the IRQ rate for each
> >> blkif went from 2-3/sec to ~5K/sec. But I did not do a good
> >> job on capturing the submission latency to see if the I/Os get the
> >> response back as fast (or the same) as without your patch.
> >> 
> >> And the iSCSI disk on the target side was an RAMdisk, so latency
> >> was quite small which is not fair to your problem.
> >> 
> >> Do you have a program to measure the latency for the workload you
> >> had encountered? I would like to run those numbers myself.
> >
> >Ran some more benchmarks over this week. This time I tried to run it on:
> >
> > - iSCSI target (1GB, and on the "other side" it wakes up every 1msec, so
> >the
> >   latency is set to 1msec).
> > - scsi_debug delay=0 (no delay and as fast possible. Comes out to be
> >about
> >   4 microseconds completion with queue depth of one with 32K I/Os).
> > - local SATAI 80GB ST3808110AS. Still running as it is quite slow.
> >
> >With only one PV guest doing a round (three times) of two threads randomly
> >writing I/Os with a queue depth of 256. Then a different round of four
> >threads writing/reading (80/20) 512 bytes up to 64K randomly over the
> >disk.
> >
> >I used the attached patch against #master
> >(git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git)
> >to gauge how well we are doing (and what the interrupt generation rate
> >is).
> >
> >These workloads I think would be considered 'high I/O' and I was expecting
> >your patch to not have any influence on the numbers.
> >
> >But to my surprise the case where the I/O latency is high, the interrupt
> >generation
> >was quite small. But where the I/O latency was very very small (4
> >microseconds)
> >the interrupt generation was on average about 20K/s. And this is with a
> >queue depth
> >of 256 with four threads. I was expecting the opposite. Hence quite
> >curious
> >to see your use case.
> >
> >What do you consider a middle I/O and low I/O cases? Do you use 'fio' for
> >your
> >testing?
> >
> >With the high I/O load, the numbers came out to give us about 1% benefit
> >with your
> >patch. However, I am worried (maybe unnecessarily?) about the 20K
> >interrupt generation
> >when the iometer tests kicked in (this was only when using the
> >unrealistic 'scsi_debug'
> >drive).
> >
> >The picture of this using iSCSI target:
> >http://darnok.org/xen/amazon/iscsi_target/iometer-bw.png
> >
> >And when done on top of local RAMdisk:
> >http://darnok.org/xen/amazon/scsi_debug/iometer-bw.png
> >
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RE-PATCH] Re: [PATCH] blkback: Fix block I/O latency issue
  2011-05-02  7:04 [PATCH] blkback: Fix block I/O latency issue Vincent, Pradeep
  2011-05-02  8:13 ` Jan Beulich
@ 2011-05-28 20:12 ` Daniel Stodden
  2011-05-28 20:21   ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Daniel Stodden
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Stodden @ 2011-05-28 20:12 UTC (permalink / raw)
  To: Vincent, Pradeep, Konrad Rzeszutek Wilk, Jeremy Fitzhardinge; +Cc: xen-devel


Hi.

I got a chance to look into the problem below while chasing after some
issues with big ring support.  The way the blkback request dispatching
presently works is indeed critically retarded.

What the present approach essentially does is blocking any further
request processing until the previously observed batch starts to
complete. This completely kills throughput in the very common case where
guests unplug or notify early, while rightfully assuming the remainder of
their batch will be picked up as timely as the initial one.
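[The protocol mechanics behind this stall are easiest to see in a toy model. The sketch below uses Python stand-ins for the ring.h macros, simplified to a single producer/consumer with wrap-around and memory barriers omitted (the real macros operate on shared-memory indices). It reproduces the bug Pradeep described: a backend that sleeps without running RING_FINAL_CHECK_FOR_REQUESTS leaves req_event stale, so the frontend's notify check sees no reason to raise an interrupt.]

```python
class Ring:
    """Toy model of Xen's shared-ring request-event protocol (ring.h),
    simplified: single producer/consumer, no wrap-around, no barriers."""

    def __init__(self):
        self.req_prod = 0   # frontend: requests produced so far
        self.req_cons = 0   # backend: requests consumed so far
        self.req_event = 1  # backend asks: "interrupt me when req_prod
                            # reaches this value"

    # --- frontend side ---------------------------------------------
    def push_requests_and_check_notify(self, n):
        # RING_PUSH_REQUESTS_AND_CHECK_NOTIFY: notify only if req_prod
        # crossed req_event with this push.
        old = self.req_prod
        self.req_prod += n
        return old < self.req_event <= self.req_prod

    # --- backend side ----------------------------------------------
    def has_unconsumed_requests(self):
        return self.req_prod > self.req_cons

    def final_check_for_requests(self):
        # RING_FINAL_CHECK_FOR_REQUESTS: before sleeping, re-arm
        # req_event so the *next* request interrupts us, then check
        # once more to close the race with a concurrent push.
        if self.has_unconsumed_requests():
            return True
        self.req_event = self.req_cons + 1
        return self.has_unconsumed_requests()


def consume_all(r):
    while r.has_unconsumed_requests():
        r.req_cons += 1

# Buggy backend: consumes a batch, then sleeps WITHOUT the final check.
# req_event still points into the old batch, so the frontend sees no
# reason to interrupt; the new request waits for an unrelated event.
buggy = Ring()
buggy.push_requests_and_check_notify(4)
consume_all(buggy)
assert buggy.push_requests_and_check_notify(1) is False  # stall

# Fixed backend: re-arms req_event before sleeping.
fixed = Ring()
fixed.push_requests_and_check_notify(4)
consume_all(fixed)
fixed.final_check_for_requests()
assert fixed.push_requests_and_check_notify(1) is True   # interrupt fires
```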

Proposed solution is to render request consumption and response
production independent, as usual.

Without having worked my way through the remainder of this thread again,
the original goes in the right direction, but I think it should
distinguish more between more_to_do -> we got more requests, go for
them, and more_to_do -> we got more requests, but we're resource
congested, so wait().

Remaining request related processing in make_response is garbage, so
removed.

Patch follows and I'd strongly recommend to apply this or a similar fix
to any tree under maintenance too.

Daniel

On Mon, 2011-05-02 at 03:04 -0400, Vincent, Pradeep wrote:
> In blkback driver, after I/O requests are submitted to Dom-0 block I/O subsystem, blkback goes to 'sleep' effectively without letting blkfront know about it (req_event isn't set appropriately). Hence blkfront doesn't notify blkback when it submits a new I/O thus delaying the 'dispatch' of the new I/O to Dom-0 block I/O subsystem. The new I/O is dispatched as soon as one of the previous I/Os completes.
> 
> As a result of this issue, the block I/O latency performance is degraded for some workloads on Xen guests using blkfront-blkback stack.
> 
> The following change addresses this issue:
> 
> 
> Signed-off-by: Pradeep Vincent <pradeepv@amazon.com>
> 
> diff --git a/drivers/xen/blkback/blkback.c b/drivers/xen/blkback/blkback.c
> --- a/drivers/xen/blkback/blkback.c
> +++ b/drivers/xen/blkback/blkback.c
> @@ -383,6 +383,12 @@ static int do_block_io_op(blkif_t *blkif)
>   cond_resched();
>   }
> 
> + /* If blkback might go to sleep (i.e. more_to_do == 0) then we better
> +   let blkfront know about it (by setting req_event appropriately) so that
> +   blkfront will bother to wake us up (via interrupt) when it submits a
> +   new I/O */
> +        if (!more_to_do)
> +                 RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>   return more_to_do;
>  }

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-05-28 20:12 ` [RE-PATCH] " Daniel Stodden
@ 2011-05-28 20:21   ` Daniel Stodden
  2011-05-29  8:09     ` Vincent, Pradeep
  2011-05-31 16:08     ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 33+ messages in thread
From: Daniel Stodden @ 2011-05-28 20:21 UTC (permalink / raw)
  To: Xen; +Cc: pradeepv, Daniel Stodden, konrad.wilk

Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
idea. It means that in-flight I/O is essentially blocking continued
batches. This essentially kills throughput on frontends which unplug
(or even just notify) early and rightfully assume additional requests
will be picked up on time, not synchronously.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
---
 drivers/block/xen-blkback/blkback.c |   36 ++++++++++++++++++----------------
 1 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 9dee545..48ad7fa 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int error)
  * (which has the sectors we want, number of them, grant references, etc),
  * and transmute  it to the block API to hand it over to the proper block disk.
  */
-static int do_block_io_op(struct xen_blkif *blkif)
+static int
+__do_block_io_op(struct xen_blkif *blkif)
 {
 	union blkif_back_rings *blk_rings = &blkif->blk_rings;
 	struct blkif_request req;
@@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
 	return more_to_do;
 }
 
+static int
+do_block_io_op(struct xen_blkif *blkif)
+{
+	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	int more_to_do;
+
+	do {
+		more_to_do = __do_block_io_op(blkif);
+		if (more_to_do)
+			break;
+
+		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
+	} while (more_to_do);
+
+	return more_to_do;
+}
+
 /*
  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
  * and call the 'submit_bio' to pass it to the underlying storage.
@@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	struct blkif_response  resp;
 	unsigned long     flags;
 	union blkif_back_rings *blk_rings = &blkif->blk_rings;
-	int more_to_do = 0;
 	int notify;
 
 	resp.id        = id;
@@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	}
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
-	if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
-		/*
-		 * Tail check for pending requests. Allows frontend to avoid
-		 * notifications if requests are already in flight (lower
-		 * overheads and promotes batching).
-		 */
-		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
-
-	} else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
-		more_to_do = 1;
-	}
-
 	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
-
-	if (more_to_do)
-		blkif_notify_work(blkif);
 	if (notify)
 		notify_remote_via_irq(blkif->irq);
 }
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-05-28 20:21   ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Daniel Stodden
@ 2011-05-29  8:09     ` Vincent, Pradeep
  2011-05-29 11:34       ` Daniel Stodden
  2011-05-31 13:44       ` Fix wrong help message for parameter nestedhvm Dong, Eddie
  2011-05-31 16:08     ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 33+ messages in thread
From: Vincent, Pradeep @ 2011-05-29  8:09 UTC (permalink / raw)
  To: Daniel Stodden, Xen; +Cc: konrad.wilk

Opportunistically avoiding interrupts by checking for I/Os in flight
doesn't sound like a bad idea. I think the RING_HAS_UNCONSUMED_REQUESTS
call and what follows should be retained in 'make_response'.

Also, should RING_FINAL_CHECK_FOR_REQUESTS be protected by blk_ring_lock ?


- Pradeep Vincent


On 5/28/11 1:21 PM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:

>Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
>idea. It means that in-flight I/O is essentially blocking continued
>batches. This essentially kills throughput on frontends which unplug
>(or even just notify) early and rightfully assume additional requests
>will be picked up on time, not synchronously.
>
>Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
>---
> drivers/block/xen-blkback/blkback.c |   36
>++++++++++++++++++----------------
> 1 files changed, 19 insertions(+), 17 deletions(-)
>
>diff --git a/drivers/block/xen-blkback/blkback.c
>b/drivers/block/xen-blkback/blkback.c
>index 9dee545..48ad7fa 100644
>--- a/drivers/block/xen-blkback/blkback.c
>+++ b/drivers/block/xen-blkback/blkback.c
>@@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int
>error)
>  * (which has the sectors we want, number of them, grant references,
>etc),
>  * and transmute  it to the block API to hand it over to the proper
>block disk.
>  */
>-static int do_block_io_op(struct xen_blkif *blkif)
>+static int
>+__do_block_io_op(struct xen_blkif *blkif)
> {
>     union blkif_back_rings *blk_rings = &blkif->blk_rings;
>     struct blkif_request req;
>@@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
>     return more_to_do;
> }
> 
>+static int
>+do_block_io_op(struct xen_blkif *blkif)
>+{
>+    union blkif_back_rings *blk_rings = &blkif->blk_rings;
>+    int more_to_do;
>+
>+    do {
>+        more_to_do = __do_block_io_op(blkif);
>+        if (more_to_do)
>+            break;
>+
>+        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>+    } while (more_to_do);
>+
>+    return more_to_do;
>+}
>+
> /*
>  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
>  * and call the 'submit_bio' to pass it to the underlying storage.
>@@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif,
>u64 id,
>     struct blkif_response  resp;
>     unsigned long     flags;
>     union blkif_back_rings *blk_rings = &blkif->blk_rings;
>-    int more_to_do = 0;
>     int notify;
> 
>     resp.id        = id;
>@@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif,
>u64 id,
>     }
>     blk_rings->common.rsp_prod_pvt++;
>     RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
>-    if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
>-        /*
>-         * Tail check for pending requests. Allows frontend to avoid
>-         * notifications if requests are already in flight (lower
>-         * overheads and promotes batching).
>-         */
>-        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>-
>-    } else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
>-        more_to_do = 1;
>-    }
>-
>     spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
>-
>-    if (more_to_do)
>-        blkif_notify_work(blkif);
>     if (notify)
>         notify_remote_via_irq(blkif->irq);
> }
>-- 
>1.7.4.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-05-29  8:09     ` Vincent, Pradeep
@ 2011-05-29 11:34       ` Daniel Stodden
  2011-06-01  8:02         ` Vincent, Pradeep
  2011-05-31 13:44       ` Fix wrong help message for parameter nestedhvm Dong, Eddie
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Stodden @ 2011-05-29 11:34 UTC (permalink / raw)
  To: Vincent, Pradeep; +Cc: Xen, konrad.wilk

On Sun, 2011-05-29 at 04:09 -0400, Vincent, Pradeep wrote:
> Opportunistically avoiding interrupts by checking for I/Os in the flight
> doesn't sound like a bad idea. I think the RING_HAS_UNCONSUMED_REQUESTS
> call and what follows should be retained in 'make_response'.

There's not much room for opportunism left here. After FINAL_CHECK
returning with !_work_to_do you're going to receive an interrupt.
Holding that notification off would kill performance.

From there on, still leaving a duplicate check around end_io has only an
infinitesimal chance to preempt (none to prevent) the event reception.
Even if it ever happens, the chance of making a difference in time to
actual thread wake is probably even smaller.

I think it's just overhead. If you disagree, this stuff is easy to prove
or confute with event counting. Good luck :)

> Also, should RING_FINAL_CHECK_FOR_REQUESTS be protected by blk_ring_lock ?

Nope. The ring lock is only needed to sync rsp production. Specifically,
make_response upon request completion colliding with make_response
called from the backend thread (the error case in do_block_io_op).

Should rather be named rsp_lock or so, it doesn't lock anything except
rsp_prod.

Daniel

> 
> - Pradeep Vincent
> 
> 
> On 5/28/11 1:21 PM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:
> 
> >Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
> >idea. It means that in-flight I/O is essentially blocking continued
> >batches. This essentially kills throughput on frontends which unplug
> >(or even just notify) early and rightfully assume additional requests
> >will be picked up on time, not synchronously.
> >
> >Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
> >---
> > drivers/block/xen-blkback/blkback.c |   36
> >++++++++++++++++++----------------
> > 1 files changed, 19 insertions(+), 17 deletions(-)
> >
> >diff --git a/drivers/block/xen-blkback/blkback.c
> >b/drivers/block/xen-blkback/blkback.c
> >index 9dee545..48ad7fa 100644
> >--- a/drivers/block/xen-blkback/blkback.c
> >+++ b/drivers/block/xen-blkback/blkback.c
> >@@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int
> >error)
> >  * (which has the sectors we want, number of them, grant references,
> >etc),
> >  * and transmute  it to the block API to hand it over to the proper
> >block disk.
> >  */
> >-static int do_block_io_op(struct xen_blkif *blkif)
> >+static int
> >+__do_block_io_op(struct xen_blkif *blkif)
> > {
> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >     struct blkif_request req;
> >@@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
> >     return more_to_do;
> > }
> > 
> >+static int
> >+do_block_io_op(struct xen_blkif *blkif)
> >+{
> >+    union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >+    int more_to_do;
> >+
> >+    do {
> >+        more_to_do = __do_block_io_op(blkif);
> >+        if (more_to_do)
> >+            break;
> >+
> >+        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >+    } while (more_to_do);
> >+
> >+    return more_to_do;
> >+}
> >+
> > /*
> >  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
> >  * and call the 'submit_bio' to pass it to the underlying storage.
> >@@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif,
> >u64 id,
> >     struct blkif_response  resp;
> >     unsigned long     flags;
> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >-    int more_to_do = 0;
> >     int notify;
> > 
> >     resp.id        = id;
> >@@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif,
> >u64 id,
> >     }
> >     blk_rings->common.rsp_prod_pvt++;
> >     RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
> >-    if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
> >-        /*
> >-         * Tail check for pending requests. Allows frontend to avoid
> >-         * notifications if requests are already in flight (lower
> >-         * overheads and promotes batching).
> >-         */
> >-        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >-
> >-    } else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
> >-        more_to_do = 1;
> >-    }
> >-
> >     spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> >-
> >-    if (more_to_do)
> >-        blkif_notify_work(blkif);
> >     if (notify)
> >         notify_remote_via_irq(blkif->irq);
> > }
> >-- 
> >1.7.4.1
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Fix wrong help message for parameter nestedhvm
  2011-05-29  8:09     ` Vincent, Pradeep
  2011-05-29 11:34       ` Daniel Stodden
@ 2011-05-31 13:44       ` Dong, Eddie
  2011-05-31 16:23         ` Ian Campbell
  1 sibling, 1 reply; 33+ messages in thread
From: Dong, Eddie @ 2011-05-31 13:44 UTC (permalink / raw)
  To: Xen; +Cc: Dong, Eddie

A typo? The config file is using nestedhvm=x.

Thx, Eddie




    Fix wrong help message for parameter nestedhvm.

    Signed-off-by: Eddie Dong <eddie.dong@intel.com>

diff -r 6489452ca865 tools/libxl/libxl.idl
--- a/tools/libxl/libxl.idl	Tue May 31 18:11:35 2011 +0800
+++ b/tools/libxl/libxl.idl	Tue May 31 21:34:52 2011 +0800
@@ -170,7 +170,7 @@
                                         ("hpet", bool),
                                         ("vpt_align", bool),
                                         ("timer_mode", integer),
-                                        ("nested_hvm", bool),
+                                        ("nestedhvm", bool),
                                         ])),
                  ("pv", "!%s", Struct(None,
                                        [("kernel", libxl_file_reference),

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-05-28 20:21   ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Daniel Stodden
  2011-05-29  8:09     ` Vincent, Pradeep
@ 2011-05-31 16:08     ` Konrad Rzeszutek Wilk
  2011-05-31 16:30       ` Daniel Stodden
  1 sibling, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-31 16:08 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: pradeepv, Xen

On Sat, May 28, 2011 at 01:21:10PM -0700, Daniel Stodden wrote:
> Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
> idea. It means that in-flight I/O is essentially blocking continued
> batches. This essentially kills throughput on frontends which unplug
> (or even just notify) early and rightfully assume additional requests

You have any perf numbers?

> will be picked up on time, not synchronously.
> 
> Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
> ---
>  drivers/block/xen-blkback/blkback.c |   36 ++++++++++++++++++----------------
>  1 files changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 9dee545..48ad7fa 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int error)
>   * (which has the sectors we want, number of them, grant references, etc),
>   * and transmute  it to the block API to hand it over to the proper block disk.
>   */
> -static int do_block_io_op(struct xen_blkif *blkif)
> +static int
> +__do_block_io_op(struct xen_blkif *blkif)
>  {
>  	union blkif_back_rings *blk_rings = &blkif->blk_rings;
>  	struct blkif_request req;
> @@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
>  	return more_to_do;
>  }
>  
> +static int
> +do_block_io_op(struct xen_blkif *blkif)
> +{
> +	union blkif_back_rings *blk_rings = &blkif->blk_rings;
> +	int more_to_do;
> +
> +	do {
> +		more_to_do = __do_block_io_op(blkif);
> +		if (more_to_do)
> +			break;
> +
> +		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> +	} while (more_to_do);
> +
> +	return more_to_do;
> +}
> +
>  /*
>   * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
>   * and call the 'submit_bio' to pass it to the underlying storage.
> @@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>  	struct blkif_response  resp;
>  	unsigned long     flags;
>  	union blkif_back_rings *blk_rings = &blkif->blk_rings;
> -	int more_to_do = 0;
>  	int notify;
>  
>  	resp.id        = id;
> @@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>  	}
>  	blk_rings->common.rsp_prod_pvt++;
>  	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
> -	if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
> -		/*
> -		 * Tail check for pending requests. Allows frontend to avoid
> -		 * notifications if requests are already in flight (lower
> -		 * overheads and promotes batching).
> -		 */
> -		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> -
> -	} else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
> -		more_to_do = 1;
> -	}
> -
>  	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> -
> -	if (more_to_do)
> -		blkif_notify_work(blkif);
>  	if (notify)
>  		notify_remote_via_irq(blkif->irq);
>  }
> -- 
> 1.7.4.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Fix wrong help message for parameter nestedhvm
  2011-05-31 13:44       ` Fix wrong help message for parameter nestedhvm Dong, Eddie
@ 2011-05-31 16:23         ` Ian Campbell
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Campbell @ 2011-05-31 16:23 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: Xen

(please don't reply to random threads in order to start a new thread, it
confuses threaded mailers.)

On Tue, 2011-05-31 at 14:44 +0100, Dong, Eddie wrote:
> A typo? The config file file is using nestedhvm=x.

This is the name of the struct member, not the configuration file option
nor a help message.

Perhaps it would be nicer if they were the same but there is no
requirement for that to be so. See xl_cmdimpl.c:parse_config_data():
        if (!xlu_cfg_get_long (config, "nestedhvm", &l))
            b_info->u.hvm.nested_hvm = l;

I don't think you can have compiled this patch!

Ian.


* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-05-31 16:08     ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Konrad Rzeszutek Wilk
@ 2011-05-31 16:30       ` Daniel Stodden
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Stodden @ 2011-05-31 16:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: pradeepv, Xen

On Tue, 2011-05-31 at 12:08 -0400, Konrad Rzeszutek Wilk wrote:
> On Sat, May 28, 2011 at 01:21:10PM -0700, Daniel Stodden wrote:
> > Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
> > idea. It means that in-flight I/O is essentially blocking continued
> > batches. This essentially kills throughput on frontends which unplug
> > (or even just notify) early and rightfully assume additional requests
> 
> You have any perf numbers?

This was on iSCSI. Single stripe, not a big array. Case of interest was
a simple linear dd, 1GB. Dom0 gets to around 70M/s on the raw device
node.

Guest before:

dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 24.5 s, 43.8 MB/s
1073741824 bytes (1.1 GB) copied, 23.9216 s, 44.9 MB/s
1073741824 bytes (1.1 GB) copied, 23.336 s, 46.0 MB/s

Guest after:

dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 17.0105 s, 63.1 MB/s
1073741824 bytes (1.1 GB) copied, 17.7566 s, 60.5 MB/s
1073741824 bytes (1.1 GB) copied, 17.163 s, 62.6 MB/s

This is not a silver bullet. With faster backends, there are more issues
with queue depth and request assembly, but one main blocker is what
Pradeep first described.

Daniel

> > will be picked up on time, not synchronously.
> > 
> > Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
> > ---
> >  drivers/block/xen-blkback/blkback.c |   36 ++++++++++++++++++----------------
> >  1 files changed, 19 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> > index 9dee545..48ad7fa 100644
> > --- a/drivers/block/xen-blkback/blkback.c
> > +++ b/drivers/block/xen-blkback/blkback.c
> > @@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int error)
> >   * (which has the sectors we want, number of them, grant references, etc),
> >   * and transmute  it to the block API to hand it over to the proper block disk.
> >   */
> > -static int do_block_io_op(struct xen_blkif *blkif)
> > +static int
> > +__do_block_io_op(struct xen_blkif *blkif)
> >  {
> >  	union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >  	struct blkif_request req;
> > @@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
> >  	return more_to_do;
> >  }
> >  
> > +static int
> > +do_block_io_op(blkif_t *blkif)
> > +{
> > +	blkif_back_rings_t *blk_rings = &blkif->blk_rings;
> > +	int more_to_do;
> > +
> > +	do {
> > +		more_to_do = __do_block_io_op(blkif);
> > +		if (more_to_do)
> > +			break;
> > +
> > +		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> > +	} while (more_to_do);
> > +
> > +	return more_to_do;
> > +}
> > +
> >  /*
> >   * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
> >   * and call the 'submit_bio' to pass it to the underlying storage.
> > @@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif, u64 id,
> >  	struct blkif_response  resp;
> >  	unsigned long     flags;
> >  	union blkif_back_rings *blk_rings = &blkif->blk_rings;
> > -	int more_to_do = 0;
> >  	int notify;
> >  
> >  	resp.id        = id;
> > @@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif, u64 id,
> >  	}
> >  	blk_rings->common.rsp_prod_pvt++;
> >  	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
> > -	if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons) {
> > -		/*
> > -		 * Tail check for pending requests. Allows frontend to avoid
> > -		 * notifications if requests are already in flight (lower
> > -		 * overheads and promotes batching).
> > -		 */
> > -		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> > -
> > -	} else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
> > -		more_to_do = 1;
> > -	}
> > -
> >  	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> > -
> > -	if (more_to_do)
> > -		blkif_notify_work(blkif);
> >  	if (notify)
> >  		notify_remote_via_irq(blkif->irq);
> >  }
> > -- 
> > 1.7.4.1
> > 
> > 


* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-05-29 11:34       ` Daniel Stodden
@ 2011-06-01  8:02         ` Vincent, Pradeep
  2011-06-01  8:24           ` Jan Beulich
  2011-06-01 17:49           ` Daniel Stodden
  0 siblings, 2 replies; 33+ messages in thread
From: Vincent, Pradeep @ 2011-06-01  8:02 UTC (permalink / raw)
  To: Daniel Stodden, Jan Beulich; +Cc: Xen, konrad.wilk

Re: Opportunistic check in make_response

Looking through blkfront logic, req_prod is only updated in batches - so
you are right that given the current blkfront design the opportunistic
check in make_response doesn't reduce interrupt rate and the benefit will
be limited to the reduction of I/O latency by an amount less than the
interrupt latency. 

Having said that, I think batching req_prod update delays the processing
of the first I/O by blkback until all the I/Os in the batch have been
added to the I/O ring - potentially increasing I/O latencies. While batched
interrupt generation makes perfect sense, I don't see a solid reason for
req_prod not being updated when I/Os are processed by blkfront. Netfront
does increment req_prod as soon as it processes each skb (but it might
send an interrupt for each skb as well which isn't great).

I think it would make sense to enhance blkfront design to increment
req_prod as soon as it processes an I/O while batching irq generation.
When blkfront and blkback are busy processing continuous stream of I/O
requests, it would be great if blkfront-blkback pipeline is able to
process them without generating unnecessary interrupts while improving I/O
latency performance. Thoughts? Any historical context on why this might
be bad?

If blkfront does increment req_prod on every I/O submission, I think
opportunistic check in make_response would make sense since such a check
could trigger blkback to start processing requests well ahead of the
'batch' completion on blkfront side (just the check for
RING_HAS_UNCONSUMED_REQUESTS should suffice - other parts of what you
removed should go)


Re: Locking

I was reflecting Jan Beulich's comment earlier in this thread. Like I said
before (in this thread), the locking logic in blkback isn't obvious from
the code and the failure modes seem benign. If someone has good context on
blkback locking strategy, I would love to learn. Also it would be very
useful to add some comments around lock usage to the blkback code.

Jan ??

- Pradeep Vincent
 

On 5/29/11 4:34 AM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:

>On Sun, 2011-05-29 at 04:09 -0400, Vincent, Pradeep wrote:
>> Opportunistically avoiding interrupts by checking for I/Os in flight
>> doesn't sound like a bad idea. I think the RING_HAS_UNCONSUMED_REQUESTS
>> call and what follows should be retained in 'make_response'.
>
>There's not much room for opportunism left here. After FINAL_CHECK
>returning with !_work_to_do you're going to receive an interrupt.
>Holding that notification off would kill performance.
>
>From there on, still leaving a duplicate check around end_io has only an
>infinitesimal  chance to preempt (none to prevent) the event reception.
>Even if it ever happens, the chance of making a difference in time to
>actual thread wake is probably even smaller.
>
>I think it's just overhead. If you disagree, this stuff is easy to prove
>or confute with event counting. Good luck :)
>
>> Also, should RING_FINAL_CHECK_FOR_REQUESTS be protected by
>>blk_ring_lock ?
>
>Nope. The ring lock is only needed to sync rsp production. Specifically,
>make_response upon request completion colliding with make_response
>called from the backend thread (the error case in do_block_io_op).
>
>Should rather be named rsp_lock or so, it doesn't lock anything except
>rsp_prod.
>
>Daniel
>
>> 
>> - Pradeep Vincent
>> 
>> 
>> On 5/28/11 1:21 PM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:
>> 
>> >Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
>> >idea. It means that in-flight I/O is essentially blocking continued
>> >batches. This essentially kills throughput on frontends which unplug
>> >(or even just notify) early and rightfully assume additional requests
>> >will be picked up on time, not synchronously.
>> >
>> >Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
>> >---
>> > drivers/block/xen-blkback/blkback.c |   36
>> >++++++++++++++++++----------------
>> > 1 files changed, 19 insertions(+), 17 deletions(-)
>> >
>> >diff --git a/drivers/block/xen-blkback/blkback.c
>> >b/drivers/block/xen-blkback/blkback.c
>> >index 9dee545..48ad7fa 100644
>> >--- a/drivers/block/xen-blkback/blkback.c
>> >+++ b/drivers/block/xen-blkback/blkback.c
>> >@@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int
>> >error)
>> >  * (which has the sectors we want, number of them, grant references,
>> >etc),
>> >  * and transmute  it to the block API to hand it over to the proper
>> >block disk.
>> >  */
>> >-static int do_block_io_op(struct xen_blkif *blkif)
>> >+static int
>> >+__do_block_io_op(struct xen_blkif *blkif)
>> > {
>> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
>> >     struct blkif_request req;
>> >@@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
>> >     return more_to_do;
>> > }
>> > 
>> >+static int
>> >+do_block_io_op(blkif_t *blkif)
>> >+{
>> >+    blkif_back_rings_t *blk_rings = &blkif->blk_rings;
>> >+    int more_to_do;
>> >+
>> >+    do {
>> >+        more_to_do = __do_block_io_op(blkif);
>> >+        if (more_to_do)
>> >+            break;
>> >+
>> >+        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>> >+    } while (more_to_do);
>> >+
>> >+    return more_to_do;
>> >+}
>> >+
>> > /*
>> >  * Transmutation of the 'struct blkif_request' to a proper 'struct
>>bio'
>> >  * and call the 'submit_bio' to pass it to the underlying storage.
>> >@@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif,
>> >u64 id,
>> >     struct blkif_response  resp;
>> >     unsigned long     flags;
>> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
>> >-    int more_to_do = 0;
>> >     int notify;
>> > 
>> >     resp.id        = id;
>> >@@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif,
>> >u64 id,
>> >     }
>> >     blk_rings->common.rsp_prod_pvt++;
>> >     RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
>> >-    if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons)
>>{
>> >-        /*
>> >-         * Tail check for pending requests. Allows frontend to avoid
>> >-         * notifications if requests are already in flight (lower
>> >-         * overheads and promotes batching).
>> >-         */
>> >-        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
>> >-
>> >-    } else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
>> >-        more_to_do = 1;
>> >-    }
>> >-
>> >     spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
>> >-
>> >-    if (more_to_do)
>> >-        blkif_notify_work(blkif);
>> >     if (notify)
>> >         notify_remote_via_irq(blkif->irq);
>> > }
>> >-- 
>> >1.7.4.1
>> >
>> 
>
>


* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-01  8:02         ` Vincent, Pradeep
@ 2011-06-01  8:24           ` Jan Beulich
  2011-06-01 17:49           ` Daniel Stodden
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2011-06-01  8:24 UTC (permalink / raw)
  To: Pradeep Vincent; +Cc: Xen, konrad.wilk, Daniel Stodden

>>> On 01.06.11 at 10:02, "Vincent, Pradeep" <pradeepv@amazon.com> wrote:
> Re: Locking
> 
> I was reflecting Jan Beulich's comment earlier in this thread. Like I said
> before (in this thread), the locking logic in blkback isn't obvious from
> the code and the failure modes seem benign. If someone has good context on
> blkback locking strategy, I would love to learn. Also it would be very
> useful to add some comments around lock usage to the blkback code.
> 
> Jan ??

No, not really - I'm more of an observer on this and similar
discussions, just trying to learn as much as I can. Now and then I
do notice possible issues, but as the design and the reasons for
the choices made aren't (afaik, and like you also say) documented,
a lot of room for guessing is usually left.

Jan


* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-01  8:02         ` Vincent, Pradeep
  2011-06-01  8:24           ` Jan Beulich
@ 2011-06-01 17:49           ` Daniel Stodden
  2011-06-01 18:07             ` Daniel Stodden
  2011-06-27 14:03             ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 33+ messages in thread
From: Daniel Stodden @ 2011-06-01 17:49 UTC (permalink / raw)
  To: Vincent, Pradeep; +Cc: Xen, Jan Beulich, konrad.wilk

On Wed, 2011-06-01 at 04:02 -0400, Vincent, Pradeep wrote:
> Re: Opportunistic check in make_response
> 
> Looking through blkfront logic, req_prod is only updated in batches - so
> you are right that given the current blkfront design the opportunistic
> check in make_response doesn't reduce interrupt rate and the benefit will
> be limited to the reduction of I/O latency by an amount less than the
> interrupt latency. 

Yup.

> Having said that, I think batching req_prod update delays the processing
> of the first I/O by blkback until all the I/Os in the batch have been
> added to the I/O ring - potentially increasing I/O latencies. While batched
> interrupt generation makes perfect sense, I don't see a solid reason for
> req_prod not being updated when I/Os are processed by blkfront. Netfront
> does increment req_prod as soon as it processes each skb (but it might
> send an interrupt for each skb as well which isn't great).
> 
> I think it would make sense to enhance blkfront design to increment
> req_prod as soon as it processes an I/O while batching irq generation.
> When blkfront and blkback are busy processing continuous stream of I/O
> requests, it would be great if blkfront-blkback pipeline is able to
> process them without generating unnecessary interrupts while improving I/O
> latency performance. 


> Thoughts ? Any historical context on why this might
> be bad ?

No, and it's a good idea, and your assumptions are imho correct.

The extreme case is a PUSH_REQUESTS_AND_CHECK_NOTIFY after each
request. Even in this case, the majority of interrupts should be held
off by looking at req_event. There are certainly variants, but I can
show you some results for what you're asking, because I happened to try
that just last week. 

Next consider adding a couple event counters in the backend. And you
need the patch above, of course.

req_event: Frontend event received
rsp_event: Frontend notification sent
req_again: FINAL_CHECK indicated more_to_do.

boost-1, order=0:

(This is the unmodified version)

dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 17.0105 s, 63.1 MB/s
1073741824 bytes (1.1 GB) copied, 17.7566 s, 60.5 MB/s
1073741824 bytes (1.1 GB) copied, 17.163 s, 62.6 MB/s

rsp_event 6759
req_event 6753
req_again 16

boost-2, order=0

(This was the aggressive one, one PUSH_NOTIFY per ring request).

dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 17.3208 s, 62.0 MB/s
1073741824 bytes (1.1 GB) copied, 17.4851 s, 61.4 MB/s
1073741824 bytes (1.1 GB) copied, 17.7333 s, 60.5 MB/s

rsp_event 7122
req_event 7141
req_again 5497


So the result is that the event load even in the most aggressive case
will typically increase only moderately. Instead, the restored outer
loop in the dispatcher just starts to play out.

I'm not proposing this as the final solution, we might choose to be more
careful and limit event emission to some stride instead.

Don't be confused by the throughput values not going up, the problem I
had with the array (iSCSI in this case) just turned out to be elsewhere.
I'm pretty convinced there are workload/storage configurations which
benefit from that.

In the case at hand, increasing the ring size was way more productive.
At which point the queue depth multiplies as well. And I currently
expect that the longer it gets the more urgent the issue you describe
will be.

But I also think it needs some experiments and wants to be backed by
numbers.

Daniel

> If blkfront does increment req_prod on every I/O submission, I think
> opportunistic check in make_response would make sense since such a check
> could trigger blkback to start processing requests well ahead of the
> 'batch' completion on blkfront side (just the check for
> RING_HAS_UNCONSUMED_REQUESTS should suffice - other parts of what you
> removed should go)
> 
> 
> Re: Locking
> 
> I was reflecting Jan Beulich's comment earlier in this thread. Like I said
> before (in this thread), the locking logic in blkback isn't obvious from
> the code and the failure modes seem benign. If someone has good context on
> blkback locking strategy, I would love to learn. Also it would be very
> useful to add some comments around lock usage to the blkback code.
> 
> Jan ??
> 
> - Pradeep Vincent
>  
> 
> On 5/29/11 4:34 AM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:
> 
> >On Sun, 2011-05-29 at 04:09 -0400, Vincent, Pradeep wrote:
> >> Opportunistically avoiding interrupts by checking for I/Os in flight
> >> doesn't sound like a bad idea. I think the RING_HAS_UNCONSUMED_REQUESTS
> >> call and what follows should be retained in 'make_response'.
> >
> >There's not much room for opportunism left here. After FINAL_CHECK
> >returning with !_work_to_do you're going to receive an interrupt.
> >Holding that notification off would kill performance.
> >
> >From there on, still leaving a duplicate check around end_io has only an
> >infinitesimal  chance to preempt (none to prevent) the event reception.
> >Even if it ever happens, the chance of making a difference in time to
> >actual thread wake is probably even smaller.
> >
> >I think it's just overhead. If you disagree, this stuff is easy to prove
> >or confute with event counting. Good luck :)
> >
> >> Also, should RING_FINAL_CHECK_FOR_REQUESTS be protected by
> >>blk_ring_lock ?
> >
> >Nope. The ring lock is only needed to sync rsp production. Specifically,
> >make_response upon request completion colliding with make_response
> >called from the backend thread (the error case in do_block_io_op).
> >
> >Should rather be named rsp_lock or so, it doesn't lock anything except
> >rsp_prod.
> >
> >Daniel
> >
> >> 
> >> - Pradeep Vincent
> >> 
> >> 
> >> On 5/28/11 1:21 PM, "Daniel Stodden" <daniel.stodden@citrix.com> wrote:
> >> 
> >> >Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
> >> >idea. It means that in-flight I/O is essentially blocking continued
> >> >batches. This essentially kills throughput on frontends which unplug
> >> >(or even just notify) early and rightfully assume additional requests
> >> >will be picked up on time, not synchronously.
> >> >
> >> >Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
> >> >---
> >> > drivers/block/xen-blkback/blkback.c |   36
> >> >++++++++++++++++++----------------
> >> > 1 files changed, 19 insertions(+), 17 deletions(-)
> >> >
> >> >diff --git a/drivers/block/xen-blkback/blkback.c
> >> >b/drivers/block/xen-blkback/blkback.c
> >> >index 9dee545..48ad7fa 100644
> >> >--- a/drivers/block/xen-blkback/blkback.c
> >> >+++ b/drivers/block/xen-blkback/blkback.c
> >> >@@ -451,7 +451,8 @@ static void end_block_io_op(struct bio *bio, int
> >> >error)
> >> >  * (which has the sectors we want, number of them, grant references,
> >> >etc),
> >> >  * and transmute  it to the block API to hand it over to the proper
> >> >block disk.
> >> >  */
> >> >-static int do_block_io_op(struct xen_blkif *blkif)
> >> >+static int
> >> >+__do_block_io_op(struct xen_blkif *blkif)
> >> > {
> >> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >> >     struct blkif_request req;
> >> >@@ -508,6 +509,23 @@ static int do_block_io_op(struct xen_blkif *blkif)
> >> >     return more_to_do;
> >> > }
> >> > 
> >> >+static int
> >> >+do_block_io_op(blkif_t *blkif)
> >> >+{
> >> >+    blkif_back_rings_t *blk_rings = &blkif->blk_rings;
> >> >+    int more_to_do;
> >> >+
> >> >+    do {
> >> >+        more_to_do = __do_block_io_op(blkif);
> >> >+        if (more_to_do)
> >> >+            break;
> >> >+
> >> >+        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> >+    } while (more_to_do);
> >> >+
> >> >+    return more_to_do;
> >> >+}
> >> >+
> >> > /*
> >> >  * Transmutation of the 'struct blkif_request' to a proper 'struct
> >>bio'
> >> >  * and call the 'submit_bio' to pass it to the underlying storage.
> >> >@@ -698,7 +716,6 @@ static void make_response(struct xen_blkif *blkif,
> >> >u64 id,
> >> >     struct blkif_response  resp;
> >> >     unsigned long     flags;
> >> >     union blkif_back_rings *blk_rings = &blkif->blk_rings;
> >> >-    int more_to_do = 0;
> >> >     int notify;
> >> > 
> >> >     resp.id        = id;
> >> >@@ -725,22 +742,7 @@ static void make_response(struct xen_blkif *blkif,
> >> >u64 id,
> >> >     }
> >> >     blk_rings->common.rsp_prod_pvt++;
> >> >     RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
> >> >-    if (blk_rings->common.rsp_prod_pvt == blk_rings->common.req_cons)
> >>{
> >> >-        /*
> >> >-         * Tail check for pending requests. Allows frontend to avoid
> >> >-         * notifications if requests are already in flight (lower
> >> >-         * overheads and promotes batching).
> >> >-         */
> >> >-        RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
> >> >-
> >> >-    } else if (RING_HAS_UNCONSUMED_REQUESTS(&blk_rings->common)) {
> >> >-        more_to_do = 1;
> >> >-    }
> >> >-
> >> >     spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
> >> >-
> >> >-    if (more_to_do)
> >> >-        blkif_notify_work(blkif);
> >> >     if (notify)
> >> >         notify_remote_via_irq(blkif->irq);
> >> > }
> >> >-- 
> >> >1.7.4.1
> >> >
> >> 
> >
> >
> 


* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-01 17:49           ` Daniel Stodden
@ 2011-06-01 18:07             ` Daniel Stodden
  2011-06-27 14:03             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 33+ messages in thread
From: Daniel Stodden @ 2011-06-01 18:07 UTC (permalink / raw)
  To: Vincent, Pradeep; +Cc: Xen, Jan Beulich, konrad.wilk

[-- Attachment #1: Type: text/plain, Size: 3058 bytes --]


I attached the event counting patch. It should make sense, but it was
against XCP, so it may not apply all smoothly for you. If you really
want to play with it, and manage to find more interesting
storage/system/patch combinations than the one quoted below, that'd
probaby be really useful.

Daniel

On Wed, 2011-06-01 at 13:49 -0400, Daniel Stodden wrote:
> > 
> > I think it would make sense to enhance blkfront design to increment
> > req_prod as soon as it processes an I/O while batching irq generation.
> > When blkfront and blkback are busy processing continuous stream of I/O
> > requests, it would be great if blkfront-blkback pipeline is able to
> > process them without generating unnecessary interrupts while improving I/O
> > latency performance. 
> 
> 
> > Thoughts ? Any historical context on why this might
> > be bad ?
> 
> No, and it's a good idea, and your assumptions are imho correct.
> 
> > The extreme case is a PUSH_REQUESTS_AND_CHECK_NOTIFY after each
> request. Even in this case, the majority of interrupts should be held
> off by looking at req_event. There are certainly variants, but I can
> show you some results for what you're asking, because I happened to try
> that just last week. 
> 
> Next consider adding a couple event counters in the backend. And you
> need the patch above, of course.
> 
> req_event: Frontend event received
> rsp_event: Frontend notification sent
> req_again: FINAL_CHECK indicated more_to_do.

> boost-1, order=0:
> 
> (This is the unmodified version)
> 
> dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
> 1073741824 bytes (1.1 GB) copied, 17.0105 s, 63.1 MB/s
> 1073741824 bytes (1.1 GB) copied, 17.7566 s, 60.5 MB/s
> 1073741824 bytes (1.1 GB) copied, 17.163 s, 62.6 MB/s
> 
> rsp_event 6759
> req_event 6753
> req_again 16
> 
> boost-2, order=0
> 
> (This was the aggressive one, one PUSH_NOTIFY per ring request).
> 
> dd if=/dev/zero of=/dev/xvdc bs=1M count=1024
> 1073741824 bytes (1.1 GB) copied, 17.3208 s, 62.0 MB/s
> 1073741824 bytes (1.1 GB) copied, 17.4851 s, 61.4 MB/s
> 1073741824 bytes (1.1 GB) copied, 17.7333 s, 60.5 MB/s
> 
> rsp_event 7122
> req_event 7141
> req_again 5497
> 
> 
> So the result is that the event load even in the most aggressive case
> will typically increase only moderately. Instead, the restored outer
> loop in the dispatcher just starts to play out.
> 
> I'm not proposing this as the final solution, we might choose to be more
> careful and limit event emission to some stride instead.
> 
> Don't be confused by the throughput values not going up, the problem I
> had with the array (iSCSI in this case) just turned out to be elsewhere.
> I'm pretty convinced there are workload/storage configurations which
> benefit from that.
> 
> In the case at hand, increasing the ring size was way more productive.
> At which point the queue depth multiplies as well. And I currently
> expect that the longer it gets the more urgent the issue you describe
> will be.
> 
> But I also think it needs some experiments and wants to be backed by
> numbers.
> 
> Daniel


[-- Attachment #2: blkback-final-check-stats.diff --]
[-- Type: text/x-patch, Size: 2262 bytes --]

# HG changeset patch
# Parent 9c6a7fdebc7fcb76421a8f053be7f5dfcf30c741

diff -r 9c6a7fdebc7f drivers/xen/blkback/blkback.c
--- a/drivers/xen/blkback/blkback.c	Fri May 27 15:32:58 2011 -0700
+++ b/drivers/xen/blkback/blkback.c	Fri May 27 15:33:39 2011 -0700
@@ -543,6 +543,8 @@
 	if (blkif->xenblkd)
 		wake_up_process(blkif->xenblkd);
 
+	blkif->st_req_event++;
+
 	return IRQ_HANDLED;
 }
 
@@ -641,6 +643,8 @@
 			break;
 
 		RING_FINAL_CHECK_FOR_REQUESTS(&blk_rings->common, more_to_do);
+		if (more_to_do)
+			blkif->st_req_again++;
 	} while (more_to_do);
 
 	return more_to_do;
@@ -862,8 +866,10 @@
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
 	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
-	if (notify)
+	if (notify) {
 		notify_remote_via_irq(blkif->irq);
+		blkif->st_rsp_event++;
+	}
 }
 
 static int __init blkif_init(void)
diff -r 9c6a7fdebc7f drivers/xen/blkback/common.h
--- a/drivers/xen/blkback/common.h	Fri May 27 15:32:58 2011 -0700
+++ b/drivers/xen/blkback/common.h	Fri May 27 15:33:39 2011 -0700
@@ -117,6 +117,10 @@
 	s64                 st_wr_sum_usecs;
 	s64                 st_wr_max_usecs;
 
+	u64                 st_req_event;
+	u64		    st_req_again;
+	u64                 st_rsp_event;
+
 	unsigned int   nr_shared_pages;
 	grant_handle_t shmem_handle[BLKIF_MAX_RING_PAGES];
 } blkif_t;
diff -r 9c6a7fdebc7f drivers/xen/blkback/xenbus.c
--- a/drivers/xen/blkback/xenbus.c	Fri May 27 15:32:58 2011 -0700
+++ b/drivers/xen/blkback/xenbus.c	Fri May 27 15:33:39 2011 -0700
@@ -337,6 +337,9 @@
 VBD_SHOW(br_req,  "%d\n", be->blkif->st_br_req);
 VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect);
 VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect);
+VBD_SHOW(req_event, "%llu\n", be->blkif->st_req_event);
+VBD_SHOW(rsp_event, "%llu\n", be->blkif->st_rsp_event);
+VBD_SHOW(req_again, "%llu\n", be->blkif->st_req_again);
 VBD_SHOW_AVG(rd_usecs, be->blkif->st_rd_cnt,
 	     be->blkif->st_rd_sum_usecs, be->blkif->st_rd_max_usecs);
 VBD_SHOW_AVG(wr_usecs, be->blkif->st_wr_cnt,
@@ -351,6 +354,9 @@
 	&dev_attr_wr_sect.attr,
 	&dev_attr_rd_usecs.attr,
 	&dev_attr_wr_usecs.attr,
+	&dev_attr_req_event.attr,
+	&dev_attr_rsp_event.attr,
+	&dev_attr_req_again.attr,
 	NULL
 };
 



* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-01 17:49           ` Daniel Stodden
  2011-06-01 18:07             ` Daniel Stodden
@ 2011-06-27 14:03             ` Konrad Rzeszutek Wilk
  2011-06-27 18:42               ` Daniel Stodden
  1 sibling, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-06-27 14:03 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Vincent, Pradeep, Xen, Jan Beulich

> In the case at hand, increasing the ring size was way more productive.
> At which point the queue depth multiplies as well. And I currently
> expect that the longer it gets the more urgent the issue you describe
> will be.

You wouldn't have patches for that somewhere tucked away? I am going over
the patches for 3.1 xen-blkback and was thinking to have them all queued up and
test them all at once..


* Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-27 14:03             ` Konrad Rzeszutek Wilk
@ 2011-06-27 18:42               ` Daniel Stodden
  2011-06-27 19:13                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Stodden @ 2011-06-27 18:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Vincent, Pradeep, Xen, Jan Beulich

On Mon, 2011-06-27 at 10:03 -0400, Konrad Rzeszutek Wilk wrote:
> > In the case at hand, increasing the ring size was way more productive.
> > At which point the queue depth multiplies as well. And I currently
> > expect that the longer it gets the more urgent the issue you describe
> > will be.
> 
> You wouldn't have patches for that somewhere tucked away? I am going over
> the patches for 3.1 xen-blkback and was thinking to have them all queued up and
> test them all at once..

I was going to send the kernel patch right after, just to discover that
xen-blkback lacks some of the synchronization items the original one was
based on. It's coming, but it's rather going to be a series.

Daniel


* Re: Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-27 18:42               ` Daniel Stodden
@ 2011-06-27 19:13                 ` Konrad Rzeszutek Wilk
  2011-06-28  0:31                   ` Daniel Stodden
  0 siblings, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-06-27 19:13 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Vincent, Pradeep, Xen, Jan Beulich

On Mon, Jun 27, 2011 at 11:42:28AM -0700, Daniel Stodden wrote:
> On Mon, 2011-06-27 at 10:03 -0400, Konrad Rzeszutek Wilk wrote:
> > > In the case at hand, increasing the ring size was way more productive.
> > > At which point the queue depth multiplies as well. And I currently
> > > expect that the longer it gets the more urgent the issue you describe
> > > will be.
> > 
> > You wouldn't have patches for that somewhere tucked away? I am going over
> > the patches for 3.1 xen-blkback and was thinking to have them all queued up and
> > test them all at once..
> 
> I was going to send the kernel patch right after, just to discover that
> xen-blkback lacks some of the synchronization items the original one was
> based on. It's coming, but it's rather going to be a series.

That is fine. Please also CC lkml when posting the series. Thanks!


* Re: Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-27 19:13                 ` Konrad Rzeszutek Wilk
@ 2011-06-28  0:31                   ` Daniel Stodden
  2011-06-28 13:19                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Stodden @ 2011-06-28  0:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Vincent, Pradeep, Xen, Jan Beulich

On Mon, 2011-06-27 at 15:13 -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 27, 2011 at 11:42:28AM -0700, Daniel Stodden wrote:
> > On Mon, 2011-06-27 at 10:03 -0400, Konrad Rzeszutek Wilk wrote:
> > > > In the case at hand, increasing the ring size was way more productive.
> > > > At which point the queue depth multiplies as well. And I currently
> > > > expect that the longer it gets the more urgent the issue you describe
> > > > will be.
> > > 
> > > You wouldn't have patches for that somewhere tucked away? I am going over
> > > the patches for 3.1 xen-blkback and was thinking to have them all queued up and
> > > test them all at once..
> > 
> > I was going to send the kernel patch right after, just to discover that
> > xen-blkback lacks some of the synchronization items the original one was
> > based on. It's coming, but it's rather going to be a series.
> 
> That is fine. Please also CC lkml when posting the series. Thanks!

That's a more interesting thing, actually: Do you plan to maintain this
stuff? Because xen-blkback presently has no dedicated MAINTAINERS entry,
iirc, so I guess it defaults to Jens.

It might indeed make more sense to collect tested batches, and submit
them as such.

Daniel


* Re: Re: [PATCH] xen/blkback: Don't let in-flight requests defer pending ones.
  2011-06-28  0:31                   ` Daniel Stodden
@ 2011-06-28 13:19                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-06-28 13:19 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Vincent, Pradeep, Xen, Jan Beulich

On Mon, Jun 27, 2011 at 05:31:47PM -0700, Daniel Stodden wrote:
> On Mon, 2011-06-27 at 15:13 -0400, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jun 27, 2011 at 11:42:28AM -0700, Daniel Stodden wrote:
> > > On Mon, 2011-06-27 at 10:03 -0400, Konrad Rzeszutek Wilk wrote:
> > > > > In the case at hand, increasing the ring size was way more productive.
> > > > > At which point the queue depth multiplies as well. And I currently
> > > > > expect that the longer it gets the more urgent the issue you describe
> > > > > will be.
> > > > 
> > > > You wouldn't have patches for that somewhere tucked away? I am going over
> > > > the patches for 3.1 xen-blkback and was thinking to have them all queued up and
> > > > test them all at once..
> > > 
> > > I was going to send the kernel patch right after, just to discover that
> > > xen-blkback lacks some of the synchronization items the original one was
> > > based on. It's coming, but it's rather going to be a series.
> > 
> > That is fine. Please also CC lkml when posting the series. Thanks!
> 
> That's a more interesting thing, actually: Do you plan to maintain this

That is my plan.
> stuff? Because xen-blkback presently has no dedicated MAINTAINERS entry,

You are right. I should send a patch to explicitly state it, even though this:

$scripts/get_maintainer.pl -f drivers/block/xen-blkback/
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> (commit_signer:31/32=97%)
Laszlo Ersek <lersek@redhat.com> (commit_signer:2/32=6%)
Jan Beulich <jbeulich@novell.com> (commit_signer:2/32=6%)
linux-kernel@vger.kernel.org (open list)

Kind of makes me the top choice.
> iirc, so I guess it defaults to Jens.

Well, everything under drivers/block _has_ to eventually go through Jens.
It can go first through xen-devel to make sure there is nothing bogus, and
be reviewed here. And I can collect the patches, stick them in a branch,
run through the Xen gauntlet tests and then ask Jens to GIT PULL them.

It does not hurt to additionally go through LKML - the more eyes, the better.
> 
> It might indeed make more sense to collect tested batches, and submit
> them as such.

<nods> So far I've:

    xen/blkback: Don't let in-flight requests defer pending ones.

(#stable/for-jens)

And I wouldn't mind putting some more there before I start cranking some
tests.


end of thread, other threads:[~2011-06-28 13:19 UTC | newest]

Thread overview: 33+ messages
2011-05-02  7:04 [PATCH] blkback: Fix block I/O latency issue Vincent, Pradeep
2011-05-02  8:13 ` Jan Beulich
2011-05-03  1:10   ` Vincent, Pradeep
2011-05-03 14:55     ` Konrad Rzeszutek Wilk
2011-05-03 17:16       ` Vincent, Pradeep
2011-05-03 17:51         ` Daniel Stodden
2011-05-03 23:41           ` Vincent, Pradeep
2011-05-03 17:52     ` Daniel Stodden
2011-05-04  1:54       ` Vincent, Pradeep
2011-05-09 20:24         ` Konrad Rzeszutek Wilk
2011-05-13  0:40           ` Vincent, Pradeep
2011-05-13  2:51             ` Konrad Rzeszutek Wilk
2011-05-16 15:22               ` Konrad Rzeszutek Wilk
2011-05-20  6:12                 ` Vincent, Pradeep
2011-05-24 16:02                   ` Konrad Rzeszutek Wilk
2011-05-24 22:40                     ` Vincent, Pradeep
2011-05-28 20:12 ` [RE-PATCH] " Daniel Stodden
2011-05-28 20:21   ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Daniel Stodden
2011-05-29  8:09     ` Vincent, Pradeep
2011-05-29 11:34       ` Daniel Stodden
2011-06-01  8:02         ` Vincent, Pradeep
2011-06-01  8:24           ` Jan Beulich
2011-06-01 17:49           ` Daniel Stodden
2011-06-01 18:07             ` Daniel Stodden
2011-06-27 14:03             ` Konrad Rzeszutek Wilk
2011-06-27 18:42               ` Daniel Stodden
2011-06-27 19:13                 ` Konrad Rzeszutek Wilk
2011-06-28  0:31                   ` Daniel Stodden
2011-06-28 13:19                     ` Konrad Rzeszutek Wilk
2011-05-31 13:44       ` Fix wrong help message for parameter nestedhvm Dong, Eddie
2011-05-31 16:23         ` Ian Campbell
2011-05-31 16:08     ` [PATCH] xen/blkback: Don't let in-flight requests defer pending ones Konrad Rzeszutek Wilk
2011-05-31 16:30       ` Daniel Stodden
