From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH] blkback: Fix block I/O latency issue
Date: Mon, 9 May 2011 16:24:03 -0400
Message-ID: <20110509202403.GA27755@dumpdata.com>
References: <1304445176.22480.120.camel@agari.van.xensource.com>
To: "Vincent, Pradeep"
Cc: Jeremy Fitzhardinge, "xen-devel@lists.xensource.com", Jan Beulich, Daniel Stodden

On Tue, May 03, 2011 at 06:54:38PM -0700, Vincent, Pradeep wrote:
> Hey Daniel,
>
> Thanks for your comments.
>
> >> The notification avoidance these macros implement does not promote
> >> deliberate latency. This stuff is not dropping events or deferring
> >> guest requests. It only avoids a gratuitous notification sent by the
> >> remote end in cases where the local one didn't go to sleep yet, and
> >> therefore can guarantee that it's going to process the message ASAP,
> >> right after finishing what's still pending from the previous kick.
>
> If the design goal was simply to avoid unnecessary interrupts but not to
> delay I/Os, then the blkback code has a bug.
>
> If the design goal was to delay I/Os in order to reduce the interrupt
> rate, then I am arguing that the design introduces way too much latency,
> which affects many applications.
>
> Either way, this issue needs to be addressed.

I agree we need to fix this. What I am curious about is:
 - What are the workloads under which this patch has a negative effect?
 - I presume you have tested this in production - what were the numbers
   for high-bandwidth workloads (so imagine four or six threads pushing
   as much I/O as possible)? Did the level of IRQs go way up compared to
   running without this patch?

I am wondering if it might be worth looking into something NAPI-like in
the block layer (so polling, basically). The concern I have is that this
patch could trigger an interrupt storm for small requests arriving at a
high rate (say, 512-byte random writes). But perhaps the way to make this
work is to have rate-limiting code in it so that there is no chance of
interrupt storms.
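
For reference, the notification avoidance Daniel is describing is the
"final check" idiom from the ring macros. Roughly like this - a hand-wavy
sketch of the general consumer-side pattern, not the actual blkback
dispatch loop, and dispatch_one_request() is a made-up placeholder:

#include <linux/string.h>
#include <xen/interface/io/ring.h>
#include <xen/interface/io/blkif.h>

/* Placeholder for the real per-request work blkback would do. */
static void dispatch_one_request(struct blkif_request *req);

static int consume_ring(struct blkif_back_ring *ring)
{
        struct blkif_request req;
        RING_IDX rc, rp;
        int more_to_do = 0;

        rc = ring->req_cons;
        rp = ring->sring->req_prod;
        rmb();                  /* make sure we see requests up to rp */

        while (rc != rp) {
                memcpy(&req, RING_GET_REQUEST(ring, rc), sizeof(req));
                ring->req_cons = ++rc;
                dispatch_one_request(&req);
        }

        /*
         * About to go idle: set req_event to just past what we have
         * consumed, then re-check the producer.  If a request slipped
         * in meanwhile we handle it now instead of waiting for an
         * event; if not, the frontend's next push will cross req_event
         * and raise one.  While we were still busy, req_event lagged
         * behind and the frontend could legitimately skip the kick -
         * that is the "avoidance", not deliberate batching.
         */
        RING_FINAL_CHECK_FOR_REQUESTS(ring, more_to_do);
        return more_to_do;
}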
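
And the rate-limiting I have in mind would be something along these
lines - again just a sketch, the struct and the thresholds are made up,
and a real version would need a timer to flush a held-back event so we
don't reintroduce the latency problem this thread is about:

#include <linux/jiffies.h>
#include <linux/types.h>

struct notify_limiter {
        unsigned long   last_notify;    /* jiffies when we last sent an event */
        unsigned int    suppressed;     /* events held back since then */
};

#define NOTIFY_MIN_INTERVAL     1       /* jiffies between forced events */
#define NOTIFY_MAX_SUPPRESSED   32      /* never hold back more than this */

/* Called where blkback would otherwise notify_remote_via_irq(). */
static bool allow_notify(struct notify_limiter *nl)
{
        if (time_after(jiffies, nl->last_notify + NOTIFY_MIN_INTERVAL) ||
            nl->suppressed >= NOTIFY_MAX_SUPPRESSED) {
                nl->last_notify = jiffies;
                nl->suppressed = 0;
                return true;
        }
        nl->suppressed++;
        return false;
}

One jiffy is 1-10ms depending on HZ, which is almost certainly too coarse
for storage latency, so the interval (and the suppression cap) would have
to be tuned way down - the point is only that a small guard like this
caps the IRQ rate for the 512-byte random-write case without touching
the common path.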