From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH] blkback: Fix block I/O latency issue
Date: Mon, 9 May 2011 16:24:03 -0400
Message-ID: <20110509202403.GA27755@dumpdata.com>
References: <1304445176.22480.120.camel@agari.van.xensource.com>
To: "Vincent, Pradeep"
Cc: Jeremy Fitzhardinge, "xen-devel@lists.xensource.com", Jan Beulich, Daniel Stodden

On Tue, May 03, 2011 at 06:54:38PM -0700, Vincent, Pradeep wrote:
> Hey Daniel,
>
> Thanks for your comments.
>
> >> The notification avoidance these macros implement does not promote
> >> deliberate latency. This stuff is not dropping events or deferring
> >> guest requests. It only avoids a gratuitous notification sent by the
> >> remote end in cases where the local one didn't go to sleep yet, and
> >> therefore can guarantee that it's going to process the message ASAP,
> >> right after finishing what's still pending from the previous kick.
>
> If the design goal was simply to avoid unnecessary interrupts but not to
> delay I/Os, then the blkback code has a bug.
>
> If the design goal was to delay I/Os in order to reduce the interrupt
> rate, then I am arguing that the design introduces way too much latency,
> which affects many applications.
>
> Either way, this issue needs to be addressed.

I agree we need to fix this. What I am curious about is:
 - What are the workloads under which this patch has a negative effect?
 - I presume you have tested this in production - what were the numbers
   for high-bandwidth workloads (so imagine four or six threads pushing
   as much I/O as possible)? Did the level of IRQs go way up compared to
   running without this patch?

I am wondering if it might be worth looking into something NAPI-like in
the block layer (so polling, basically). The concern I have is that this
patch could trigger an interrupt storm for small requests arriving at a
high rate (say, 512-byte random writes). But perhaps the way to make this
work is to have rate-limiting code in it so that there is no chance of
interrupt storms.
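
For reference, the notification avoidance Daniel is describing is the
"final check" idiom from the ring macros. Roughly like this - a hand-wavy
sketch of the general consumer-side pattern, not the actual blkback
dispatch loop, and dispatch_one_request() is a made-up placeholder:

#include <linux/string.h>
#include <xen/interface/io/ring.h>
#include <xen/interface/io/blkif.h>

/* Placeholder for the real per-request work blkback would do. */
static void dispatch_one_request(struct blkif_request *req);

static int consume_ring(struct blkif_back_ring *ring)
{
        struct blkif_request req;
        RING_IDX rc, rp;
        int more_to_do = 0;

        rc = ring->req_cons;
        rp = ring->sring->req_prod;
        rmb();                  /* make sure we see requests up to rp */

        while (rc != rp) {
                memcpy(&req, RING_GET_REQUEST(ring, rc), sizeof(req));
                ring->req_cons = ++rc;
                dispatch_one_request(&req);
        }

        /*
         * About to go idle: set req_event to just past what we have
         * consumed, then re-check the producer.  If a request slipped
         * in meanwhile we handle it now instead of waiting for an
         * event; if not, the frontend's next push will cross req_event
         * and raise one.  While we were still busy, req_event lagged
         * behind and the frontend could legitimately skip the kick -
         * that is the "avoidance", not deliberate batching.
         */
        RING_FINAL_CHECK_FOR_REQUESTS(ring, more_to_do);
        return more_to_do;
}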
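
And the rate-limiting I have in mind would be something along these
lines - again just a sketch, the struct and the thresholds are made up,
and a real version would need a timer to flush a held-back event so we
don't reintroduce the latency problem this thread is about:

#include <linux/jiffies.h>
#include <linux/types.h>

struct notify_limiter {
        unsigned long   last_notify;    /* jiffies when we last sent an event */
        unsigned int    suppressed;     /* events held back since then */
};

#define NOTIFY_MIN_INTERVAL     1       /* jiffies between forced events */
#define NOTIFY_MAX_SUPPRESSED   32      /* never hold back more than this */

/* Called where blkback would otherwise notify_remote_via_irq(). */
static bool allow_notify(struct notify_limiter *nl)
{
        if (time_after(jiffies, nl->last_notify + NOTIFY_MIN_INTERVAL) ||
            nl->suppressed >= NOTIFY_MAX_SUPPRESSED) {
                nl->last_notify = jiffies;
                nl->suppressed = 0;
                return true;
        }
        nl->suppressed++;
        return false;
}

One jiffy is 1-10ms depending on HZ, which is almost certainly too coarse
for storage latency, so the interval (and the suppression cap) would have
to be tuned way down - the point is only that a small guard like this
caps the IRQ rate for the 512-byte random-write case without touching
the common path.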