Date: Tue, 30 Aug 2011 01:32:31 -0400
From: Christoph Hellwig
To: Daniel Ehrenberg
Cc: linux-kernel@vger.kernel.org
Subject: Re: Approaches to making io_submit not block
Message-ID: <20110830053231.GA1627@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 29, 2011 at 10:33:24AM -0700, Daniel Ehrenberg wrote:
> - Blocking due to reading metadata.
> Proposed solution:
> Add a per-ioctx work queue to do metadata reads. It will be triggered
> from the dio code: if in async mode, then get_block will be called
> with an additional flag, meaning something like O_NONBLOCK on sockets.
> File systems' get_block functions can implement this flag and return
> -EAGAIN if a read from the underlying device would be necessary. (If
> we're worried that EAGAIN might be used for other purposes in the
> future, we could make a new errno for this purpose.) From a quick
> glance at the code, it looks like this would not be too difficult to
> add to ext4 for extent-based files, and support in other file systems
> could be added gradually. If -EAGAIN is returned, then the struct dio
> will be put on the work queue together with a description of what kind
> of processing it was doing.
> The work queue only serves the metadata
> request, and the rest of the request is served on the existing path.

Let filesystems handle this. I've actually prototyped it in XFS, based
on some pending work from Dave but at this point it's still butt ugly.

> - Blocking for appends and writes to file holes due to the need for a
> metadata write after the data write
> Proposed solution:
> Maintain a work queue for all appends and writes to file holes, which
> executes the current code.

No way. I've fixed this for XFS, and it's trivial without the need to
queue them up. The only thing preventing appending writes to work is a
flag to tell the dio layer to just do them, just like it already works
for holes. (and more QA).
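
For readers following the thread, here is a rough userspace sketch of the
control flow in the quoted first proposal: try the block mapping without
blocking, and hand the request to a worker if that returns -EAGAIN. It is
not kernel code and not the XFS prototype mentioned above. Every name in
it (SKETCH_GB_NONBLOCK, sketch_get_block, sketch_submit, sketch_run_queue,
and the structs) is made up for illustration, and the "work queue" is
just a fixed-size array drained by hand.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: ask the mapping routine not to block on metadata I/O. */
#define SKETCH_GB_NONBLOCK 0x1

/* Toy inode: the mapping is either already in memory or must be "read". */
struct sketch_inode {
    bool extent_cached;   /* true once the block mapping is in memory */
    long first_block;     /* physical block backing logical block 0 */
};

/* Toy stand-in for a filesystem get_block callback (assumed interface). */
static int sketch_get_block(struct sketch_inode *inode, long lblock,
                            int flags, long *pblock)
{
    if (!inode->extent_cached) {
        if (flags & SKETCH_GB_NONBLOCK)
            return -EAGAIN;           /* would have to read metadata */
        inode->extent_cached = true;  /* pretend the blocking read happened */
    }
    *pblock = inode->first_block + lblock;
    return 0;
}

/* One deferred mapping request, plus where its result ends up. */
struct sketch_work {
    struct sketch_inode *inode;
    long lblock;
    long pblock;
    int result;
    bool done;
};

#define SKETCH_QUEUE_MAX 8

/* Stand-in for the per-ioctx work queue from the proposal. */
struct sketch_queue {
    struct sketch_work *items[SKETCH_QUEUE_MAX];
    size_t count;
};

/*
 * Submission path: must not block. Returns 0 if the mapping completed
 * inline, 1 if it was deferred to the queue, or a negative errno.
 */
static int sketch_submit(struct sketch_queue *q, struct sketch_work *w)
{
    int ret = sketch_get_block(w->inode, w->lblock,
                               SKETCH_GB_NONBLOCK, &w->pblock);

    if (ret == -EAGAIN) {
        if (q->count == SKETCH_QUEUE_MAX)
            return -ENOMEM;
        q->items[q->count++] = w;
        return 1;
    }
    w->result = ret;
    w->done = true;
    return ret;
}

/* Worker context: allowed to block, so it retries without the flag. */
static void sketch_run_queue(struct sketch_queue *q)
{
    for (size_t i = 0; i < q->count; i++) {
        struct sketch_work *w = q->items[i];

        w->result = sketch_get_block(w->inode, w->lblock, 0, &w->pblock);
        w->done = true;
    }
    q->count = 0;
}
```

The point of the split is that only the mapping lookup moves to the
worker; once the mapping is cached, later submissions against the same
inode complete inline on the existing path.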