From: Nick Piggin
To: Jan Kara
Subject: Re: [PATCH RESEND] Minimal fix for private_list handling races
Date: Wed, 23 Jan 2008 12:00:02 +1100
User-Agent: KMail/1.9.5
Cc: linux-kernel@vger.kernel.org, Andrew Morton
References: <20080122171025.GB4485@duck.suse.cz>
In-Reply-To: <20080122171025.GB4485@duck.suse.cz>
Message-Id: <200801231200.03016.nickpiggin@yahoo.com.au>

On Wednesday 23 January 2008 04:10, Jan Kara wrote:
> Hi,
>
> as I got no answer for a week, I'm resending this fix for races in
> private_list handling. Andrew, do you like them more than the previous
> version?

FWIW, I reviewed this, and it looks OK, although I think some comments
would be in order.

What would be really nice is to avoid the use of b_assoc_buffers
completely in this function, as I've attempted in the attached
(untested) patch. I don't know if you'd actually call that an
improvement...?

A couple of things I noticed while looking at this code:

- What is osync_buffers_list supposed to do? I couldn't actually work it
  out. Why do we care about waiting here for buffers that were added
  while we were waiting for writeout of other buffers to finish? Can we
  just remove it completely? I must be missing something.

- What are the get_bh(bh) calls supposed to do? Protect the lifetime of
  a given bh while "lock" is dropped? That's nice, ignoring the fact
  that we brelse(bh) *before* taking the lock again... but isn't every
  single other buffer whose reference we _haven't_ elevated exposed to
  exactly the same lifetime problem? IOW, either the pin is not required
  at all, or it is required for _all_ buffers? (my patch should fix
  this; a stripped-down sketch of the pattern follows).
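To make the lifetime question concrete, below is a small userspace
transliteration of the pattern under discussion (hypothetical node /
sync_list names; a pthread mutex standing in for "lock" and an atomic
counter for b_count; a sketch only, not the kernel code). As with
b_assoc_buffers, being linked on the list does not itself hold a
reference - the owner does - so everything rests on the explicit pin:

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* node ~ buffer_head, refcount ~ b_count, dirty ~ buffer_dirty() */
struct node {
	struct node *next;
	atomic_int refcount;
	int dirty;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list;		/* ~ the inode's private_list */

static void node_get(struct node *n)
{
	atomic_fetch_add(&n->refcount, 1);
}

static void node_put(struct node *n)
{
	if (atomic_fetch_sub(&n->refcount, 1) == 1)
		free(n);
}

static void start_write(struct node *n)
{
	n->dirty = 0;			/* ~ ll_rw_block(SWRITE, ...) */
}

static void sync_list(void)
{
	pthread_mutex_lock(&lock);
	while (list) {
		struct node *n = list;

		list = n->next;		/* ~ __remove_assoc_queue() */
		if (!n->dirty)
			continue;
		node_get(n);		/* pin n for the unlocked window */
		pthread_mutex_unlock(&lock);
		start_write(n);
		node_put(n);		/* pin dropped BEFORE the lock is
					 * retaken - and nothing still on
					 * the list was ever pinned */
		pthread_mutex_lock(&lock);
	}
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	struct node *owned[3];		/* the "page" owning the buffers */
	int i;

	for (i = 0; i < 3; i++) {
		owned[i] = calloc(1, sizeof(*owned[i]));
		atomic_init(&owned[i]->refcount, 1);	/* owner's ref */
		owned[i]->dirty = 1;
		owned[i]->next = list;
		list = owned[i];
	}
	sync_list();
	for (i = 0; i < 3; i++)
		node_put(owned[i]);	/* owner drops its reference */
	return 0;
}

If another thread can drop the owner's reference while the lock is
released inside sync_list, pinning only the node currently under I/O
buys nothing for its neighbours - which is the asymmetry the second
point above is about.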
Hmm, now I remember why I rewrote this file :P

[Attachment: fs-fsync_buffers_list-fix.patch (text/x-diff)]

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -792,47 +792,53 @@ EXPORT_SYMBOL(__set_page_dirty_buffers);
  */
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
 {
+	struct buffer_head *batch[16];
+	int i, idx, done;
 	struct buffer_head *bh;
-	struct list_head tmp;
 	int err = 0, err2;
 
-	INIT_LIST_HEAD(&tmp);
-
+again:
 	spin_lock(lock);
+	idx = 0;
 	while (!list_empty(list)) {
 		bh = BH_ENTRY(list->next);
 		__remove_assoc_queue(bh);
 		if (buffer_dirty(bh) || buffer_locked(bh)) {
-			list_add(&bh->b_assoc_buffers, &tmp);
-			if (buffer_dirty(bh)) {
-				get_bh(bh);
-				spin_unlock(lock);
-				/*
-				 * Ensure any pending I/O completes so that
-				 * ll_rw_block() actually writes the current
-				 * contents - it is a noop if I/O is still in
-				 * flight on potentially older contents.
-				 */
-				ll_rw_block(SWRITE, 1, &bh);
-				brelse(bh);
-				spin_lock(lock);
-			}
+			batch[idx++] = bh;
+			get_bh(bh);
 		}
+
+		if (idx == 16)
+			break;
 	}
+	done = list_empty(list);
+	spin_unlock(lock);
 
-	while (!list_empty(&tmp)) {
-		bh = BH_ENTRY(tmp.prev);
-		list_del_init(&bh->b_assoc_buffers);
-		get_bh(bh);
-		spin_unlock(lock);
+	for (i = 0; i < idx; i++) {
+		bh = batch[i];
+		if (buffer_dirty(bh)) {
+			/*
+			 * Ensure any pending I/O completes so
+			 * that ll_rw_block() actually writes
+			 * the current contents - it is a noop
+			 * if I/O is still in flight on
+			 * potentially older contents.
+			 */
+			ll_rw_block(SWRITE, 1, &bh);
+		}
+	}
+	for (i = 0; i < idx; i++) {
+		bh = batch[i];
 		wait_on_buffer(bh);
 		if (!buffer_uptodate(bh))
 			err = -EIO;
 		brelse(bh);
-		spin_lock(lock);
 	}
+
+	idx = 0;
+	if (!done)
+		goto again;
 
-	spin_unlock(lock);
 	err2 = osync_buffers_list(lock, list);
 	if (err)
 		return err;
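For reference, the skeleton of the batching approach in the patch,
reusing the node type and helpers from the sketch earlier in this mail
(hypothetical sync_list_batched / BATCH names, and a simplification:
the real patch also has to handle locked-but-clean buffers, and the
wait step would be wait_on_buffer()):

#define BATCH	16

static void sync_list_batched(void)
{
	struct node *batch[BATCH];
	int i, idx, done;

again:
	pthread_mutex_lock(&lock);
	idx = 0;
	while (list && idx < BATCH) {
		struct node *n = list;

		list = n->next;		/* unlink under the lock */
		if (n->dirty) {
			node_get(n);	/* pin EVERY node we will touch */
			batch[idx++] = n;
		}
	}
	done = (list == NULL);
	pthread_mutex_unlock(&lock);

	/* The lock is never held across I/O or waiting. */
	for (i = 0; i < idx; i++)
		start_write(batch[i]);
	for (i = 0; i < idx; i++) {
		/* wait for I/O completion here (~ wait_on_buffer()) */
		node_put(batch[i]);	/* pin dropped only after the wait */
	}

	if (!done)
		goto again;		/* the list may have refilled */
}

The fixed-size batch bounds both stack usage and lock hold time, and
because every buffer handed to the write and wait loops carries its own
reference, the question of which buffers need pinning answers itself:
all of them, for exactly as long as they are visible outside the lock.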