From: Dave Kleikamp
Subject: Re: Race between __sync_single_inode() and LogFS garbage collector
Date: Mon, 19 Feb 2007 17:05:55 -0600
Message-ID: <1171926356.9771.34.camel@kleikamp.austin.ibm.com>
References: <20070219213150.GD7813@lazybastard.org>
In-Reply-To: <20070219213150.GD7813@lazybastard.org>
To: Jörn Engel
Cc: linux-fsdevel@vger.kernel.org

On Mon, 2007-02-19 at 21:31 +0000, Jörn Engel wrote:
> Looks like I really write the first log-structured filesystem for Linux.
> At least I ran into a fairly arcane race that seems to be generic to all
> of them.
>
> Writing when space is tight may involve calling the garbage collector.
> The garbage collector will iget() random inodes, either to verify if a
> block is valid or to copy the block around.  At this point, all writes
> to LogFS are serialized.
>
> __sync_single_inode() will first lock a random inode, then call
> write_inode(), then unlock the inode.
> So we can get this:
>
>
> __sync_single_inode()                 garbage collector
> ---------------------------------------------------------------------
> inode->i_state |= I_LOCK;             ...
> ...                                   mutex_lock(&super->s_w_mutex);
> write_inode(inode, wait);             ...
> ...                                   iget(sb, ino);
> mutex_lock(&super->s_w_mutex);        ...
> ...                                   wait_on_inode(inode);
> mutex_unlock(&super->s_w_mutex);
> ...
> ...
> inode->i_state &= ~I_LOCK;
>
>
> And once in a blue moon, those two will race for the same inode.  As far
> as I can see, the race can only get fixed in two ways:
> 1. Never iget() inside the garbage collector.  That would require having
>    a private inode cache for LogFS.
> 2. Synchronize __sync_single_inode() and the garbage collector somehow.
>
> Variant 1 would result in double caching for the same object, something
> I would like to avoid.  So does anyone have suggestions how variant 2
> could be achieved?  Essentially what I need is a way to say "don't sync
> any inodes right now, I'll be back in 5 milliseconds or so".

It'd be nice if you could drop s_w_mutex when the garbage collector
calls iget().  Otherwise, you may be able to call ilookup5_nowait() in
the garbage collector, and skip that inode if I_LOCK is set.

>
> Jörn
>

-- 
David Kleikamp
IBM Linux Technology Center
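
The second suggestion above (look the inode up without waiting, and skip
it if I_LOCK is set) can be sketched as a small userspace model.  This is
only an illustration under stated assumptions: model_inode, MODEL_I_LOCK,
gc_try_grab() and gc_pass() are hypothetical names invented for the
sketch, not kernel or LogFS API, and a plain array stands in for the
inode cache.  An inode that is missing from the model cache is simply
counted as deferred; a real collector would presumably have to read it
in instead.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these names are hypothetical, not kernel API. */
#define MODEL_I_LOCK 0x1u   /* stands in for the kernel's I_LOCK bit */

struct model_inode {
    unsigned long ino;
    unsigned int  state;    /* stands in for inode->i_state */
};

/*
 * Stand-in for a non-blocking lookup followed by an I_LOCK check.
 * Returns the inode if the collector may use it now, or NULL if the
 * inode is not in the model cache or is currently marked locked.
 */
static struct model_inode *gc_try_grab(struct model_inode *cache, size_t n,
                                       unsigned long ino)
{
    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].ino != ino)
            continue;
        if (cache[i].state & MODEL_I_LOCK)
            return NULL;        /* skip instead of waiting on the inode */
        return &cache[i];
    }
    return NULL;                /* not in the model cache */
}

/*
 * One collector pass over a list of candidate inode numbers.  Counts how
 * many inodes could be grabbed and how many were deferred (locked or not
 * present in the model cache).
 */
static void gc_pass(struct model_inode *cache, size_t n,
                    const unsigned long *wanted, size_t nwanted,
                    size_t *processed, size_t *deferred)
{
    *processed = 0;
    *deferred = 0;
    for (size_t i = 0; i < nwanted; i++) {
        if (gc_try_grab(cache, n, wanted[i]))
            (*processed)++;
        else
            (*deferred)++;
    }
}
```

The point of the model is that the collector never sleeps on a locked
inode while it holds the write mutex; it defers that inode and moves on,
so the wait_on_inode() step in the diagram is never reached with
s_w_mutex held.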