From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jan Kara
Subject: [RFC] How to fix broken freezing?
Date: Fri, 6 Jan 2012 15:09:31 +0100
Message-ID: <20120106140931.GD20291@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Surbhi Palande, Kamal Mostafa, Christoph Hellwig, Dave Chinner, Al Viro
To: linux-fsdevel@vger.kernel.org
Return-path:
Received: from cantor2.suse.de ([195.135.220.15]:38866 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752166Ab2AFOJg (ORCPT );
	Fri, 6 Jan 2012 09:09:36 -0500
Content-Disposition: inline
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

  Hello,

  I was looking at what causes a filesystem to have dirty data after it
is frozen. After some thought I realized the freezing code is inherently
racy and all filesystems (ext3, ext4, xfs) can have dirty data on a
frozen filesystem. The race is basically the following:

  Task 1                                Task 2
freeze_super()                          __generic_file_aio_write()
  ...                                     vfs_check_frozen(sb, SB_FREEZE_WRITE)
  sb->s_frozen = SB_FREEZE_WRITE;
  sync_filesystem(sb);
                                          do the write
                                            /* Here we create dirty data
                                             * which is left on frozen fs */
  sb->s_frozen = SB_FREEZE_TRANS;
  ...
  ->freeze_fs()

  The problem is that you can never make the check for a frozen
filesystem race-free with the current s_frozen scheme - the filesystem
can always be frozen the instant after you check for it, and you end up
creating dirty data on a frozen filesystem.

  The question is what to do about this problem. I outline the
possibilities that come to my mind below:

1) Ignore the problem - depending on the exact fs details this could
lead to the fs snapshot being corrupted; also the flusher thread can hang
on the frozen filesystem (e.g. because of sync(1)), creating all sorts of
secondary issues. So I don't think this is really an option.

2) Have a rwlock in the superblock that is held for writing while
filesystem freezing is in progress and held for reading by the
filesystem while a transaction is running, except for transactions that
are required to do writeback. This is kind of ugly but at least for
ext3/4 relatively easy to implement.

3) Have the same rwlock, but have the VFS itself take the lock in the
kernel entry points which modify a filesystem. A lot of these places are
already guarded by a mnt_want_write/mnt_drop_write pair so we could hook
into it, but there are entry points which use a file descriptor and thus
are not guarded by mnt_want_write/mnt_drop_write, so we would have to
modify these places. Note that this in particular also means ioctl calls
and such, so it won't be trivial to catch all the places. This approach
looks the cleanest to me but it's quite some work and it's a bit fragile
- it requires everyone adding an entry point that modifies a filesystem
to think of fs freezing.

  What do people think about this? Any other idea how to solve the
problem?

								Honza
-- 
Jan Kara
SUSE Labs, CR
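
  To make the locking idea in 2) and 3) a bit more concrete, here is a
minimal userspace sketch in C with pthreads. It is not kernel code and
not a proposed patch. Everything named model_* is hypothetical and only
stands in for the superblock, the write path and freeze_super(). Instead
of a literal rwlock it uses a mutex, a condition variable and a writer
count, which gives the same reader/writer semantics but lets freeze and
thaw come from different threads. The point it illustrates is that the
"is it frozen?" check and the registration of the writer happen under
one lock, so the freezer cannot slip in between the check and the write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a superblock; NOT the kernel's struct super_block. */
struct model_sb {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int  writers;	/* tasks inside a write section (the "read" side) */
	bool frozen;	/* set by the freezer (the "write" side) */
	int  dirty;	/* stand-in for dirty data */
};

static void model_sb_init(struct model_sb *sb)
{
	pthread_mutex_init(&sb->lock, NULL);
	pthread_cond_init(&sb->cond, NULL);
	sb->writers = 0;
	sb->frozen = false;
	sb->dirty = 0;
}

/* Check for frozen and register as a writer atomically.
 * Returns false when frozen; a real implementation would sleep instead. */
static bool model_write_trybegin(struct model_sb *sb)
{
	bool ok;

	pthread_mutex_lock(&sb->lock);
	ok = !sb->frozen;
	if (ok)
		sb->writers++;
	pthread_mutex_unlock(&sb->lock);
	return ok;
}

/* Leave the write section and wake a freezer waiting for writers to drain. */
static void model_write_end(struct model_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	if (--sb->writers == 0)
		pthread_cond_broadcast(&sb->cond);
	pthread_mutex_unlock(&sb->lock);
}

/* Stand-in for a write path: creates dirty data unless the fs is frozen. */
static bool model_write(struct model_sb *sb)
{
	if (!model_write_trybegin(sb))
		return false;
	pthread_mutex_lock(&sb->lock);
	sb->dirty++;
	pthread_mutex_unlock(&sb->lock);
	model_write_end(sb);
	return true;
}

/* Stand-in for freezing: block new writers, drain running ones, then sync. */
static void model_freeze(struct model_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->frozen = true;
	while (sb->writers > 0)
		pthread_cond_wait(&sb->cond, &sb->lock);
	sb->dirty = 0;	/* stands in for sync_filesystem() */
	pthread_mutex_unlock(&sb->lock);
}

static void model_thaw(struct model_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->frozen = false;
	pthread_mutex_unlock(&sb->lock);
}
```

  In this model a writer that got in before the freeze finishes before
the sync step runs, and a writer that arrives afterwards is refused, so
no dirty data can appear once model_freeze() has returned. Where the
begin/end pair would be called from (per transaction as in 2, or in the
VFS entry points as in 3) is exactly the open question above.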