On Tue, Apr 04 2017, Dave Chinner wrote:

> On Mon, Apr 03, 2017 at 04:00:55PM +0200, Jan Kara wrote:
>> On Sun 02-04-17 09:05:26, Dave Chinner wrote:
>> > On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote:
>> > > On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
>> > > > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
>> > > > > Because if above is acceptable we could make reported i_version to be a sum
>> > > > > of "superblock crash counter" and "inode i_version". We increment
>> > > > > "superblock crash counter" whenever we detect unclean filesystem shutdown.
>> > > > > That way after a crash we are guaranteed each inode will report new
>> > > > > i_version (the sum would probably have to look like "superblock crash
>> > > > > counter" * 65536 + "inode i_version" so that we avoid reusing possible
>> > > > > i_version numbers we gave away but did not write to disk but still...).
>> > > > > Thoughts?
>> > >
>> > > How hard is this for filesystems to support? Do they need an on-disk
>> > > format change to keep track of the crash counter?
>> >
>> > Yes. We'll need version counter in the superblock, and we'll need to
>> > know what the increment semantics are.
>> >
>> > The big question is how do we know there was a crash? The only thing
>> > a journalling filesystem knows at mount time is whether it is clean
>> > or requires recovery. Filesystems can require recovery for many
>> > reasons that don't involve a crash (e.g. root fs is never unmounted
>> > cleanly, so always requires recovery). Further, some filesystems may
>> > not even know there was a crash at mount time because their
>> > architecture always leaves a consistent filesystem on disk (e.g. COW
>> > filesystems)....
>>
>> What filesystems can or cannot easily do obviously differs. Ext4 has a
>> recovery flag set in superblock on RW mount/remount and cleared on
>> umount/RO remount.
>
> Even this doesn't help. A recent bug that was reported to the XFS
> list - turns out that systemd can't remount-ro the root
> filesystem successfully on shutdown because there are open write fds
> on the root filesystem when it attempts the remount. So it just
> reboots without a remount-ro. This uncovered a bug in grub in

Filesystems could use register_reboot_notifier() to get a notification
that even systemd cannot stuff up. The filesystem could then check for
dirty data and, if there is none (which there shouldn't be if a sync
happened), do a single write to disk to update the superblock (or a
single write to each disk... or something).

md does this, because getting the root device to be marked read-only is
even harder than getting the root filesystem to be remounted read-only.

NeilBrown
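
For reference, the arithmetic in Jan's proposal quoted above can be
sketched in plain user-space C. This is only an illustration under
assumptions: the names reported_i_version() and CRASH_COUNTER_SCALE are
made up here and do not exist in any filesystem, and a real
implementation would still need the on-disk superblock counter that
Dave describes.

```c
#include <stdint.h>

/* Hypothetical name. 65536 is the multiplier from Jan's example; it
 * covers up to 65535 i_version values that were handed out but never
 * written to disk before an unclean shutdown. */
#define CRASH_COUNTER_SCALE UINT64_C(65536)

/* Hypothetical helper: the value reported to clients, composed as
 * "superblock crash counter" * 65536 + "inode i_version". */
static inline uint64_t reported_i_version(uint64_t sb_crash_counter,
                                          uint64_t inode_i_version)
{
    return sb_crash_counter * CRASH_COUNTER_SCALE + inode_i_version;
}
```

With 100 on disk and 105 already handed out before an unclean shutdown,
the next mount would bump the counter from 0 to 1 and report
1 * 65536 + 100 = 65636, which is past anything given away earlier. The
sketch stops being safe once 65536 or more increments were handed out
without reaching disk.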