On Wednesday 23 October 2019 16:21:19 Chris Murphy wrote:
> On Wed, Oct 23, 2019 at 1:50 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> >
> > Hi!
> >
> > On Wednesday 23 October 2019 02:10:50 Chris Murphy wrote:
> > > a. write bootloader file to a temp location
> > > b. fsync
> > > c. mv temp final
> > > d. fsync
> > >
> > > if the crash happens anywhere from before a. to just after c. the old
> > > configuration file is still present and old kernel+initramfs are used.
> > > No problem. If the crash happens well after c. probably the new one is
> > > in place, for sure after d. it's in place, and the new kernel+
> > > initramfs are used.
> >
> > I do not think that the kernel guarantees, for every filesystem, that a
> > rename operation is atomic on the underlying disk storage.
> >
> > But somebody else should confirm it.
> 
> I don't know either, nor do I know how to confirm it.

Somebody who follows linux-fsdevel and has deep knowledge of this area
could provide more information.
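For reference, steps a. to d. above correspond roughly to the following
minimal sketch, in Python and assuming Linux/POSIX semantics. The
function name replace_file_durably and the ".tmp" suffix are
hypothetical. Note that os.replace() (rename(2)) is atomic only with
respect to other processes; whether the on-disk update survives a crash
atomically still depends on the filesystem, which is exactly the open
question here.

```python
import os


def replace_file_durably(final_path: str, data: bytes) -> None:
    """Hypothetical helper: steps a. to d. (write temp, fsync, rename, fsync dir)."""
    directory = os.path.dirname(os.path.abspath(final_path))
    tmp_path = final_path + ".tmp"  # hypothetical temp-name convention

    # a. write the new contents to a temporary file in the same directory
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # b. flush the file data to stable storage before renaming
        os.fsync(fd)
    finally:
        os.close(fd)

    # c. rename over the old file; atomic for other processes, but crash
    #    atomicity on disk is up to the filesystem implementation
    os.replace(tmp_path, final_path)

    # d. fsync the containing directory so the rename itself is persisted
    #    (opening a directory and fsyncing it works on Linux)
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Keeping the temp file in the same directory matters: rename only works
within one filesystem, and it keeps the change confined to a single
directory.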

> But, being ignorant about a
> great many things, my instinct is that literal fsync (flush buffer to
> disk) should go away at the application level, and fsync should only
> be used to indicate write order and what is part of a "commit" that is
> to be atomic (completely succeeds or fails). And of course that can
> only be guaranteed as far as the kernel is concerned; it doesn't
> guarantee anything about how the hardware block device actually
> behaves (warts, bugs and all).
> 
> Anyway it made me think of this:
> https://lwn.net/Articles/789600/
> 
> 
> > So if the kernel crashes in the middle of c. or between c. and d., you
> > need to repair the filesystem externally before trying to boot from such a disk.
> 
> Nice in theory, but in practice the user simply reboots, and screams
> WTF out loud if the system faceplants. And people wonder why things
> are still broken 20 years later, with all the same kinds of problems
> and prescriptions to boot off some rescue media, instead of it being
> fail-safe by design. It's definitely not fail-safe to have a kernel
> update that could possibly result in an unbootable system. I can't
> think of any ordinary server, cloud, desktop, or mobile user who wants
> to have to boot from rescue media to do a simple repair. Of course they
> all just want to reboot and have the right thing always happen no
> matter what; otherwise they get so nervous about doing updates that
> they postpone them longer than they should.

Still, any time a filesystem is not unmounted cleanly, you should check
it for errors if you do not want to lose your data.

And a critical area like this should have some "recovery" mechanism to
repair a broken bootloader / kernel image.

Anyway, the chance that the kernel crashes exactly at the step where the
old kernel image on disk is replaced by the new one is low, so needing
external recovery should not be a big issue in practice.

> > > I'm not sure how to test the following: write the kernel and initramfs
> > > to their final locations, and write the bootloader configuration to a
> > > temp path. Then at the decision moment, rename it from the temp path to
> > > the final path, changing at most one sector. A single 512-byte sector
> > > is a reasonable unit to assume can be written completely atomically on
> > > a system. I have no idea if FAT can do such a 'mv' event with only one
> > > sector change.
> >
> > Theoretically it could be possible to implement it for FAT (with more
> > restrictions), but I doubt that any general-purpose filesystem
> > implementation in the kernel can do such a thing. So, practically, no.
> 
> Now I'm wondering what the UEFI spec says about this, and whether this
> problem was anticipated, and how surprised I should be if it wasn't
> anticipated.

I know that the UEFI spec refers to the MS specification
(fatgen103.doc) for FAT filesystems. I do not know whether it says
anything about filesystem details, but I guess it requires that
implementations be compatible with FAT12, FAT16 and FAT32 as defined by
that specification.

> > > GRUB has an option to blindly overwrite the 1024 byte contents of
> > > grubenv (no file system modification), that's pretty close to atomic.
> > > Most devices have a physical sector size bigger than 512 bytes. This write is
> > > done in the pre-boot environment for saving state like boot counts.
> >
> > This depends on grub's FAT implementation. As I said, I would be very
> > careful about such "atomic" writes. There are also caches involved,
> > including the on-disk hardware cache, etc.
> 
> GRUB doesn't use any file system driver for writes, ever. It uses a
> file system driver only to find out what two LBAs the "grubenv"
> occupies, and then blindly overwrites those two sectors to save state.
> There is no file system metadata update at all.

Yes, you are right. Looking at the code, grub's filesystem drivers are
read-only; there is no write support.
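A rough userspace analogue of that "blind overwrite" is sketched below
in Python. It is not GRUB's code: GRUB writes the two LBAs directly,
bypassing any filesystem driver, whereas this sketch goes through the
kernel filesystem driver, which may still update timestamps. The
function names are hypothetical, and the header line plus '#' padding
mirror the grubenv layout as I understand it, so treat that format as
an assumption.

```python
import os

BLOCK_SIZE = 1024  # grubenv is a fixed-size 1024-byte file
HEADER = b"# GRUB Environment Block\n"  # assumed grubenv header line


def build_env_block(entries: dict) -> bytes:
    """Hypothetical helper: key=value lines padded with '#' to 1024 bytes."""
    body = HEADER + b"".join(f"{k}={v}\n".encode() for k, v in entries.items())
    if len(body) > BLOCK_SIZE:
        raise ValueError("environment does not fit in 1024 bytes")
    return body + b"#" * (BLOCK_SIZE - len(body))


def overwrite_in_place(path: str, block: bytes) -> None:
    """Hypothetical helper: rewrite existing bytes without changing file size."""
    fd = os.open(path, os.O_WRONLY)  # no O_CREAT/O_TRUNC: size never changes
    try:
        if os.fstat(fd).st_size != len(block):
            raise ValueError("size mismatch; refusing to change file length")
        if os.pwrite(fd, block, 0) != len(block):
            raise OSError("short write")
        os.fsync(fd)
    finally:
        os.close(fd)
```

Because the file length never changes, no cluster allocation is needed;
only the data sectors are rewritten, which is what makes the approach
"pretty close to atomic" as long as the block does not straddle a
torn-write boundary.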

> >
> > > And add to the mix that I guess some UEFI firmware allow writing to
> > > FAT in the pre-boot environment?
> >
> > Yes, the UEFI API allows you to write to disk devices, and a UEFI
> > filesystem implementation can also support writing to a FAT fs.
> >
> > > I don't know if that's universally true. How does firmware handle a dirty bit being set?
> >
> > A bad implementation would ignore it, and that is something you
> > should expect.
> 
> Maybe a project for someone is to bake xfstests into an EFI program so
> we can start testing these firmware FAT drivers and see what we learn
> about how bad they are?

That is possible.

Also, UEFI allows you to write your own UEFI filesystem drivers, which
other UEFI programs and bootloaders can use.
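As one concrete thing such a test could check: fatgen103 defines a
clean-shutdown bit (ClnShutBitMask) and a hard-error bit
(HrdErrBitMask) in the FAT[1] entry. Below is a minimal Python sketch
for FAT32 only, assuming the caller already knows the volume is FAT32
and passes the raw image bytes; the function name is hypothetical. It
reads only the first FAT copy.

```python
import struct

FAT32_CLN_SHUT_BIT = 0x08000000  # fatgen103: 1 = clean, 0 = dirty
FAT32_HRD_ERR_BIT = 0x04000000   # fatgen103: 1 = no disk I/O errors seen


def fat32_volume_flags(image: bytes) -> tuple:
    """Hypothetical helper: return (clean, no_io_errors) from FAT[1]."""
    (bytes_per_sector,) = struct.unpack_from("<H", image, 11)  # BPB_BytsPerSec
    (reserved_sectors,) = struct.unpack_from("<H", image, 14)  # BPB_RsvdSecCnt
    fat_start = reserved_sectors * bytes_per_sector  # first FAT follows reserved area
    (fat1,) = struct.unpack_from("<I", image, fat_start + 4)  # 32-bit FAT[1]
    clean = bool(fat1 & FAT32_CLN_SHUT_BIT)
    no_io_errors = bool(fat1 & FAT32_HRD_ERR_BIT)
    return clean, no_io_errors
```

A firmware test could snapshot these flags before and after letting the
firmware write to the volume, to see whether it marks the volume dirty
while mounted and clears the bit again on a clean dismount.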

-- 
Pali Rohár
pali.rohar@gmail.com