* Is rename(2) atomic on FAT?
@ 2019-10-21 19:57 Chris Murphy
  2019-10-21 21:44 ` Richard Weinberger
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2019-10-21 19:57 UTC (permalink / raw)
  To: Linux FS Devel

http://man7.org/linux/man-pages/man2/rename.2.html

Use case is atomically updating bootloader configuration on EFI System
partitions. Some bootloader implementations have configuration files
bigger than 512 bytes, which could possibly be torn on write. But I'm
also not sure what write order FAT uses.

1.
FAT32 file system is mounted at /boot/efi

2.
# echo "hello" > /boot/efi/tmp/test.txt
# mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/

3.
When I strace the above mv command I get these lines:
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
"/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
"/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = 0
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
close(0)

I can't tell from documentation if renameat2() with flag
RENAME_NOREPLACE is atomic, assuming the file doesn't exist at
destination.

4.
Do it again exactly as before, small change
# echo "hello" > /boot/efi/tmp/test.txt
# mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/

5.
The strace shows fallback to rename()

ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
"/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
"/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = -1 EEXIST (File
exists)
lstat("/boot/efi/tmp/test.txt", {st_mode=S_IFREG|0700, st_size=7, ...}) = 0
newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/test.txt",
{st_mode=S_IFREG|0700, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
geteuid()                               = 0
rename("/boot/efi/tmp/test.txt", "/boot/efi/EFI/fedora/test.txt") = 0
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
close(0)                                = 0
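The strace suggests that mv first tries renameat2() with RENAME_NOREPLACE and falls back to a plain rename() only after EEXIST (plus some stat checks on both paths). Below is a simplified C sketch of that pattern. It assumes Linux with glibc 2.28 or later for the renameat2() wrapper. The helper name move_into_place() is hypothetical, and this is not coreutils' actual logic, which does more checking before it replaces anything:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>   /* AT_FDCWD */
#include <stdio.h>   /* rename(), renameat2(), RENAME_NOREPLACE */

/* Hypothetical helper mirroring the syscall sequence in the strace.
 * Returns 0 if src was moved without replacing anything, 1 if an
 * existing dst was replaced, -1 on error (errno is set). */
int move_into_place(const char *src, const char *dst)
{
    /* First attempt: refuse to clobber an existing destination. */
    if (renameat2(AT_FDCWD, src, AT_FDCWD, dst, RENAME_NOREPLACE) == 0)
        return 0;
    if (errno != EEXIST)
        return -1;
    /* Destination exists: fall back to a plain rename(), which replaces
     * dst atomically as far as other processes are concerned. */
    if (rename(src, dst) != 0)
        return -1;
    return 1;
}
```

The first branch corresponds to the "new name that doesn't exist yet" case and the fallback to the "rename stomping on the old one" case. Neither branch says anything about what reaches the disk if power is lost.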


Per the documentation, that should be atomic. So the questions are: are
both atomic, or neither atomic, and if not, what should be used to
ensure bootloader updates are atomic?

There are plausibly three kinds:

A. write a new file with file name that doesn't previously exist
B. write a new file with a new file name, then do a rename stomping on
the old one
C. overwrite an existing file

It seems C is risky. It probably isn't atomic and can't be made to be
atomic on FAT.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Is rename(2) atomic on FAT?
  2019-10-21 19:57 Is rename(2) atomic on FAT? Chris Murphy
@ 2019-10-21 21:44 ` Richard Weinberger
  2019-10-22 10:54   ` Pali Rohár
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Weinberger @ 2019-10-21 21:44 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux FS Devel, Pali Rohár

Chris,

[CC'ing fsdevel and Pali]

On Mon, Oct 21, 2019 at 9:59 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> http://man7.org/linux/man-pages/man2/rename.2.html
>
> Use case is atomically updating bootloader configuration on EFI System
> partitions. Some bootloader implementations have configuration files
> bigger than 512 bytes, which could possibly be torn on write. But I'm
> also not sure what write order FAT uses.
>
> 1.
> FAT32 file system is mounted at /boot/efi
>
> 2.
> # echo "hello" > /boot/efi/tmp/test.txt
> # mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/
>
> 3.
> When I strace the above mv command I get these lines:
> ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> "/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
> stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
> renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> "/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = 0
> lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
> close(0)
>
> I can't tell from documentation if renameat2() with flag
> RENAME_NOREPLACE is atomic, assuming the file doesn't exist at
> destination.
>
> 4.
> Do it again exactly as before, small change
> # echo "hello" > /boot/efi/tmp/test.txt
> # mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/
>
> 5.
> The strace shows fallback to rename()
>
> ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> "/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
> stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
> renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> "/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = -1 EEXIST (File
> exists)
> lstat("/boot/efi/tmp/test.txt", {st_mode=S_IFREG|0700, st_size=7, ...}) = 0
> newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/test.txt",
> {st_mode=S_IFREG|0700, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
> geteuid()                               = 0
> rename("/boot/efi/tmp/test.txt", "/boot/efi/EFI/fedora/test.txt") = 0
> lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
> close(0)                                = 0
>
>
> Per the documentation, that should be atomic. So the questions are: are
> both atomic, or neither atomic, and if not, what should be used to
> ensure bootloader updates are atomic?

According to my understanding of FAT, rename() is not atomic at all.
It can downgrade to a hardlink, i.e. rename("foo", "bar") can result in having
both "foo" and "bar".
...or worse.

Pali has probably more input to share. :-)

> There are plausibly three kinds:
>
> A. write a new file with file name that doesn't previously exist
> B. write a new file with a new file name, then do a rename stomping on
> the old one
> C. overwrite an existing file
>
> It seems C is risky. It probably isn't atomic and can't be made to be
> atomic on FAT.
>
>
> --
> Chris Murphy

-- 
Thanks,
//richard


* Re: Is rename(2) atomic on FAT?
  2019-10-21 21:44 ` Richard Weinberger
@ 2019-10-22 10:54   ` Pali Rohár
  2019-10-23  0:10     ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Pali Rohár @ 2019-10-22 10:54 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: Chris Murphy, Linux FS Devel

Hi Chris!

The first question is what you mean by "atomic". Do you mean
"atomic" at the process level, i.e. any process which accesses the filesystem
sees consistent data at any time? Or consistency of the filesystem on
the underlying block device itself? Or atomicity at the disk storage
level?

On Monday 21 October 2019 23:44:25 Richard Weinberger wrote:
> Chris,
> 
> [CC'ing fsdevel and Pali]
> 
> On Mon, Oct 21, 2019 at 9:59 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > http://man7.org/linux/man-pages/man2/rename.2.html
> >
> > Use case is atomically updating bootloader configuration on EFI System
> > partitions. Some bootloader implementations have configuration files
> > bigger than 512 bytes, which could possibly be torn on write. But I'm
> > also not sure what write order FAT uses.
> >
> > 1.
> > FAT32 file system is mounted at /boot/efi
> >
> > 2.
> > # echo "hello" > /boot/efi/tmp/test.txt
> > # mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/
> >
> > 3.
> > When I strace the above mv command I get these lines:
> > ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
> > stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = 0
> > lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
> > close(0)
> >
> > I can't tell from documentation if renameat2() with flag
> > RENAME_NOREPLACE is atomic, assuming the file doesn't exist at
> > destination.

RENAME_NOREPLACE is atomic at the VFS level, independently of the filesystem
used. There is no race condition when multiple processes access that
directory at the same time.
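One way to observe this process-level guarantee is to race two processes that each try to rename their own file onto the same destination with RENAME_NOREPLACE. Exactly one should succeed, and the other should get EEXIST. This is a minimal sketch that assumes Linux with glibc 2.28+. The function name race_noreplace() and the paths are hypothetical. It exercises VFS-level atomicity only and says nothing about on-disk crash consistency:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>     /* AT_FDCWD */
#include <stdio.h>     /* renameat2(), RENAME_NOREPLACE (glibc >= 2.28) */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create (or truncate) a small file at path; returns 0 on success. */
static int make_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fputs(text, f);
    return fclose(f) == 0 ? 0 : -1;
}

/* Fork two children that each try to move their own file onto dst with
 * RENAME_NOREPLACE. Returns the number of children whose rename
 * succeeded (1 is expected), or -1 if setup failed. */
int race_noreplace(const char *src_a, const char *src_b, const char *dst)
{
    const char *srcs[2] = { src_a, src_b };
    int started = 0, winners = 0;

    remove(dst);  /* start with no destination */
    if (make_file(src_a, "A\n") != 0 || make_file(src_b, "B\n") != 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: exit 0 if it created dst, 1 on EEXIST, 2 otherwise. */
            int rc = renameat2(AT_FDCWD, srcs[i], AT_FDCWD, dst,
                               RENAME_NOREPLACE);
            _exit(rc == 0 ? 0 : (errno == EEXIST ? 1 : 2));
        }
        started++;
    }
    /* Reap every child that was started, counting the winners. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            winners++;
    }
    return started == 2 ? winners : -1;
}
```

The loser's source file is left in place, which is the behavior you want when a new name must never silently replace an existing one.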

> > 4.
> > Do it again exactly as before, small change
> > # echo "hello" > /boot/efi/tmp/test.txt
> > # mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/
> >
> > 5.
> > The strace shows fallback to rename()
> >
> > ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
> > stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = -1 EEXIST (File
> > exists)
> > lstat("/boot/efi/tmp/test.txt", {st_mode=S_IFREG|0700, st_size=7, ...}) = 0
> > newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/test.txt",
> > {st_mode=S_IFREG|0700, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
> > geteuid()                               = 0
> > rename("/boot/efi/tmp/test.txt", "/boot/efi/EFI/fedora/test.txt") = 0
> > lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
> > close(0)                                = 0
> >
> >
> > Per the documentation, that should be atomic. So the questions are: are
> > both atomic, or neither atomic, and if not, what should be used to
> > ensure bootloader updates are atomic?

At the VFS level both are atomic, independently of the filesystem.
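The rename(2) man page states that if newpath already exists, it is atomically replaced, so there is no point at which another process attempting to access newpath will find it missing. A rough way to observe this is to have one process repeatedly rename a fresh temp file over a target while another process keeps opening the target. This is a minimal sketch that assumes a POSIX system. The function name count_missing() and the paths are hypothetical. It tests only what other processes see, not what survives a power cut:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a writer that replaces dst via rename(tmp, dst) `rounds` times
 * while this process keeps trying to open dst. Returns how many opens
 * failed, or -1 on setup failure or if the writer reported an error. */
long count_missing(const char *tmp, const char *dst, int rounds)
{
    FILE *f = fopen(dst, "w");   /* make sure dst exists up front */
    if (f == NULL)
        return -1;
    fputs("v0\n", f);
    fclose(f);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Writer: write a fresh temp file, then rename it over dst. */
        for (int i = 1; i <= rounds; i++) {
            FILE *t = fopen(tmp, "w");
            if (t == NULL)
                _exit(1);
            fprintf(t, "v%d\n", i);
            fclose(t);
            if (rename(tmp, dst) != 0)
                _exit(1);
        }
        _exit(0);
    }

    /* Reader: open dst in a loop until the writer has exited. */
    long missing = 0;
    int status;
    for (;;) {
        pid_t w = waitpid(pid, &status, WNOHANG);
        if (w == pid)
            break;
        if (w < 0)
            return -1;
        FILE *r = fopen(dst, "r");
        if (r != NULL)
            fclose(r);
        else
            missing++;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return missing;
}
```

Pointing tmp and dst at a directory on the filesystem of interest (e.g. a test directory on the ESP) runs the same check there.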

> According to my understanding of FAT, rename() is not atomic at all.
> It can downgrade to a hardlink, i.e. rename("foo", "bar") can result in having
> both "foo" and "bar".
> ...or worse.

In general, rename() may indeed leave both "foo" and "bar" pointing to
the same inode for some period of time. (But is this really a problem
for your scenario?)

But looking at the vfat source code (file namei_vfat.c), both the rename
and lookup operations are locked by a mutex, so during a rename operation
the directory cannot be read, and therefore there should be no race
condition (which would cause reading an inconsistent directory during
the rename operation).

If you want an atomic rename of two files, you can use the
RENAME_EXCHANGE flag, provided the filesystem supports it. It exchanges
the two specified files atomically, so there is no race condition like
with rename(), where for some period of time both "foo" and "bar" point
to the same inode.
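Here is a minimal sketch of such an exchange, assuming Linux with glibc 2.28+ for renameat2(). Both paths must already exist. According to the renameat2(2) man page, RENAME_EXCHANGE needs filesystem support and fails with EINVAL where it is unsupported. The sketch therefore reports that case rather than assuming vfat handles it. The helper name swap_paths() is hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>   /* AT_FDCWD */
#include <stdio.h>   /* renameat2(), RENAME_EXCHANGE */

/* Atomically swap two existing paths (process-level atomicity only).
 * Returns 0 on success, 1 if the filesystem rejected RENAME_EXCHANGE
 * with EINVAL, -1 on any other error. */
int swap_paths(const char *a, const char *b)
{
    if (renameat2(AT_FDCWD, a, AT_FDCWD, b, RENAME_EXCHANGE) == 0)
        return 0;
    return errno == EINVAL ? 1 : -1;
}
```

After a successful swap, the old configuration sits under the temporary name and can be unlinked afterwards. On EINVAL, a caller would have to fall back to a plain rename().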


But... if you are asking about consistency and atomicity at the filesystem
level (e.g. you turn off the disk / power supply during a rename operation),
then it is not atomic and probably cannot be implemented. When a FAT
filesystem is mounted (by either Windows or the Linux kernel) it is marked
with a "dirty" flag, and the "dirty" flag is cleared later on unmount.

This is there to ensure that operations like rename finished and
were not stopped/killed in between. So in the future, when you read the FAT
filesystem, you know whether it is in a consistent state or not.

> Pali has probably more input to share. :-)
> 
> > There are plausibly three kinds:
> >
> > A. write a new file with file name that doesn't previously exist
> > B. write a new file with a new file name, then do a rename stomping on
> > the old one
> > C. overwrite an existing file
> >
> > It seems C is risky. It probably isn't atomic and can't be made to be
> > atomic on FAT.

Option C is really risky. Overwriting a file means the following operations:

1. truncate the file to zero size
2. write the first N blocks
3. write the second N blocks
...
k. write the last M blocks

Option B is common practice. IIRC, KDE config files are also updated
this way.

> >
> > --
> > Chris Murphy
> 

-- 
Pali Rohár
pali.rohar@gmail.com


* Re: Is rename(2) atomic on FAT?
  2019-10-22 10:54   ` Pali Rohár
@ 2019-10-23  0:10     ` Chris Murphy
  2019-10-23 11:50       ` Pali Rohár
  2019-10-23 12:53       ` Colin Walters
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Murphy @ 2019-10-23  0:10 UTC (permalink / raw)
  To: Pali Rohár; +Cc: Richard Weinberger, Chris Murphy, Linux FS Devel

On Tue, Oct 22, 2019 at 12:54 PM Pali Rohár <pali.rohar@gmail.com> wrote:
>
> Hi Chris!
>
> The first question is what you mean by "atomic". Do you mean
> "atomic" at the process level, i.e. any process which accesses the filesystem
> sees consistent data at any time? Or consistency of the filesystem on
> the underlying block device itself? Or atomicity at the disk storage
> level?

Yeah, good question. It's a bit more complicated in reality, because
distros do things differently.

In the case of making kernel updates "atomic", it's to ensure only one
of two things happens: the old boot works or the new boot works. No
matter what, including a crash or power fail at any point during the
update. Possibly three or more files make up a "boot": kernel,
initramfs (could be more than one), and bootloader configuration. In
theory, the new kernel is written first, initramfs second, and only
once they are on stable media is the bootloader configuration file
modified, replaced, or newly written.

In the case of one kernel and initramfs, I'd have to believe no one is
doing a literal overwrite of those files (same inode). If there's a
crash or power fail, that kind of update almost certainly means an
unbootable system due to partial write of kernel or initramfs. So the
best practice for updating a single kernel should be: write out all new
files for kernel + initramfs, fsync, write the bootloader change to a
new file, fsync, then rename, fsync. (?)

For multiple kernels,  it doesn't matter if a crash happens anywhere
from new kernel being written to FAT, through initramfs, because the
old bootloader configuration still points to old kernel + initramfs.
But in multiple kernel distros, the bootloader configuration needs
modification or a new drop in scriptlet to point to the new
kernel+initramfs pair. And that needs to be completely atomic: write
new files to a tmp location, that way a crash won't matter. The tricky
part is to write out the bootloader configuration change such that it
can be an atomic operation.

a. write bootloader file to a temp location
b. fsync
c. mv temp final
d. fsync

If the crash happens anywhere from before a. to just after c., the old
configuration file is still present and the old kernel+initramfs are used.
No problem. If the crash happens well after c., the new one is probably
in place; after d. it is in place for sure, and the new
kernel+initramfs are used.
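Steps a. through d. map onto system calls roughly as follows. This is a minimal sketch that assumes a POSIX system. Step d. is read here as fsync() on the containing directory, since the fsync(2) man page notes that syncing a file does not necessarily make its directory entry durable. The "<name>.tmp" naming and the function name replace_file() are hypothetical. Whether this is enough for crash consistency on FAT is exactly what the rest of the thread questions:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* fsync() an open descriptor, then close it; 0 only if both succeed. */
static int sync_and_close(int fd)
{
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Replace dir/name with data following steps a.-d.; 0 on success. */
int replace_file(const char *dir, const char *name, const char *data)
{
    char tmp[4096], dst[4096];
    int n1 = snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    int n2 = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof dst)
        return -1;

    /* a. write the new contents to a temp file in the same directory */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(data), off = 0;
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* b. flush the temp file's data before it becomes visible under dst */
    if (sync_and_close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* c. swing the name over: other processes see old or new, never neither */
    if (rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }

    /* d. fsync the directory so the rename itself is pushed to the device */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    return sync_and_close(dfd);
}
```

The temp file lives in the same directory as the target because rename() cannot move a file across filesystems, and keeping it next to the target keeps the final step a single-directory update.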



> > According to my understanding of FAT, rename() is not atomic at all.
> > It can downgrade to a hardlink, i.e. rename("foo", "bar") can result in having
> > both "foo" and "bar".
> > ...or worse.
>
> In general, rename() may indeed leave both "foo" and "bar" pointing to
> the same inode for some period of time. (But is this really a problem
> for your scenario?)

Probably not. Either the old boot works or the new boot works.

There is a goofy thing that can happen on journaled file systems, where
the journal is updated for the files (kernel, initramfs, journalcdt) but
not the normal file system metadata, and then a crash happens. In that case
the bootloader's file system code can't do journal replay, and might fail
to find either the old or the new file intact.



>
> But looking at the vfat source code (file namei_vfat.c), both the rename
> and lookup operations are locked by a mutex, so during a rename operation
> the directory cannot be read, and therefore there should be no race
> condition (which would cause reading an inconsistent directory during
> the rename operation).
>
> If you want an atomic rename of two files, you can use the
> RENAME_EXCHANGE flag, provided the filesystem supports it. It exchanges
> the two specified files atomically, so there is no race condition like
> with rename(), where for some period of time both "foo" and "bar" point
> to the same inode.

I'm not sure how to test the following: write the kernel and initramfs to
their final locations, and write the bootloader configuration to a temp
path. Then at the decision moment, rename it from the temp path to the
final path with at most one sector changed. A single 512-byte sector
is a reasonable unit to assume can be written completely atomically on
a system. I have no idea if FAT can do such a 'mv' with only one
sector change.



>
>
> But... if you are asking about consistency and atomicity at the filesystem
> level (e.g. you turn off the disk / power supply during a rename operation),
> then it is not atomic and probably cannot be implemented. When a FAT
> filesystem is mounted (by either Windows or the Linux kernel) it is marked
> with a "dirty" flag, and the "dirty" flag is cleared later on unmount.

Right. And at least on UEFI and arm boards, it's not the linux kernel
that needs to read it right after a crash. It's the firmware's FAT
driver. I have no idea how they react to the dirty flag. Most distros
set /etc/fstab FS_PASSNO to 2, maybe it should be a 1, but in any case
if we boot something far enough along to get to user space fsck, the
dirty flag is cleaned up.

>
> This is there to ensure that operations like rename finished and
> were not stopped/killed in between. So in the future, when you read the FAT
> filesystem, you know whether it is in a consistent state or not.

GRUB has an option to blindly overwrite the 1024-byte contents of
grubenv (no file system modification), which is pretty close to atomic.
Most devices have physical sectors bigger than 512 bytes. This write is
done in the pre-boot environment to save state like boot counts.

And add to the mix that, I guess, some UEFI firmware allows writing to
FAT in the pre-boot environment? I don't know if that's universally
true. How does firmware handle the dirty bit being set? It's bad if the
firmware writes to such a file system anyway. But it's also bad if it can't
save state, because then it's not possible to save boot attempts for fallback
purposes.


--
Chris Murphy


* Re: Is rename(2) atomic on FAT?
  2019-10-23  0:10     ` Chris Murphy
@ 2019-10-23 11:50       ` Pali Rohár
  2019-10-23 14:21         ` Chris Murphy
  2019-10-23 12:53       ` Colin Walters
  1 sibling, 1 reply; 22+ messages in thread
From: Pali Rohár @ 2019-10-23 11:50 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Richard Weinberger, Linux FS Devel

Hi!

On Wednesday 23 October 2019 02:10:50 Chris Murphy wrote:
> a. write bootloader file to a temp location
> b. fsync
> c. mv temp final
> d. fsync
> 
> If the crash happens anywhere from before a. to just after c., the old
> configuration file is still present and the old kernel+initramfs are used.
> No problem. If the crash happens well after c., the new one is probably
> in place; after d. it is in place for sure, and the new
> kernel+initramfs are used.

I do not think the kernel guarantees, for any filesystem, that a rename
operation is atomic on the underlying disk storage.

But somebody else should confirm it.

So if the kernel crashes in the middle of c. or between c. and d., you need to
repair the filesystem externally before trying to boot from such a disk.

> 
> > But looking at the vfat source code (file namei_vfat.c), both the rename
> > and lookup operations are locked by a mutex, so during a rename operation
> > the directory cannot be read, and therefore there should be no race
> > condition (which would cause reading an inconsistent directory during
> > the rename operation).
> >
> > If you want an atomic rename of two files, you can use the
> > RENAME_EXCHANGE flag, provided the filesystem supports it. It exchanges
> > the two specified files atomically, so there is no race condition like
> > with rename(), where for some period of time both "foo" and "bar" point
> > to the same inode.
> 
> I'm not sure how to test the following: write the kernel and initramfs to
> their final locations, and write the bootloader configuration to a temp
> path. Then at the decision moment, rename it from the temp path to the
> final path with at most one sector changed. A single 512-byte sector
> is a reasonable unit to assume can be written completely atomically on
> a system. I have no idea if FAT can do such a 'mv' with only one
> sector change.

Theoretically it could be possible to implement it for FAT (with more
restrictions), but I doubt that any general-purpose filesystem
implementation in the kernel can do such a thing. So practically, no.

> >
> >
> > But... if you are asking about consistency and atomicity at the filesystem
> > level (e.g. you turn off the disk / power supply during a rename operation),
> > then it is not atomic and probably cannot be implemented. When a FAT
> > filesystem is mounted (by either Windows or the Linux kernel) it is marked
> > with a "dirty" flag, and the "dirty" flag is cleared later on unmount.
> 
> Right. And at least on UEFI and arm boards, it's not the linux kernel
> that needs to read it right after a crash. It's the firmware's FAT
> driver. I have no idea how they react to the dirty flag.

Firmware that just loads & runs the bootloader practically never
writes anything to that FAT filesystem. In most cases its FAT
implementation is read-only and very stupid. I doubt that it
checks the dirty flag.

I saw a lot of commercial devices of different kinds which can read & write
(backup) data to a (FAT) SD card. And a lot of the time they were not able to
read a FAT filesystem formatted by another tool, only one formatted by their
own (in-device) formatter.

So such firmware can be full of bugs, and it really is not a good idea
to try booting a bootloader from an inconsistent FAT filesystem.

> Most distros
> set /etc/fstab FS_PASSNO to 2, maybe it should be a 1, but in any case
> if we boot something far enough along to get to user space fsck, the
> dirty flag is cleaned up.

fs_passno set to 2 should be fine. You need to set it to 1 only for the root
device, on which the Linux system is running. All other disks which are not
needed for running the Linux system can have fs_passno set to 2.

> > This is there to ensure that operations like rename finished and
> > were not stopped/killed in between. So in the future, when you read the FAT
> > filesystem, you know whether it is in a consistent state or not.
> 
> GRUB has an option to blindly overwrite the 1024-byte contents of
> grubenv (no file system modification), which is pretty close to atomic.
> Most devices have physical sectors bigger than 512 bytes. This write is
> done in the pre-boot environment to save state like boot counts.

This depends on GRUB's FAT implementation. As I said, I would be very
careful about such "atomic" writes. There are also various caches, including
on-disk hardware caches, etc...

> And add to the mix that, I guess, some UEFI firmware allows writing to
> FAT in the pre-boot environment?

Yes, the UEFI API allows you to write to disk devices. And a UEFI filesystem
implementation can also support writing to FAT.

> I don't know if that's universally true. How does firmware handle the dirty bit being set?

A bad implementation would ignore it. This is something you should
expect.

> It's bad if the
> firmware writes to such a file system anyway. But it's also bad if it can't
> save state, because then it's not possible to save boot attempts for fallback
> purposes.

The best approach is to always keep a fragile filesystem in a consistent state.
And if it breaks, repair it on an external system before anything tries to
write to it with some untrusted/broken/bad filesystem driver. This prevents
data damage.

-- 
Pali Rohár
pali.rohar@gmail.com


* Re: Is rename(2) atomic on FAT?
  2019-10-23  0:10     ` Chris Murphy
  2019-10-23 11:50       ` Pali Rohár
@ 2019-10-23 12:53       ` Colin Walters
  2019-10-23 14:24         ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Colin Walters @ 2019-10-23 12:53 UTC (permalink / raw)
  To: Chris Murphy, Pali Rohár; +Cc: Richard Weinberger, Linux FS Devel



On Tue, Oct 22, 2019, at 8:10 PM, Chris Murphy wrote:
>
> For multiple kernels,  it doesn't matter if a crash happens anywhere
> from new kernel being written to FAT, through initramfs, because the
> old bootloader configuration still points to old kernel + initramfs.
> But in multiple kernel distros, the bootloader configuration needs
> modification or a new drop in scriptlet to point to the new
> kernel+initramfs pair. And that needs to be completely atomic: write
> new files to a tmp location, that way a crash won't matter. The tricky
> part is to write out the bootloader configuration change such that it
> can be an atomic operation.

Related: https://github.com/ostreedev/ostree/issues/1951
There I'm proposing not to try to fix this at the kernel/filesystem
level (since we can't do much on FAT, and even on real filesystems we
have the journaling-vs-bootloader issues), but instead to create a protocol
between things writing bootloader data and the bootloaders to help
verify integrity.
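Here is one possible shape for such a protocol. It is purely an illustration, not the design in the linked issue. The writer appends a checksum trailer to the bootloader configuration, and the reader rejects any file whose trailer is missing or does not match. A torn or truncated write is then detected instead of acted upon. The "#crc32=" trailer format and the function names are hypothetical. CRC-32 catches accidental corruption, not tampering:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trailer: "#crc32=" + 8 lowercase hex digits + "\n". */
#define TRAILER_LEN 16

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Writer side: copy body plus trailer into out; returns length or -1. */
int seal_config(const char *body, char *out, size_t outsz)
{
    uint32_t c = crc32_ieee((const unsigned char *)body, strlen(body));
    int n = snprintf(out, outsz, "%s#crc32=%08x\n", body, (unsigned)c);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}

/* Reader side: 1 if text ends in a trailer matching everything before
 * it, 0 if the trailer is missing, truncated or does not match. */
int verify_config(const char *text)
{
    size_t len = strlen(text);
    if (len < TRAILER_LEN)
        return 0;
    size_t body_len = len - TRAILER_LEN;
    char expect[TRAILER_LEN + 1];
    snprintf(expect, sizeof expect, "#crc32=%08x\n",
             (unsigned)crc32_ieee((const unsigned char *)text, body_len));
    return memcmp(text + body_len, expect, TRAILER_LEN) == 0;
}
```

The verification would have to live in the bootloader itself, which could then fall back to a known-good entry when a configuration fails the check.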


* Re: Is rename(2) atomic on FAT?
  2019-10-23 11:50       ` Pali Rohár
@ 2019-10-23 14:21         ` Chris Murphy
  2019-10-23 17:16           ` Pali Rohár
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2019-10-23 14:21 UTC (permalink / raw)
  To: Pali Rohár; +Cc: Chris Murphy, Richard Weinberger, Linux FS Devel

On Wed, Oct 23, 2019 at 1:50 PM Pali Rohár <pali.rohar@gmail.com> wrote:
>
> Hi!
>
> On Wednesday 23 October 2019 02:10:50 Chris Murphy wrote:
> > a. write bootloader file to a temp location
> > b. fsync
> > c. mv temp final
> > d. fsync
> >
> > If the crash happens anywhere from before a. to just after c., the old
> > configuration file is still present and the old kernel+initramfs are used.
> > No problem. If the crash happens well after c., the new one is probably
> > in place; after d. it is in place for sure, and the new
> > kernel+initramfs are used.
>
> I do not think the kernel guarantees, for any filesystem, that a rename
> operation is atomic on the underlying disk storage.
>
> But somebody else should confirm it.

I don't know either, or how to confirm it. But, being ignorant about a
great many things, my instinct is that literal fsync (flush buffer to disk)
should go away at the application level, and fsync should only be used
to indicate write order and what is part of a "commit" that is to be
atomic (completely succeeds or fails). And of course that can only be
guaranteed as far as the kernel is concerned; it doesn't guarantee
anything about how the hardware block device actually behaves (warts,
bugs and all).

Anyway it made me think of this:
https://lwn.net/Articles/789600/


> So if the kernel crashes in the middle of c. or between c. and d., you need to
> repair the filesystem externally before trying to boot from such a disk.

Nice in theory, but in practice the user simply reboots, and screams
WTF out loud if the system faceplants. And people wonder why things
are still broken 20 years later with all the same kinds of problems
and prescriptions to boot off some rescue media instead of it being
fail safe by design. It's definitely not fail safe to have a kernel
update that could possibly result in an unbootable system. I can't
think of any ordinary server, cloud, desktop, mobile user who wants to
have to boot from rescue media to do a simple repair. Of course they
all just want to reboot and have the right thing always happen no
matter what, otherwise they get so nervous about doing updates that
they postpone them longer than they should.

> > I'm not sure how to test the following: write the kernel and initramfs to
> > their final locations, and write the bootloader configuration to a temp
> > path. Then at the decision moment, rename it from the temp path to the
> > final path with at most one sector changed. A single 512-byte sector
> > is a reasonable unit to assume can be written completely atomically on
> > a system. I have no idea if FAT can do such a 'mv' with only one
> > sector change.
>
> Theoretically it could be possible to implement it for FAT (with more
> restrictions), but I doubt that any general-purpose filesystem
> implementation in the kernel can do such a thing. So practically, no.

Now I'm wondering what the UEFI spec says about this, and whether this
problem was anticipated, and how surprised I should be if it wasn't
anticipated.


>
> > >
> > >
> > > But... if you are asking about consistency and atomicity at the filesystem
> > > level (e.g. you turn off the disk / power supply during a rename operation),
> > > then it is not atomic and probably cannot be implemented. When a FAT
> > > filesystem is mounted (by either Windows or the Linux kernel) it is marked
> > > with a "dirty" flag, and the "dirty" flag is cleared later on unmount.
> >
> > Right. And at least on UEFI and arm boards, it's not the linux kernel
> > that needs to read it right after a crash. It's the firmware's FAT
> > driver. I have no idea how they react to the dirty flag.
>
> Firmware that just loads & runs the bootloader practically never
> writes anything to that FAT filesystem. In most cases its FAT
> implementation is read-only and very stupid. I doubt that it
> checks the dirty flag.
>
> I saw a lot of commercial devices of different kinds which can read & write
> (backup) data to a (FAT) SD card. And a lot of the time they were not able to
> read a FAT filesystem formatted by another tool, only one formatted by their
> own (in-device) formatter.
>
> So such firmware can be full of bugs, and it really is not a good idea
> to try booting a bootloader from an inconsistent FAT filesystem.

Right. I've had quite a bit of experience with this too, but lately I
think my experience is actually chock full of noisy data and what I
thought I was seeing, might not actually be what I was seeing.

Since ancient times in digital photography and video, it has been widely
believed that the camera firmware's FAT driver is crap, and
often corrupts the flash media, in particular when doing things like
individual image file deletes, or exchanging cards between unlike
cameras (make or model). As it turns out, this narrative is mostly
pushed by the flash media vendors.

Fast forward to the advent of cheap ARM boards and even Intel NUC type
computers, and people experiencing various kinds of corruption with
consumer name brand SD cards. The more generic, the more likely the
card goes suddenly read only forever. But even the name brand cards
I've used in an Intel NUC have had this happen, being replaced without
complaint by the manufacturer under warranty, yet it still keeps on
happening. Then I found HN threads about this, with people saying yeah,
you have to use industrial flash cards for this purpose, it totally
solves the problem. And voila, there's enough anecdotal data out there
to suggest it's really consumer flash being super sensitive to power cuts.

It may in fact have never had a thing to do with crap file system drivers.


> > GRUB has an option to blindly overwrite the 1024-byte contents of
> > grubenv (no file system modification), which is pretty close to atomic.
> > Most devices have physical sectors bigger than 512 bytes. This write is
> > done in the pre-boot environment to save state like boot counts.
>
> This depends on GRUB's FAT implementation. As I said, I would be very
> careful about such "atomic" writes. There are also various caches, including
> on-disk hardware caches, etc...

GRUB doesn't use any file system driver for writes, ever. It uses a
file system driver only to find out which two LBAs the "grubenv" file
occupies, and then blindly overwrites those two sectors to save state.
There is no file system metadata update at all.
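Here is a userspace analogue of that idea, as a minimal sketch assuming a POSIX system. It overwrites a fixed-size block in place with pwrite() and never changes the file length, so no clusters are allocated or freed. The 1024-byte size comes from this thread. The '#' padding reflects my understanding of the grubenv layout. Treat both, and the function name write_env_block(), as assumptions. This is not how GRUB itself writes. Unlike GRUB's pre-boot write, going through the Linux vfat driver may still update directory-entry metadata such as the modification time. And, as noted above, drive caches mean even a single-sector write is not guaranteed to be atomic:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ENV_BLOCK_SIZE 1024  /* grubenv-sized block (assumed layout) */

/* Overwrite an existing ENV_BLOCK_SIZE-byte file in place with payload,
 * padded with '#'. Never creates, truncates or extends the file.
 * Returns 0 on success, -1 on error or size mismatch. */
int write_env_block(const char *path, const char *payload)
{
    char block[ENV_BLOCK_SIZE];
    size_t len = strlen(payload);
    if (len > ENV_BLOCK_SIZE)
        return -1;
    memcpy(block, payload, len);
    memset(block + len, '#', ENV_BLOCK_SIZE - len);

    int fd = open(path, O_WRONLY);            /* no O_CREAT, no O_TRUNC */
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size != ENV_BLOCK_SIZE) {
        close(fd);                            /* refuse to change the size */
        return -1;
    }

    /* Same offset, same length: only the data already in place changes. */
    ssize_t w = pwrite(fd, block, ENV_BLOCK_SIZE, 0);
    int rc = (w == ENV_BLOCK_SIZE && fdatasync(fd) == 0) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Refusing any file whose size differs keeps the write from ever touching the allocation chain, which is the property that makes this approach attractive in the first place.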


>
> > And add to the mix that, I guess, some UEFI firmware allows writing to
> > FAT in the pre-boot environment?
>
> Yes, the UEFI API allows you to write to disk devices. And the UEFI filesystem
> implementation can also support writing to FAT fs.
>
> > I don't know if that's universally true. How does firmware handle a dirty bit being set?
>
> A bad implementation would ignore it. This is something which you should
> expect.

Maybe a project for someone is to bake xfstests into an EFI program so
we can start testing these firmware FAT drivers and see what we learn
about how bad they are?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Is rename(2) atomic on FAT?
  2019-10-23 12:53       ` Colin Walters
@ 2019-10-23 14:24         ` Chris Murphy
  2019-10-23 17:26           ` Colin Walters
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2019-10-23 14:24 UTC
  To: Colin Walters
  Cc: Chris Murphy, Pali Rohár, Richard Weinberger, Linux FS Devel

On Wed, Oct 23, 2019 at 2:53 PM Colin Walters <walters@verbum.org> wrote:
>
>
>
> On Tue, Oct 22, 2019, at 8:10 PM, Chris Murphy wrote:
> >
> > For multiple kernels,  it doesn't matter if a crash happens anywhere
> > from new kernel being written to FAT, through initramfs, because the
> > old bootloader configuration still points to old kernel + initramfs.
> > But in multiple kernel distros, the bootloader configuration needs
> > modification or a new drop in scriptlet to point to the new
> > kernel+initramfs pair. And that needs to be completely atomic: write
> > new files to a tmp location, that way a crash won't matter. The tricky
> > part is to write out the bootloader configuration change such that it
> > can be an atomic operation.
>
> Related: https://github.com/ostreedev/ostree/issues/1951
> There I'm proposing not to try to fix this at the kernel/filesystem
> level (since we can't do much on FAT, and even on real filesystems we
> have the journaling-vs-bootloader issues), but instead create a protocol
> between things writing bootloader data and the bootloaders to help
> verify integrity.

The symlink method now being used, you describe as an OSTree-specific
invention. How is the new method you're proposing more generic such
that it's not also an OSTree-specific invention?

-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-23 14:21         ` Chris Murphy
@ 2019-10-23 17:16           ` Pali Rohár
  2019-10-23 19:18             ` Chris Murphy
  2019-10-23 21:21             ` Richard Weinberger
  0 siblings, 2 replies; 22+ messages in thread
From: Pali Rohár @ 2019-10-23 17:16 UTC
  To: Chris Murphy; +Cc: Richard Weinberger, Linux FS Devel

On Wednesday 23 October 2019 16:21:19 Chris Murphy wrote:
> On Wed, Oct 23, 2019 at 1:50 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> >
> > Hi!
> >
> > On Wednesday 23 October 2019 02:10:50 Chris Murphy wrote:
> > > a. write bootloader file to a temp location
> > > b. fsync
> > > c. mv temp final
> > > d. fsync
> > >
> > > if the crash happens anywhere from before a. to just after c. the old
> > > configuration file is still present and old kernel+initramfs are used.
> > > No problem. If the crash happens well after c. probably the new one is
> > > in place, for sure after d. it's in place, and the new kernel+
> > > initramfs are used.
> >
> > I do not think that the kernel guarantees, for any filesystem, that a rename
> > operation is atomic on the underlying disk storage.
> >
> > But somebody else should confirm it.
> 
> I don't know either or how to confirm it.

Somebody who is watching linux-fsdevel and has deep knowledge in this
area could provide more information.
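The a-d sequence quoted above can be sketched in Python as follows, reading step d as an fsync of the containing directory, which is what makes a rename durable on Linux filesystems that support directory fsync. Assumptions: the temporary name (`final_path + ".tmp"`) is hypothetical and lives in the same directory, so one directory fsync covers step d. Nothing here makes step c atomic on disk; that is filesystem specific, which is exactly what this thread is asking about for FAT.

```python
import os


def atomic_replace(final_path: str, data: bytes) -> None:
    """Steps a-d: write temp, fsync it, rename over final, fsync the directory.

    Whether step c is atomic *on disk* is filesystem specific; that is the
    open question for FAT, and nothing in this function changes it.
    """
    dirpath = os.path.dirname(os.path.abspath(final_path))
    tmp_path = final_path + ".tmp"  # hypothetical temp name in the same directory

    # a. write the new contents to a temporary file
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # b. flush the file's data to the device
        os.fsync(fd)
    finally:
        os.close(fd)

    # c. swap the name over to the new file (atomic in the VFS, at least)
    os.rename(tmp_path, final_path)

    # d. fsync the directory so the rename itself is made durable
    dfd = os.open(dirpath, os.O_RDONLY | getattr(os, "O_DIRECTORY", 0))
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```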

> But, being ignorant about a
> great many things, my instinct is literal fsync (flush buffer to disk)
> should go away at the application level, and fsync should only be used
> to indicate write order and what is part of a "commit" that is to be
> atomic (completely succeeds or fails). And of course that can only be
> guaranteed as far as the kernel is concerned, it doesn't guarantee
> anything about how the hardware block device actually behaves (warts
> bugs and all).
> 
> Anyway it made me think of this:
> https://lwn.net/Articles/789600/
> 
> 
> > So if the kernel crashes in the middle of c or between c and d, you need to
> > repair the filesystem externally before trying to boot from such a disk.
> 
> Nice in theory, but in practice the user simply reboots, and screams
> WTF out loud if the system face-plants. And people wonder why things
> are still broken 20 years later with all the same kinds of problems
> and prescriptions to boot off some rescue media instead of it being
> fail safe by design. It's definitely not fail safe to have a kernel
> update that could possibly result in an unbootable system. I can't
> think of any ordinary server, cloud, desktop, mobile user who wants to
> have to boot from rescue media to do a simple repair. Of course they
> all just want to reboot and have the right thing always happen no
> matter what, otherwise they get so nervous about doing updates that
> they postpone them longer than they should.

Still, any time a filesystem is improperly unmounted, you should
check it for errors if you do not want to lose your data.

And a critical area like this should have some "recovery" mechanism to
repair a broken bootloader / kernel image.

Anyway, the chance that the kernel crashes exactly at the step of replacing
the old kernel disk image with the new one is low. So needing to do
external recovery should not be such a big issue.

> > > I'm not sure how to test the following: write kernel and initramfs to
> > > final locations. And bootloader configuration is written to a temp
> > > path. Then at the decision moment, rename it so that it goes from temp
> > > path to final path doing at most 1 sector change. A single 512-byte sector
> > > is a reasonable unit to assume can be written completely atomically by a
> > > system. I have no idea if FAT can do such a 'mv' event with only one
> > > sector change.
> >
> > Theoretically it could be possible to implement it for FAT (with more
> > restrictions), but I doubt that a general purpose implementation of any
> > filesystem in the kernel can do such a thing. So, practically, no.
> 
> Now I'm wondering what the UEFI spec says about this, and whether this
> problem was anticipated, and how surprised I should be if it wasn't
> anticipated.

I know that the UEFI spec references the MS FAT specification
(fatgen103.doc) for FAT filesystems. I do not know if it says anything
about filesystem details, but I guess it specifies the requirement that
implementations must be compatible with FAT12, FAT16 and FAT32 according
to that specification.
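The one-sector rename question quoted above can at least be reasoned about from FAT's on-disk layout: every directory entry, whether a short 8.3 entry or a long-file-name (LFN) entry, is 32 bytes, so a 512-byte sector holds 16 of them, and a long name is stored as LFN entries immediately before its short entry. The hypothetical helper below computes which sectors a set of entry slots falls into, assuming the slots are contiguous within one cluster of the directory. A rename confined to entries in one sector of the same directory is the best case; a cross-directory move touches two directories, and whether a given driver edits entries in place or allocates new ones is a separate question.

```python
DIRENT_SIZE = 32  # every FAT directory entry (short or LFN) is 32 bytes


def sectors_touched(entry_indices, sector_size: int = 512) -> set:
    """Return the sectors (relative to the directory's first sector) holding the slots.

    Assumes the directory's entries are contiguous on disk, i.e. all slots
    live in one cluster; entries in other clusters may be anywhere.
    """
    return {(i * DIRENT_SIZE) // sector_size for i in entry_indices}
```

For example, a short entry at slot 16 whose single LFN entry sits at slot 15 straddles sectors 0 and 1, so even rewriting that one file's entries in place is a two-sector write.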

> > > GRUB has an option to blindly overwrite the 1024 byte contents of
> > > grubenv (no file system modification), that's pretty close to atomic.
> > > Most devices have physical sector bigger than 512 bytes. This write is
> > > done in the pre-boot environment for saving state like boot counts.
> >
> > This depends on grub's FAT implementation. As said, I would be very
> > careful about such "atomic" writes. There are also caches involved,
> > including on-disk hardware caches, etc...
> 
> GRUB doesn't use any file system driver for writes, ever. It uses a
> file system driver only to find out what two LBAs the "grubenv"
> occupies, and then blindly overwrites those two sectors to save state.
> There is no file system metadata update at all.

Yes, you are right. Looking at the code, grub's filesystem drivers
are read-only. No write support.

> >
> > > And add to the mix that I guess some UEFI firmware allow writing to
> > > FAT in the pre-boot environment?
> >
> > Yes, the UEFI API allows you to write to disk devices. And the UEFI filesystem
> > implementation can also support writing to FAT fs.
> >
> > > I don't know if that's universally true. How does firmware handle a dirty bit being set?
> >
> > A bad implementation would ignore it. This is something which you should
> > expect.
> 
> Maybe a project for someone is to bake xfstests into an EFI program so
> we can start testing these firmware FAT drivers and see what we learn
> about how bad they are?

That is possible.

Also, UEFI allows you to write your own UEFI filesystem drivers, which
other UEFI programs and bootloaders can use.

-- 
Pali Rohár
pali.rohar@gmail.com

* Re: Is rename(2) atomic on FAT?
  2019-10-23 14:24         ` Chris Murphy
@ 2019-10-23 17:26           ` Colin Walters
  0 siblings, 0 replies; 22+ messages in thread
From: Colin Walters @ 2019-10-23 17:26 UTC
  To: Chris Murphy; +Cc: Pali Rohár, Richard Weinberger, Linux FS Devel



On Wed, Oct 23, 2019, at 10:24 AM, Chris Murphy wrote:
> On Wed, Oct 23, 2019 at 2:53 PM Colin Walters <walters@verbum.org> wrote:
> >
> >
> >
> > On Tue, Oct 22, 2019, at 8:10 PM, Chris Murphy wrote:
> > >
> > > For multiple kernels,  it doesn't matter if a crash happens anywhere
> > > from new kernel being written to FAT, through initramfs, because the
> > > old bootloader configuration still points to old kernel + initramfs.
> > > But in multiple kernel distros, the bootloader configuration needs
> > > modification or a new drop in scriptlet to point to the new
> > > kernel+initramfs pair. And that needs to be completely atomic: write
> > > new files to a tmp location, that way a crash won't matter. The tricky
> > > part is to write out the bootloader configuration change such that it
> > > can be an atomic operation.
> >
> > Related: https://github.com/ostreedev/ostree/issues/1951
> > There I'm proposing not to try to fix this at the kernel/filesystem
> > level (since we can't do much on FAT, and even on real filesystems we
> > have the journaling-vs-bootloader issues), but instead create a protocol
> > between things writing bootloader data and the bootloaders to help
> > verify integrity.
> 
> The symlink method now being used, you describe as an OSTree-specific
> invention. How is the new method you're proposing more generic such
> that it's not also an OSTree-specific invention?

It'd take the usual slow process of gathering consensus between the two groups: projects writing data in /boot, and bootloaders.

* Re: Is rename(2) atomic on FAT?
  2019-10-23 17:16           ` Pali Rohár
@ 2019-10-23 19:18             ` Chris Murphy
  2019-10-23 21:21             ` Richard Weinberger
  1 sibling, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2019-10-23 19:18 UTC
  To: Pali Rohár; +Cc: Chris Murphy, Richard Weinberger, Linux FS Devel

On Wed, Oct 23, 2019 at 7:16 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> On Wednesday 23 October 2019 16:21:19 Chris Murphy wrote:

> > I don't know either or how to confirm it.
>
> Somebody who is watching linux-fsdevel and has deep knowledge in this
> area could provide more information.

Maybe dm-log-writes can do this? Just log all the writes, and
hopefully it's straightforward to match the 'mv' rename command with
the resulting writes.


> > Nice in theory, but in practice the user simply reboots, and screams
> > WTF out loud if the system face-plants. And people wonder why things
> > are still broken 20 years later with all the same kinds of problems
> > and prescriptions to boot off some rescue media instead of it being
> > fail safe by design. It's definitely not fail safe to have a kernel
> > update that could possibly result in an unbootable system. I can't
> > think of any ordinary server, cloud, desktop, mobile user who wants to
> > have to boot from rescue media to do a simple repair. Of course they
> > all just want to reboot and have the right thing always happen no
> > matter what, otherwise they get so nervous about doing updates that
> > they postpone them longer than they should.
>
> Still, any time a filesystem is improperly unmounted, you should
> check it for errors if you do not want to lose your data.

Perhaps, but it's archaic. The user usually has no idea what went
wrong, and all kinds of factors strongly disincentivize doing an
offline fsck, and incentivize just rebooting and seeing what happens.
If they get past the bootloader, systemd/init is going to run fsck
on all volumes that need it, or kernel code does log replay to bring
them up to date.

> And a critical area like this should have some "recovery" mechanism to
> repair a broken bootloader / kernel image.
>
> Anyway, the chance that the kernel crashes exactly at the step of replacing
> the old kernel disk image with the new one is low. So needing to do
> external recovery should not be such a big issue.

Running 'strace -D -ff -o' on grub2-mkconfig generates over 1800 per-PID
output files. Filtering for lines containing grub.cfg:

# grep grub.cfg *
grub.12167:execve("/usr/sbin/grub2-mkconfig", ["grub2-mkconfig", "-o",
"/boot/efi/EFI/fedora/grub.cfg"], 0x7ffc68054470 /* 24 vars */) = 0
grub.12167:read(3, "/boot/efi/EFI/fedora/grub.cfg\n", 128) = 30
grub.12167:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
grub.12167:read(255, "\nif test \"x${grub_cfg}\" != \"x\" ;"..., 8192) = 567
grub.12174:write(1, "/boot/efi/EFI/fedora/grub.cfg\n", 30) = 30
grub.12349:execve("/usr/bin/rm", ["rm", "-f",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.12349:newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new",
0x556be17d9758, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or
directory)
grub.12349:unlinkat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", 0)
= -1 ENOENT (No such file or directory)
grub.14064:execve("/usr/bin/grub2-script-check",
["/usr/bin/grub2-script-check",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.14064:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", O_RDONLY) = 3
grub.14065:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
grub.14065:execve("/usr/bin/cat", ["cat",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.14065:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", O_RDONLY) = 3
grub.14066:execve("/usr/bin/rm", ["rm", "-f",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.14066:newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new",
{st_mode=S_IFREG|0700, st_size=6080, ...}, AT_SYMLINK_NOFOLLOW) = 0
grub.14066:unlinkat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", 0) = 0

I'm not able to parse this. My best guess is that it writes out an all-new
file, grub.cfg.new, and then doesn't rename it. Instead, it uses
cat to copy the contents of the new file over the old one?
Yeah, the inode stays the same, as does the access time. Is this fragile?
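Reading the trace that way (PID 14065 opens grub.cfg with O_WRONLY|O_CREAT|O_TRUNC and then execs cat, which is what a shell redirection like `cat grub.cfg.new > grub.cfg` looks like), the old file is truncated to zero length before any new data is written. A crash between the truncation and the writeback of the new contents can therefore leave an empty or partial grub.cfg, whereas a rename leaves either the old or the new name binding, at least on filesystems where rename is atomic. The sketch below contrasts the two patterns and reproduces the inode behaviour observed above; the function names are hypothetical.

```python
import os
import shutil


def truncate_and_copy(src: str, dst: str) -> None:
    """Roughly what `cat src > dst` does: truncate dst in place, then copy.

    dst keeps its inode; between the truncating open and the end of the
    copy it is empty or partial.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:  # "wb" implies O_TRUNC
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())


def rename_into_place(src: str, dst: str) -> None:
    """Alternative: dst becomes a name for src's inode in a single rename()."""
    os.rename(src, dst)
```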

Android, ChromeOS and some others have A and B kernel partitions,
which are just blobs. They use some other form of hint to indicate
which partition is in use at any one time, meaning they can
reliably perform a failsafe update of the other partition, and sanity
test it, before committing the switch. Crude but effective.
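The A/B idea can be sketched as a tiny state machine. This is not Android's or ChromeOS's actual metadata format: the one-sector hint layout, the CRC32 check, the fallback default and all names below are assumptions for illustration. The update writes the inactive slot first, checks it, and only then produces a new single-sector hint; if that hint write is torn, the CRC fails and the reader falls back to a default, while both slots are still intact at that point.

```python
import struct
import zlib

SECTOR = 512
_BODY = struct.Struct("<cI")  # slot letter, generation counter (assumed layout)


def encode_hint(active: str, generation: int) -> bytes:
    """Pack the active-slot hint plus a CRC32 into exactly one sector."""
    body = _BODY.pack(active.encode("ascii"), generation)
    return (body + struct.pack("<I", zlib.crc32(body))).ljust(SECTOR, b"\0")


def decode_hint(sector: bytes):
    """Return (slot, generation), or None if the sector fails its CRC."""
    body = sector[:_BODY.size]
    (crc,) = struct.unpack_from("<I", sector, _BODY.size)
    if zlib.crc32(body) != crc:
        return None
    slot, generation = _BODY.unpack(body)
    return slot.decode("ascii"), generation


def update(slots: dict, hint: bytes, image: bytes) -> bytes:
    """Write the inactive slot, check it, and return the new hint sector."""
    active, generation = decode_hint(hint) or ("A", 0)  # fallback default (assumption)
    target = "B" if active == "A" else "A"
    slots[target] = image  # 1. a crash here leaves the active slot untouched
    if slots[target] != image:  # 2. stand-in for reading back and sanity testing
        raise OSError(f"slot {target} failed verification")
    return encode_hint(target, generation + 1)  # 3. the one-sector switch
```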

Apple goes so far as to give all of their product firmware the ability
to natively read APFS, which contains the kernel and early boot files.

I have no idea how Windows does kernel or bootloader updates, except
that it doesn't keep the EFI system partition persistently mounted all day
long, the way virtually all Linux distributions today do at /boot/efi.
That does seem guaranteed to result in many dirty-flag FAT file system
cleanups; I know I've seen such fixups in my journal files.




> > > > I'm not sure how to test the following: write kernel and initramfs to
> > > > final locations. And bootloader configuration is written to a temp
> > > > path. Then at the decision moment, rename it so that it goes from temp
> > > > path to final path doing at most 1 sector change. A single 512-byte sector
> > > > is a reasonable unit to assume can be written completely atomically by a
> > > > system. I have no idea if FAT can do such a 'mv' event with only one
> > > > sector change.
> > >
> > > Theoretically it could be possible to implement it for FAT (with more
> > > restrictions), but I doubt that a general purpose implementation of any
> > > filesystem in the kernel can do such a thing. So, practically, no.
> >
> > Now I'm wondering what the UEFI spec says about this, and whether this
> > problem was anticipated, and how surprised I should be if it wasn't
> > anticipated.
>
> I know that the UEFI spec references the MS FAT specification
> (fatgen103.doc) for FAT filesystems. I do not know if it says anything
> about filesystem details, but I guess it specifies the requirement that
> implementations must be compatible with FAT12, FAT16 and FAT32 according
> to that specification.

My understanding of the UEFI spec is that the file system is called the
'EFI file system' and was intended to be based on FAT12, FAT16, and
FAT32 as they were at a specific moment in time, bugs and warts and all,
probably around 20 years ago by now, and then never changed. In practice
it seems there is no such separate thing as the EFI file system: there is no
separate mkfs flag, or mount option, to make sure this is *the*
canonical EFI file system, rather than just today's latest bug-fixed
and feature-enhanced FAT file system as supported by Linux.

So god only knows what bugs might arise from that discrepancy one day.

> Also, UEFI allows you to write your own UEFI filesystem drivers, which
> other UEFI programs and bootloaders can use.

I can't find it this second, but someone basically did this work
already, by wrapping existing GRUB file system modules into EFI file
system drivers.

OK, so plausibly on UEFI, the firmware could be handed a better FAT driver very
soon after POST to avoid firmware FAT bugs. Or, for that matter, we could create
"A" and "B" EFI system partitions containing identical static boot
data that merely points to a purpose-built $BOOT volume that can host
early boot files and supports atomic updates. That'd be clever, but
also not generic. It's UEFI specific.

It'd be neat to have a superset implementation that can work anywhere,
but that then allows for optimizations. But the problem with a generic
solution? Who will follow it? The Bootloaderspec pretty much fell on
deaf ears. The GRUB folks don't care to upstream it, nor do the syslinux
or U-Boot folks, as near as I can tell, and it's a simple one-page spec.
Fedora's GRUB carries patches for it, and now uses them by default. So,
hilariously, Fedora is maybe the first distribution to actively support three
substantially different bootloader update mechanisms: grub-mkconfig,
grubby, and bootloaderspec.

-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-23 17:16           ` Pali Rohár
  2019-10-23 19:18             ` Chris Murphy
@ 2019-10-23 21:21             ` Richard Weinberger
  2019-10-23 21:56               ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Weinberger @ 2019-10-23 21:21 UTC
  To: Pali Rohár; +Cc: Chris Murphy, Linux FS Devel

On Wed, Oct 23, 2019 at 7:16 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> On Wednesday 23 October 2019 16:21:19 Chris Murphy wrote:
> > On Wed, Oct 23, 2019 at 1:50 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> > > I do not think that the kernel guarantees, for any filesystem, that a rename
> > > operation is atomic on the underlying disk storage.
> > >
> > > But somebody else should confirm it.
> >
> > I don't know either or how to confirm it.
>
> Somebody who is watching linux-fsdevel and has deep knowledge in this
> area could provide more information.

This is filesystem specific.
For example on UBIFS we make sure that the rename operation is atomic.
Changing multiple directory entries is one journal commit, so either it happened
completely or not at all.
On JFFS2, on the other hand, rename can degrade to a hard link.

I'd go so far as to claim that any modern Linux filesystem guarantees
that rename is atomic.
But bugs still happen; CrashMonkey found some interesting issues in
this area [0].

[0] http://www.cs.utexas.edu/~vijay/papers/osdi18-crashmonkey.pdf
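The point about a rename being a single journal commit can be illustrated with a toy model. This is not UBIFS's on-disk format: the record encoding (JSON plus a CRC32 trailer) and all names are assumptions. Both directory-entry changes of a rename go into one record, so replay applies either both or neither. The replay builds an in-memory view and leaves the on-disk state untouched; a reader that ignores the journal entirely would see only `on_disk`.

```python
import json
import zlib


def make_record(ops: list) -> bytes:
    """Serialize directory operations as one journal record with a CRC32 trailer."""
    payload = json.dumps(ops).encode()
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def rename_record(old: str, new: str, inode: int) -> bytes:
    # Both entry changes share ONE record: that record is the commit unit.
    return make_record([["unlink", old], ["link", new, inode]])


def replay(on_disk: dict, journal: list) -> dict:
    """Apply complete records to a copy of on_disk; on_disk itself is not modified."""
    view = dict(on_disk)
    for rec in journal:
        if len(rec) < 4:
            break
        payload, crc = rec[:-4], int.from_bytes(rec[-4:], "little")
        if zlib.crc32(payload) != crc:
            break  # torn record: treat it, and everything after it, as never written
        for op in json.loads(payload):
            if op[0] == "unlink":
                view.pop(op[1], None)
            elif op[0] == "link":
                view[op[1]] = op[2]
    return view
```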

-- 
Thanks,
//richard

* Re: Is rename(2) atomic on FAT?
  2019-10-23 21:21             ` Richard Weinberger
@ 2019-10-23 21:56               ` Chris Murphy
  2019-10-23 22:22                 ` Richard Weinberger
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2019-10-23 21:56 UTC
  To: Richard Weinberger; +Cc: Pali Rohár, Chris Murphy, Linux FS Devel

On Wed, Oct 23, 2019 at 11:21 PM Richard Weinberger
<richard.weinberger@gmail.com> wrote:
>
> On Wed, Oct 23, 2019 at 7:16 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 23 October 2019 16:21:19 Chris Murphy wrote:
> > > On Wed, Oct 23, 2019 at 1:50 PM Pali Rohár <pali.rohar@gmail.com> wrote:
> > > > I do not think that the kernel guarantees, for any filesystem, that a rename
> > > > operation is atomic on the underlying disk storage.
> > > >
> > > > But somebody else should confirm it.
> > >
> > > I don't know either or how to confirm it.
> >
> > Somebody who is watching linux-fsdevel and has deep knowledge in this
> > area could provide more information.
>
> This is filesystem specific.
> For example on UBIFS we make sure that the rename operation is atomic.
> Changing multiple directory entries is one journal commit, so either it happened
> completely or not at all.
> On JFFS2, on the other hand, rename can degrade to a hard link.
>
> I'd go so far as to claim that any modern Linux filesystem guarantees
> that rename is atomic.

Any atomicity that depends on journal commits cannot be considered to
have atomicity in a boot context, because bootloaders don't do journal
replay. It's completely ignored.

If a journal is present, is it appropriate to consider it a separate
and optional part of the file system? I don't know for sure, but I can
pretty much guess any of the bootloader upstreams would say: we are
not file system experts; if file system developers consider the
journal inseparable from the file system, and journal replay
non-optional when it is indicated, then we welcome patches from
file system developers to add such support to bootloaders X, Y, and Z.

I've already asked about bootloaders doing journal replay on the XFS
list, and maybe a while ago on the ext4 list (I forget), and it was sorta
taken as a bit of comedy. Like, how would that work? And it'd
inevitably lead to a fork in journal replay code, possibly more than
one to account for the different bootloader limitations and memory
handling differences, etc. So it's not very realistic. Probably. The
more realistic conclusion, if they aren't separable, is: if you care
about atomic guarantees for things related to bootloading, don't use
journaled file systems. Proscribed.

Which is why this thread exists: to see what can be done about FAT,
since it's really the only file system we have to be able to boot
from.

-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-23 21:56               ` Chris Murphy
@ 2019-10-23 22:22                 ` Richard Weinberger
  2019-10-24 21:46                   ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Weinberger @ 2019-10-23 22:22 UTC
  To: Chris Murphy; +Cc: Pali Rohár, Linux FS Devel

On Wed, Oct 23, 2019 at 11:56 PM Chris Murphy <lists@colorremedies.com> wrote:
> Any atomicity that depends on journal commits cannot be considered to
> have atomicity in a boot context, because bootloaders don't do journal
> replay. It's completely ignored.

It depends on the bootloader. If you care about atomicity you need to handle
the journal.
There are also filesystems which *require* the journal to be handled.
In that case you can still replay to memory.

And yes, filesystem implementations in many bootloaders are in a beyond
shameful state.

> If a journal is present, is it appropriate to consider it a separate
> and optional part of the file system?

No. This is filesystem specific.

-- 
Thanks,
//richard

* Re: Is rename(2) atomic on FAT?
  2019-10-23 22:22                 ` Richard Weinberger
@ 2019-10-24 21:46                   ` Chris Murphy
  2019-10-24 21:57                     ` Pali Rohár
  2019-10-24 22:16                     ` Richard Weinberger
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Murphy @ 2019-10-24 21:46 UTC
  To: Richard Weinberger; +Cc: Chris Murphy, Pali Rohár, Linux FS Devel

On Thu, Oct 24, 2019 at 12:22 AM Richard Weinberger
<richard.weinberger@gmail.com> wrote:
>
> On Wed, Oct 23, 2019 at 11:56 PM Chris Murphy <lists@colorremedies.com> wrote:
> > Any atomicity that depends on journal commits cannot be considered to
> > have atomicity in a boot context, because bootloaders don't do journal
> > replay. It's completely ignored.
>
> It depends on the bootloader. If you care about atomicity you need to handle
> the journal.
> There are also filesystems which *require* the journal to be handled.
> In that case you can still replay to memory.

I'm vaguely curious about examples of bootloaders that do journal
replay, only because I can't think of any that apply. Certainly none
that do replay on either ext4 or XFS. I've got some stale brain cells
telling me there was at one time JBD code in GRUB for, I think ext3
journal replay (?) and all of that got ripped out a very long time
ago. Maybe even before GRUB 2.


> And yes, filesystem implementations in many bootloaders are in a beyond
> shameful state.

Right. And while that's polite language, in their defence it's just not
their area of expertise. I tend to think that bootloader support is a
burden primarily on file system folks. If you want this use case
supported, then do the work. Ideally the upstreams would pair
interested parties from each discipline to make this happen. But
anyway, as I've heard it described by file system folks, it may not be
practical to support it, in which case for the atomic update use case,
the modern journaled file systems are just flat out disqualified.

Which again leads me to FAT. We must have a solution that works there,
even if it's some odd duck like thing, where the FAT ESP is
essentially a static configuration, not changing, that points to some
other block device (a different partition and different file system)
that has the desired behavioral characteristics.

> > If a journal is present, is it appropriate to consider it a separate
> > and optional part of the file system?
>
> No. This is filesystem specific.

I understand it's optional for ext3/4 insofar as it can be disabled,
whereas on XFS it's compulsory. But the mere presence of a journal
doesn't mean replay is required; there's a file system specific flag
that indicates replay is needed for the file system to be valid/caught
up to date. To what degree a file system that indicates journal replay
is required, but can't be replayed, is still a valid file system isn't
answered by file system metadata. The assumption is that replay must
happen when indicated. So if a bootloader flat out can't do that, it
essentially means the combination of GRUB2, Das U-Boot, or
syslinux/extlinux with ext3/4 or XFS is *proscribed* if the use case
requires atomic kernel updates, given the current state of affairs.
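For ext3/ext4, the flag referred to above lives in the superblock: the has_journal feature is bit 0x4 of s_feature_compat, and needs_recovery is bit 0x4 of s_feature_incompat. A minimal sketch that classifies a raw image, assuming the standard layout (superblock at byte 1024, s_magic 0xEF53 at offset 0x38, the compat and incompat words at 0x5C and 0x60) and ignoring every other field:

```python
import struct

SB_OFFSET = 1024             # ext2/3/4 primary superblock starts 1024 bytes in
EXT_MAGIC = 0xEF53           # s_magic
COMPAT_HAS_JOURNAL = 0x0004  # s_feature_compat: has_journal
INCOMPAT_RECOVER = 0x0004    # s_feature_incompat: needs_recovery (journal must be replayed)


def journal_state(image: bytes) -> str:
    """Classify an ext2/3/4 image as 'no journal', 'clean journal' or 'needs replay'."""
    sb = image[SB_OFFSET:SB_OFFSET + 1024]
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    compat, incompat = struct.unpack_from("<II", sb, 0x5C)  # s_feature_compat, s_feature_incompat
    if not compat & COMPAT_HAS_JOURNAL:
        return "no journal"
    if incompat & INCOMPAT_RECOVER:
        return "needs replay"
    return "clean journal"
```

A bootloader without replay support could at least use such a check to warn or refuse rather than silently read possibly stale metadata; what it should do beyond that is exactly the open question here.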

So that leads me to, what about FAT? i.e. how does this get solved on
FAT? And does that help us solve it on journaled file systems? If not,
can it also be generic enough to solve it here? I'm actually not
convinced it can be solved in journaled file systems at all, unless
the bootloader can do journal replay, but I'm not a file system expert
:P

-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-24 21:46                   ` Chris Murphy
@ 2019-10-24 21:57                     ` Pali Rohár
  2019-10-24 22:19                       ` Chris Murphy
  2019-10-24 22:16                     ` Richard Weinberger
  1 sibling, 1 reply; 22+ messages in thread
From: Pali Rohár @ 2019-10-24 21:57 UTC
  To: Chris Murphy; +Cc: Richard Weinberger, Linux FS Devel

On Thursday 24 October 2019 23:46:43 Chris Murphy wrote:
> So that leads me to, what about FAT? i.e. how does this get solved on FAT?

Hi Chris! I think that for FAT, in most cases, the ostrich algorithm is used.
The probability that the kernel crashes in the middle of an operation that
is updating the kernel image on the boot partition is very low.

I'm looking at grub's FAT source code and there is no handling of the dirty
bit... http://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/fs/fat.c
It just expects the whole FAT fs to be in a consistent state.
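For reference, Microsoft's FAT specification (fatgen103) defines the FAT32 dirty state in the FAT[1] entry: bit 0x08000000 (ClnShutBitMask) is 1 when the volume was cleanly unmounted, and bit 0x04000000 (HrdErrBitMask) is 1 when no disk I/O errors were encountered. The sketch below reads those bits from a raw image. It assumes the image is FAT32 (FAT16 uses bits 0x8000 and 0x4000 instead, and no type detection is done), uses only the BPB fields BytsPerSec (offset 11) and RsvdSecCnt (offset 14), and the function name is hypothetical. Other drivers may additionally track dirtiness elsewhere, for example in a boot sector flag byte.

```python
import struct

CLN_SHUT_BIT = 0x08000000  # FAT32 FAT[1]: 1 = clean shutdown, 0 = dirty
HRD_ERR_BIT = 0x04000000   # FAT32 FAT[1]: 1 = no I/O errors, 0 = errors seen


def fat32_volume_flags(image: bytes) -> dict:
    """Decode the dirty/error bits of FAT[1] from a FAT32 image (first FAT copy)."""
    (bytes_per_sec,) = struct.unpack_from("<H", image, 11)  # BPB_BytsPerSec
    (reserved_secs,) = struct.unpack_from("<H", image, 14)  # BPB_RsvdSecCnt
    fat_start = bytes_per_sec * reserved_secs  # first FAT follows the reserved area
    (fat1,) = struct.unpack_from("<I", image, fat_start + 4)  # FAT[1]: second 4-byte entry
    return {
        "clean": bool(fat1 & CLN_SHUT_BIT),
        "hard_error": not (fat1 & HRD_ERR_BIT),
    }
```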

-- 
Pali Rohár
pali.rohar@gmail.com

* Re: Is rename(2) atomic on FAT?
  2019-10-24 21:46                   ` Chris Murphy
  2019-10-24 21:57                     ` Pali Rohár
@ 2019-10-24 22:16                     ` Richard Weinberger
  2019-10-24 22:26                       ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Weinberger @ 2019-10-24 22:16 UTC
  To: Chris Murphy; +Cc: Pali Rohár, linux-fsdevel

----- Original Message -----
> From: "Chris Murphy" <lists@colorremedies.com>
> To: "Richard Weinberger" <richard.weinberger@gmail.com>
> CC: "Chris Murphy" <lists@colorremedies.com>, "Pali Rohár" <pali.rohar@gmail.com>, "linux-fsdevel"
> <linux-fsdevel@vger.kernel.org>
> Sent: Thursday, 24 October 2019 23:46:43
> Subject: Re: Is rename(2) atomic on FAT?

> On Thu, Oct 24, 2019 at 12:22 AM Richard Weinberger
> <richard.weinberger@gmail.com> wrote:
>>
>> On Wed, Oct 23, 2019 at 11:56 PM Chris Murphy <lists@colorremedies.com> wrote:
>> > Any atomicity that depends on journal commits cannot be considered to
>> > have atomicity in a boot context, because bootloaders don't do journal
>> > replay. It's completely ignored.
>>
>> It depends on the bootloader. If you care about atomicity you need to handle
>> the journal.
>> There are also filesystems which *require* the journal to be handled.
>> In that case you can still replay to memory.
> 
> I'm vaguely curious about examples of bootloaders that do journal
> replay, only because I can't think of any that apply. Certainly none
> that do replay on either ext4 or XFS. I've got some stale brain cells
> telling me there was at one time JBD code in GRUB for, I think ext3
> journal replay (?) and all of that got ripped out a very long time
> ago. Maybe even before GRUB 2.

U-Boot, for example. Of course it does not do so for every filesystem, only
where it is needed and makes sense.

Another approach is using Linux as the bootloader and kexec'ing another kernel.
That way you can have a full filesystem implementation and bring the filesystem
into a consistent state before reading from it.
 
> 
>> And yes, filesystem implementations in many bootloaders are in a beyond
>> shameful state.
> 
> Right. And while that's polite language, in their defence it's just not
> their area of expertise. I tend to think that bootloader support is a
> burden primarily on file system folks. If you want this use case
> supported, then do the work. Ideally the upstreams would pair
> interested parties from each discipline to make this happen. But
> anyway, as I've heard it described by file system folks, it may not be
> practical to support it, in which case for the atomic update use case,
> the modern journaled file systems are just flat out disqualified.
> 
> Which again leads me to FAT. We must have a solution that works there,
> even if it's some odd duck like thing, where the FAT ESP is
> essentially a static configuration, not changing, that points to some
> other block device (a different partition and different file system)
> that has the desired behavioral characteristics.
> 
>> > If a journal is present, is it appropriate to consider it a separate
>> > and optional part of the file system?
>>
>> No. This is filesystem specific.
> 
> I understand it's optional for ext3/4 insofar as it can optionally be
> disabled, where on XFS it's compulsory. But mere presence of a journal
> doesn't mean replay is required, there's a file system specific flag
> that indicates replay is needed for the file system to be valid/caught
> up to date. Whether a file system that indicates journal replay is
> required, but can't be replayed, is still a valid file system isn't
> answered by file system metadata. The assumption is that replay must
> happen when indicated. So if a bootloader flat out can't do that, it
> essentially means the combination of GRUB2, Das U-Boot,
> syslinux/extlinux and ext3/4 or XFS, is *proscribed* if the use case
> requires atomic kernel updates. Given the current state of affairs.
> 
> So that leads me to, what about FAT? i.e. how does this get solved on
> FAT? And does that help us solve it on journaled file systems? If not,
> can it also be generic enough to solve it here? I'm actually not
> convinced it can be solved in journaled file systems at all, unless
> the bootloader can do journal replay, but I'm not a file system expert
> :P

Like I mentioned above, use Linux as bootloader.
Have a minimal Linux kernel which can do kexec and the journaling filesystem
of your choice.
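The replay flag mentioned in the quoted text can be checked without replaying anything. Below is a minimal C sketch, assuming the standard ext2/3/4 on-disk superblock layout (superblock at byte 1024 of the device, s_magic at offset 0x38, s_feature_compat at 0x5C, s_feature_incompat at 0x60). It separates "has a journal" (COMPAT_HAS_JOURNAL, 0x0004) from "journal must be replayed before the metadata is consistent" (INCOMPAT_RECOVER, 0x0004). The function name ext4_journal_state() and its enum are hypothetical and not taken from any bootloader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets and flag values from the ext2/3/4 on-disk superblock. */
#define EXT4_SB_OFFSET      1024u   /* superblock starts 1 KiB into the device */
#define EXT4_SB_SIZE        1024u
#define SB_MAGIC            0x38u   /* __le16 s_magic */
#define SB_FEATURE_COMPAT   0x5Cu   /* __le32 s_feature_compat */
#define SB_FEATURE_INCOMPAT 0x60u   /* __le32 s_feature_incompat */
#define EXT4_MAGIC          0xEF53u
#define COMPAT_HAS_JOURNAL  0x0004u /* a journal exists */
#define INCOMPAT_RECOVER    0x0004u /* journal needs replay */

enum journal_state { NOT_EXT4, NO_JOURNAL, JOURNAL_CLEAN, JOURNAL_NEEDS_REPLAY };

static uint32_t get_le16(const uint8_t *p) { return (uint32_t)p[0] | (uint32_t)p[1] << 8; }

static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: img holds (at least) the first 2 KiB of the device. */
enum journal_state ext4_journal_state(const uint8_t *img, size_t len) {
    assert(img != NULL);
    if (len < EXT4_SB_OFFSET + EXT4_SB_SIZE)
        return NOT_EXT4;
    const uint8_t *sb = img + EXT4_SB_OFFSET;
    if (get_le16(sb + SB_MAGIC) != EXT4_MAGIC)
        return NOT_EXT4;
    if (!(get_le32(sb + SB_FEATURE_COMPAT) & COMPAT_HAS_JOURNAL))
        return NO_JOURNAL;
    if (get_le32(sb + SB_FEATURE_INCOMPAT) & INCOMPAT_RECOVER)
        return JOURNAL_NEEDS_REPLAY;   /* replay required before trusting metadata */
    return JOURNAL_CLEAN;
}
```

A loader that cannot replay could at least use such a check to refuse, warn, or fall back to another copy instead of silently reading stale metadata.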

Thanks,
//richard

* Re: Is rename(2) atomic on FAT?
  2019-10-24 21:57                     ` Pali Rohár
@ 2019-10-24 22:19                       ` Chris Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2019-10-24 22:19 UTC
  To: Pali Rohár; +Cc: Chris Murphy, Richard Weinberger, Linux FS Devel

On Thu, Oct 24, 2019 at 11:57 PM Pali Rohár <pali.rohar@gmail.com> wrote:
>
> On Thursday 24 October 2019 23:46:43 Chris Murphy wrote:
> > So that leads me to, what about FAT? i.e. how does this get solved on FAT?
>
> Hi Chris! I think that for FAT, in most cases, the ostrich algorithm is used.
> Probability that kernel crashes in the middle of operation which is
> updating kernel image on boot partition is very low.
>
> I'm looking at grub's fat source code and there is no handling of dirty
> bit... http://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/fs/fat.c
> It just expects that whole FAT fs is in consistent state.

I can't estimate how likely the same situation is for typical UEFI
firmware. But many follow TianoCore and if TianoCore is being overly
optimistic, now what?
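For reference, the FAT specification does define a volume-level dirty flag that a loader could consult. Per Microsoft's FAT specification, bit 27 (0x08000000, ClnShutBitMask) of FAT32 entry 1 is set when the volume was cleanly unmounted and clear when it is dirty. The C sketch below, assuming a FAT32 image already read into memory, locates the first FAT from BPB_BytsPerSec (offset 11) and BPB_RsvdSecCnt (offset 14) and tests that bit. The function name is hypothetical, and whether Linux's vfat driver, GRUB, or a given UEFI firmware actually sets or honours this particular bit is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FAT32 entry 1 flag from Microsoft's FAT spec: 1 = volume is clean. */
#define FAT32_CLN_SHUT_BIT 0x08000000u

static uint32_t rd_le16(const uint8_t *p) { return (uint32_t)p[0] | (uint32_t)p[1] << 8; }

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical helper: img holds the start of a FAT32 volume, including
 * the first FAT. Returns 1 if the clean-shutdown bit is set, 0 if it is
 * clear (dirty), and -1 if the image is too short or the BPB is implausible.
 */
int fat32_volume_is_clean(const uint8_t *img, size_t len) {
    assert(img != NULL);
    if (len < 512)
        return -1;
    uint32_t bytes_per_sec = rd_le16(img + 11);   /* BPB_BytsPerSec */
    uint32_t rsvd_sec_cnt = rd_le16(img + 14);    /* BPB_RsvdSecCnt */
    if (bytes_per_sec == 0 || rsvd_sec_cnt == 0)
        return -1;
    /* The first FAT starts right after the reserved sectors. */
    size_t fat_start = (size_t)bytes_per_sec * rsvd_sec_cnt;
    if (fat_start + 8 > len)
        return -1;
    uint32_t fat1 = rd_le32(img + fat_start + 4); /* FAT entry 1, 4 bytes per entry */
    return (fat1 & FAT32_CLN_SHUT_BIT) ? 1 : 0;
}
```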

So then I think of ugly but effective things, just like ChromeOS,
where we have two mirrored ESPs, and create a faux dirty bit with a
hidden file /.dirty or some ugly crap and hope to some deity that we
could get agreement among bootloader developers to prefer the ESP
without that file. The file gets created, we do all the modifications,
and only once fsync() returns 0 do we remove the /.dirty file? I
mean... that's crazy, isn't it?
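The faux dirty-bit protocol described above can be sketched in C with POSIX calls. This is a minimal sketch under stated assumptions: the marker name .dirty, the helper names, and the boot-selection policy are all hypothetical, and nothing here is an agreed bootloader convention. It does not rely on rename() being atomic on FAT: the marker is made durable before any change and removed only after every change (including the directory entry) has been fsync()ed, so a crash at any point leaves the marker in place. The updater is meant to process one mirror at a time, so at least one ESP is always unmarked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRTY_MARKER ".dirty"   /* hypothetical marker file name */

/* fsync a directory so that creating, renaming, or unlinking entries in it is durable. */
static int fsync_dir(const char *dir) {
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Write data to dir/name via a temporary file, fsync, rename, then fsync the directory. */
static int write_file_durably(const char *dir, const char *name, const char *data) {
    char tmp[4096], dst[4096];
    snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    snprintf(dst, sizeof dst, "%s/%s", dir, name);
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, dst) != 0)
        return -1;
    return fsync_dir(dir);
}

/* Updater: set the marker durably, modify, and clear the marker only if everything succeeded. */
int esp_update(const char *esp, const char *name, const char *data) {
    char marker[4096];
    snprintf(marker, sizeof marker, "%s/%s", esp, DIRTY_MARKER);
    int fd = open(marker, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    if (rc != 0 || fsync_dir(esp) != 0)
        return -1;
    if (write_file_durably(esp, name, data) != 0)
        return -1;                 /* marker stays: this ESP remains "dirty" */
    if (unlink(marker) != 0)
        return -1;
    return fsync_dir(esp);
}

/* Bootloader-side policy: prefer a mirror without the marker; NULL if both carry it. */
const char *esp_choose(const char *esp_a, const char *esp_b) {
    char path[4096];
    assert(esp_a != NULL && esp_b != NULL);
    snprintf(path, sizeof path, "%s/%s", esp_a, DIRTY_MARKER);
    if (access(path, F_OK) != 0)
        return esp_a;
    snprintf(path, sizeof path, "%s/%s", esp_b, DIRTY_MARKER);
    if (access(path, F_OK) != 0)
        return esp_b;
    return NULL;
}
```

With two mirrors, an update would call esp_update() on the first ESP and then on the second. If the mirrors are updated strictly one after the other, esp_choose() should never see both marked; what a loader should do in that case is left open.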

-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-24 22:16                     ` Richard Weinberger
@ 2019-10-24 22:26                       ` Chris Murphy
  2019-10-24 22:33                         ` Richard Weinberger
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2019-10-24 22:26 UTC
  To: Richard Weinberger; +Cc: Chris Murphy, Pali Rohár, linux-fsdevel

(Sorry, I hate it when I revert to old habits and don't reply to all.)


> > On Thu, Oct 24, 2019 at 12:22 AM Richard Weinberger
> > <richard.weinberger@gmail.com> wrote:
> >>
> >> On Wed, Oct 23, 2019 at 11:56 PM Chris Murphy <lists@colorremedies.com> wrote:
> >> > Any atomicity that depends on journal commits cannot be considered to
> >> > have atomicity in a boot context, because bootloaders don't do journal
> >> > replay. It's completely ignored.
> >>
> >> It depends on the bootloader. If you care about atomicity you need to handle
> >> the journal.
> >> There are also filesystems which *require* the journal to be handled.
> >> In that case you can still replay to memory.
> >
> > I'm vaguely curious about examples of bootloaders that do journal
> > replay, only because I can't think of any that apply. Certainly none
> > that do replay on either ext4 or XFS. I've got some stale brain cells
> > telling me there was at one time JBD code in GRUB for, I think ext3
> > journal replay (?) and all of that got ripped out a very long time
> > ago. Maybe even before GRUB 2.
>
> U-Boot, for example. Of course it does not do so for every filesystem, only where
> it is needed and makes sense.

Really? U-Boot does journal replay on ext3/4? I think at this point the
most common file system on Linux distros is unquestionably ext4, and
the most common bootloader is GRUB, and for sure GRUB is not doing
journal replay on anything, including ext4.


> Another approach is using Linux as bootloader and kexec another kernel.
> That way you can have a full filesystem implementation and bring the filesystem
> in a consistent state before reading from it.

Sure. The one or more file systems involved must be assumed to be dirty
already: the EFI system partition on UEFI, the FAT32 $BOOT on ARM, as
well as the more conventional /boot, which is ext4. Those must be
assumed to be dirty, with journal replay required. Yes, they should
have been cleanly unmounted and thus journal replay not required, but
what if that's not the case? We can't claim atomic updates based on the
ideal case, only on the worst-case scenario.

>
> >
> >> And yes, filesystem implementations in many bootloaders are in beyond
> >> shameful state.
> >
> > Right. And while that's polite language, in their defence it's just not
> > their area of expertise. I tend to think that bootloader support is a
> > burden primarily on file system folks. If you want this use case
> > supported, then do the work. Ideally the upstreams would pair
> > interested parties from each discipline to make this happen. But
> > anyway, as I've heard it described by file system folks, it may not be
> > practical to support it, in which case for the atomic update use case,
> > the modern journaled file systems are just flat out disqualified.
> >
> > Which again leads me to FAT. We must have a solution that works there,
> > even if it's some odd duck like thing, where the FAT ESP is
> > essentially a static configuration, not changing, that points to some
> > other block device (a different partition and different file system)
> > that has the desired behavioral characteristics.
> >
> >> > If a journal is present, is it appropriate to consider it a separate
> >> > and optional part of the file system?
> >>
> >> No. This is filesystem specific.
> >
> > I understand it's optional for ext3/4 insofar as it can optionally be
> > disabled, where on XFS it's compulsory. But mere presence of a journal
> > doesn't mean replay is required, there's a file system specific flag
> > that indicates replay is needed for the file system to be valid/caught
> > up to date. Whether a file system that indicates journal replay is
> > required, but can't be replayed, is still a valid file system isn't
> > answered by file system metadata. The assumption is that replay must
> > happen when indicated. So if a bootloader flat out can't do that, it
> > essentially means the combination of GRUB2, Das U-Boot,
> > syslinux/extlinux and ext3/4 or XFS, is *proscribed* if the use case
> > requires atomic kernel updates. Given the current state of affairs.
> >
> > So that leads me to, what about FAT? i.e. how does this get solved on
> > FAT? And does that help us solve it on journaled file systems? If not,
> > can it also be generic enough to solve it here? I'm actually not
> > convinced it can be solved in journaled file systems at all, unless
> > the bootloader can do journal replay, but I'm not a file system expert
> > :P
>
> Like I mentioned above, use Linux as bootloader.
> Have a minimal Linux kernel which can do kexec and the journaling filesystem
> of your choice.

Yeah that's got its own difficulties, including the way distro build
systems work. I'm not opposed to it, but it's a practical barrier to
adoption. I'd almost say it's easier to make Btrfs $BOOT compulsory,
make static ESP compulsory, and voila!


-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-24 22:26                       ` Chris Murphy
@ 2019-10-24 22:33                         ` Richard Weinberger
  2019-10-25  9:22                           ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Weinberger @ 2019-10-24 22:33 UTC
  To: Chris Murphy; +Cc: Pali Rohár, linux-fsdevel

----- Original Message -----
>> U-Boot, for example. Of course it does not do so for every filesystem, only where
>> it is needed and makes sense.
> 
> Really? U-Boot does journal replay on ext3/4? I think at this point the
> most common file system on Linux distros is unquestionably ext4, and
> the most common bootloader is GRUB, and for sure GRUB is not doing
> journal replay on anything, including ext4.

For ext4 it does a replay when you start to write to it.
 
> Yeah that's got its own difficulties, including the way distro build
> systems work. I'm not opposed to it, but it's a practical barrier to
> adoption. I'd almost say it's easier to make Btrfs $BOOT compulsory,
> make static ESP compulsory, and voila!

I really don't get your point. I thought you were designing a "sane"
system which can tolerate power cuts during an update.
Why care about distros?
The approach with Linux being a "bootloader" is common for embedded/secure
systems.

Thanks,
//richard

* Re: Is rename(2) atomic on FAT?
  2019-10-24 22:33                         ` Richard Weinberger
@ 2019-10-25  9:22                           ` Chris Murphy
  2019-10-25  9:50                             ` Richard Weinberger
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2019-10-25  9:22 UTC
  To: Richard Weinberger; +Cc: Chris Murphy, Pali Rohár, linux-fsdevel

On Fri, Oct 25, 2019 at 12:33 AM Richard Weinberger <richard@nod.at> wrote:
>
> ----- Original Message -----
> >> U-Boot, for example. Of course it does not do so for every filesystem, only where
> >> it is needed and makes sense.
> >
> > Really? U-Boot does journal replay on ext3/4? I think at this point the
> > most common file system on Linux distros is unquestionably ext4, and
> > the most common bootloader is GRUB, and for sure GRUB is not doing
> > journal replay on anything, including ext4.
>
> For ext4 it does a replay when you start to write to it.

That strikes me as weird. The bootloader will read from the file
system before it writes, and possibly get the wrong view of the file
system's true state because journal replay wasn't done.

> > Yeah that's got its own difficulties, including the way distro build
> > systems work. I'm not opposed to it, but it's a practical barrier to
> > adoption. I'd almost say it's easier to make Btrfs $BOOT compulsory,
> > make static ESP compulsory, and voila!
>
> I really don't get your point. I thought you were designing a "sane"
> system which can tolerate power cuts during an update.
> Why care about distros?
> The approach with Linux being a "bootloader" is common for embedded/secure
> systems.

I want something as generic as possible, so as many use cases have
reliable kernel/bootloader updates as possible, so that fewer users
experience systems face-planting following such updates. Any system
can experience an unscheduled, unclean shutdown. Exceptions to the
generic case should be rare and pretty much be about handling hardware
edge cases.


-- 
Chris Murphy

* Re: Is rename(2) atomic on FAT?
  2019-10-25  9:22                           ` Chris Murphy
@ 2019-10-25  9:50                             ` Richard Weinberger
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Weinberger @ 2019-10-25  9:50 UTC
  To: Chris Murphy; +Cc: Pali Rohár, linux-fsdevel

----- Original Message -----
> From: "Chris Murphy" <lists@colorremedies.com>
> To: "richard" <richard@nod.at>
> CC: "Chris Murphy" <lists@colorremedies.com>, "Pali Rohár" <pali.rohar@gmail.com>, "linux-fsdevel"
> <linux-fsdevel@vger.kernel.org>
> Sent: Friday, 25 October 2019 11:22:17
> Subject: Re: Is rename(2) atomic on FAT?

> On Fri, Oct 25, 2019 at 12:33 AM Richard Weinberger <richard@nod.at> wrote:
>>
>> ----- Original Message -----
>> >> U-Boot, for example. Of course it does not do so for every filesystem, only where
>> >> it is needed and makes sense.
>> >
>> > Really? U-Boot does journal replay on ext3/4? I think at this point the
>> > most common file system on Linux distros is unquestionably ext4, and
>> > the most common bootloader is GRUB, and for sure GRUB is not doing
>> > journal replay on anything, including ext4.
>>
>> For ext4 it does a replay when you start to write to it.
> 
> That strikes me as weird. The bootloader will read from the file
> system before it writes, and possibly get the wrong view of the file
> system's true state because journal replay wasn't done.

This can't happen. U-Boot is strictly single-threaded, no interrupts,
no nothing.

For the ext4 case in U-Boot, it does a replay not to have a clean file
system upon read, but to avoid corrupting it upon write.

For UBIFS, for example, U-Boot also does a replay before reading.
But it replays into memory. The journal size is fixed and known,
so no big deal.
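The "replay into memory" idea can be shown with a toy model. This is a minimal C sketch assuming a made-up journal format (full block images followed by a commit marker); it is not the UBIFS or JBD2 on-disk format, and all names and sizes are hypothetical. Committed transactions are copied into an in-memory overlay, an uncommitted tail is discarded as torn, reads consult the overlay before the medium, and nothing is ever written back. In a real loader the overlay only needs room for what the fixed-size journal can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* toy sizes, not from any real file system */
#define NUM_BLOCKS 8

/* Hypothetical journal record: a full block image, or a commit marker. */
struct jrec {
    int is_commit;              /* 1 = commit marker closing a transaction */
    uint32_t blockno;           /* target block of a data record */
    uint8_t data[BLOCK_SIZE];   /* new contents of that block */
};

/* In-memory view of committed journal contents; the medium is never written. */
struct overlay {
    int valid[NUM_BLOCKS];
    uint8_t data[NUM_BLOCKS][BLOCK_SIZE];
};

/* Apply every committed transaction, in log order, to the overlay only. */
void replay_to_memory(const struct jrec *log, size_t n, struct overlay *ov) {
    size_t txn_start = 0;
    memset(ov, 0, sizeof *ov);
    for (size_t i = 0; i < n; i++) {
        if (!log[i].is_commit)
            continue;
        for (size_t j = txn_start; j < i; j++) {
            uint32_t b = log[j].blockno;
            if (b >= NUM_BLOCKS)
                continue;       /* ignore records pointing outside the toy device */
            memcpy(ov->data[b], log[j].data, BLOCK_SIZE);
            ov->valid[b] = 1;
        }
        txn_start = i + 1;
    }
    /* Records after the last commit marker belong to a torn transaction: dropped. */
}

/* Read path: a replayed block shadows the stale on-disk copy. */
const uint8_t *read_block(uint8_t disk[][BLOCK_SIZE], const struct overlay *ov, uint32_t b) {
    assert(b < NUM_BLOCKS);
    return ov->valid[b] ? ov->data[b] : disk[b];
}
```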

>> > Yeah that's got its own difficulties, including the way distro build
>> > systems work. I'm not opposed to it, but it's a practical barrier to
>> > adoption. I'd almost say it's easier to make Btrfs $BOOT compulsory,
>> > make static ESP compulsory, and voila!
>>
>> I really don't get your point. I thought you were designing a "sane"
>> system which can tolerate power cuts during an update.
>> Why care about distros?
>> The approach with Linux being a "bootloader" is common for embedded/secure
>> systems.
> 
> I want something as generic as possible, so as many use cases have
> reliable kernel/bootloader updates as possible, so that fewer users
> experience systems face-planting following such updates. Any system
> can experience an unscheduled, unclean shutdown. Exceptions to the
> generic case should be rare and pretty much be about handling hardware
> edge cases.

Don't forget UEFI updates. ;-)

Really, you can't have it all on an x86 system in such a generic way.

Thanks,
//richard

end of thread, other threads:[~2019-10-25  9:50 UTC | newest]

Thread overview: 22+ messages
2019-10-21 19:57 Is rename(2) atomic on FAT? Chris Murphy
2019-10-21 21:44 ` Richard Weinberger
2019-10-22 10:54   ` Pali Rohár
2019-10-23  0:10     ` Chris Murphy
2019-10-23 11:50       ` Pali Rohár
2019-10-23 14:21         ` Chris Murphy
2019-10-23 17:16           ` Pali Rohár
2019-10-23 19:18             ` Chris Murphy
2019-10-23 21:21             ` Richard Weinberger
2019-10-23 21:56               ` Chris Murphy
2019-10-23 22:22                 ` Richard Weinberger
2019-10-24 21:46                   ` Chris Murphy
2019-10-24 21:57                     ` Pali Rohár
2019-10-24 22:19                       ` Chris Murphy
2019-10-24 22:16                     ` Richard Weinberger
2019-10-24 22:26                       ` Chris Murphy
2019-10-24 22:33                         ` Richard Weinberger
2019-10-25  9:22                           ` Chris Murphy
2019-10-25  9:50                             ` Richard Weinberger
2019-10-23 12:53       ` Colin Walters
2019-10-23 14:24         ` Chris Murphy
2019-10-23 17:26           ` Colin Walters
