linux-btrfs.vger.kernel.org archive mirror
From: Chris Murphy <lists@colorremedies.com>
To: David Sterba <dsterba@suse.cz>,
	Chris Murphy <lists@colorremedies.com>,
	Andrei Borzenkov <arvidjaar@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Does GRUB btrfs support log tree?
Date: Mon, 11 Nov 2019 19:37:08 +0000
Message-ID: <CAJCQCtSiDQA4919YDTyQkW7jPkxMds1K32ym=HgO6KHQLzHw+w@mail.gmail.com>
In-Reply-To: <20191104193454.GD3001@twin.jikos.cz>

On Mon, Nov 4, 2019 at 7:34 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Sun, Oct 27, 2019 at 09:05:54PM +0100, Chris Murphy wrote:
> > > > Since log tree writes mean a
> > > > full file system update hasn't happened, the old file system state
> > > > hasn't been dereferenced, so even in an SSD + discard case, the system
> > > > should still be bootable. And at that point Btrfs kernel code does log
> > > > replay, and catches the system up, and the next update will boot the
> > > > new state.
> > > >
> > > > Correct?
> > > >
> > >
> > > Yes. If we speak about grub here, it actually tries very hard to ensure
> > > writes have hit disk (it fsyncs files as it writes them and it flushes
> > > raw devices). But I guess that fsync on btrfs just goes into log and
> > > does not force transaction. Is it possible to force transaction on btrfs
> > > from user space?
>
> * sync/syncfs
> * the ioctl BTRFS_IOC_SYNC (calls syncfs)
> * ioctls BTRFS_IOC_START_SYNC + BTRFS_IOC_WAIT_SYNC
>
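For reference, the first option in the list above can be sketched from
user space roughly as follows. This is only an illustration, assuming
Linux with a libc that exports syncfs(2); the helper name
sync_filesystem is made up for this example, and whether the call
results in a full btrfs transaction commit is taken from the list
above rather than verified here.

```python
import ctypes
import os


def sync_filesystem(path):
    """Sync the file system that contains `path`.

    Hypothetical helper: tries syncfs(2) through libc and falls back to
    a global os.sync() if the symbol is not available.
    Returns the name of the call that was used.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        syncfs = getattr(libc, "syncfs", None)
        if syncfs is None:
            # No syncfs in this libc: sync every mounted file system.
            os.sync()
            return "sync"
        if syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return "syncfs"
    finally:
        os.close(fd)
```

Unlike fsync() on a single file, this targets the whole file system
that holds the given path.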
> > The only fsync I ever see Fedora's grub2-mkconfig do is for grubenv.
> > The grub.cfg is not fsync'd. When I do a strace of grub2-mkconfig,
> > it's so incredibly complicated. Using -ff -o options, I get over 1800
> > separate PID files exported. From what I can tell, it creates a brand
> > new file "grub.cfg.new" and writes to that. Then does a cat from
> > "grub.cfg.new" into "grub.cfg" - maybe it's file system specific
> > behavior, I'm not sure.
> >
> > I'm pretty sure "sync" will do what you want, it calls syncfs() and
> > best as I can tell it does a full file system sync, doesn't use the
> > log tree. I'd argue grub-mkconfig should write all of its files, and
> > then sync that file system, rather than doing any fsync at all.
>
> This would work in most cases. I'm not sure, but the update does not
> seem to be atomic, i.e. there is no guarantee that either the old
> kernels match the old grub.cfg or the new kernels match the new cfg.
>
> Even if there are no fsyncs and just the final sync, some other
> activity in the filesystem can trigger a sync between the updates of
> the kernels and grub.cfg. Like this:
>
> start:
>
> - kernel1
> - grub.cfg (v1)
>
> update:
>
> - add kernel2
> - remove kernel1
> - <something calls sync>
> - update grub.cfg (v2)
> - grub calls sync
>
> If the crash happens after sync and before update, kernel1 won't be
> reachable and kernel2 won't be in the grub.cfg.

Right. It's probably bad practice to remove the fallback kernel (how
that is defined varies by distribution) unless the method of updating
the kernel is atomic by design and proven so by testing.

In the single-kernel case it could be done atomically with generic
file names, i.e. vmlinuz and initramfs with no version in the file
name, and a static bootloader configuration that is never updated and
only ever looks for vmlinuz and initramfs. The update would write out
vmlinuz.new and initramfs.new, then sync, and then rename()
vmlinuz.new to vmlinuz and initramfs.new to initramfs. Since it's two
files, it's not strictly atomic; likely more than one sector changes.
But it might be good enough?
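
A rough sketch of that write, sync, rename sequence, only to make the
ordering concrete. The function name replace_boot_files and its
arguments are invented for this example, and the final sync after the
renames is an extra step assumed here, not part of the description
above.

```python
import os


def replace_boot_files(boot_dir, new_images):
    """Replace boot files using the write-new, sync, rename pattern.

    `new_images` maps a final file name (e.g. "vmlinuz") to its new
    contents as bytes. Hypothetical helper, not an existing tool.
    """
    # 1. Write every new image under a temporary ".new" name.
    for name, data in new_images.items():
        with open(os.path.join(boot_dir, name + ".new"), "wb") as f:
            f.write(data)

    # 2. Full sync, so the new files are on disk before any rename.
    os.sync()

    # 3. Rename each ".new" file over its final name. Each rename is
    #    atomic on its own, but the set of renames is not.
    for name in new_images:
        os.replace(os.path.join(boot_dir, name + ".new"),
                   os.path.join(boot_dir, name))

    # 4. Assumption: sync again so the renames themselves are committed.
    os.sync()
```

A crash between the two renames in step 3 still leaves a new vmlinuz
paired with an old initramfs, which is the non-atomic window mentioned
above.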

I'm not really sure what the best practice is though. I asked about
this in a UEFI, EFI System Partition (and thus FAT) context and it
seems like there really aren't any atomicity guarantees possible at
all, which is a bit troubling. About the only way to do it is as on
Android: A and B partitions hold the kernel and initramfs as blobs
rather than as files on a file system, and a partition attribute
tells the bootloader which of A or B has priority, with the other
being the fallback.

Anyway, the lack of a generic (file system independent) way to handle
this use case is actually a bit concerning.

-- 
Chris Murphy
