From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Goffredo Baroncelli <kreijack@inwind.it>
Cc: Graham Cobb <g.btrfs@cobb.uk.net>, linux-btrfs@vger.kernel.org
Subject: Re: [RFC][PATCH V3] btrfs: ssd_metadata: storing metadata on SSD
Date: Mon, 6 Apr 2020 13:40:52 -0400
Message-ID: <20200406174052.GL2693@hungrycats.org>
In-Reply-To: <0d598f1a-b86e-ffab-01a2-3eff56b2d3b1@inwind.it>

On Mon, Apr 06, 2020 at 07:33:16PM +0200, Goffredo Baroncelli wrote:
> On 4/6/20 7:21 PM, Zygo Blaxell wrote:
> > On Mon, Apr 06, 2020 at 06:43:04PM +0200, Goffredo Baroncelli wrote:
> > > On 4/6/20 4:24 AM, Zygo Blaxell wrote:
> > > > > > Of course btrfs is slower than ext4 when a lot of sync/flush operations are
> > > > > > involved. Using apt on a rotational disk was a dramatic experience. And IMHO this
> > > > > > should be replaced by using the btrfs snapshot capabilities. But this is another
> > > > > > (not easy) story.
> > > > flushoncommit and eatmydata work reasonably well...once you patch out the
> > > > noise warnings from fs-writeback.
> > > > 
> > > 
> > > You wrote flushoncommit, but did you mean "noflushoncommit" ?
> > 
> > No.  "noflushoncommit" means applications have to call fsync() all the
> > time, or their files get trashed on a crash.  I meant flushoncommit
> > and eatmydata.
> 
> Is it a tristate value (default, flushoncommit, noflushoncommit), or
> IS flushoncommit the default?

noflushoncommit is the default.  flushoncommit is sort of terrible--it
had deadlock bugs in kernels up to 4.15, and has spammed the kernel log
with warnings since 4.15.
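
To see which of the two behaviours a mounted filesystem currently has, a
minimal shell sketch follows. It assumes that the option string in
/proc/mounts lists "flushoncommit" only when the option is enabled, and that
the default (noflushoncommit) is simply not printed. The function name
has_flushoncommit and its optional second argument (an alternate mounts file,
handy for testing) are made up for this example.

```shell
#!/bin/sh
# has_flushoncommit MOUNTPOINT [MOUNTS_FILE]
# Exit status 0 if the btrfs mount at MOUNTPOINT lists "flushoncommit"
# among its options, 1 otherwise (including "not btrfs" and "not mounted").
has_flushoncommit() {
	# /proc/mounts fields: device mountpoint fstype options dump pass
	awk -v mnt="$1" '
		$2 == mnt {
			# A later line for the same mount point overrides an earlier one.
			found = 0
			if ($3 == "btrfs") {
				n = split($4, opt, ",")
				for (i = 1; i <= n; i++)
					if (opt[i] == "flushoncommit")
						found = 1
			}
		}
		END { exit (found ? 0 : 1) }
	' "${2:-/proc/mounts}"
}

# Possible use (as root), not executed here:
#   has_flushoncommit / || mount -o remount,flushoncommit /
#   eatmydata apt-get dist-upgrade
```

The exact-match comparison on each comma-separated option keeps
"noflushoncommit" from being mistaken for "flushoncommit" if it ever appears
in the option string.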

> > While dpkg runs, it must never call fsync, or it breaks the write
> > ordering provided by flushoncommit (or you have to zero-log on boot).
> > btrfs effectively does a point-in-time snapshot at every commit interval.
> > dpkg's ordering of write operations and renames does the rest.
> > 
> > dpkg runs much faster, so the window for interruption is smaller, and
> > if it is interrupted, then the result is more or less the same as if
> > you had run with fsync() on noflushoncommit.  The difference is that
> > the filesystem might roll back to an earlier state after a crash, which
> > could be a problem e.g. if your maintainer scripts are manipulating data
> > on multiple filesystems.
> > 
> > 
> > > Regarding eatmydata, I used it too. However, I was never happy with it. Below is my script:
> > > ----------------------------------
> > > ghigo@venice:/etc/apt/apt.conf.d$ cat 10btrfs.conf
> > > 
> > > DPkg::Pre-Invoke {"bash /var/btrfs/btrfs-apt.sh snapshot";};
> > > DPkg::Post-Invoke {"bash /var/btrfs/btrfs-apt.sh clean";};
> > > Dpkg::options {"--force-unsafe-io";};
> > > ---------------------------------
> > > ghigo@venice:/etc/apt/apt.conf.d$ cat /var/btrfs/btrfs-apt.sh
> > > 
> > > btrfsroot=/var/btrfs/debian
> > > btrfsrollback=/var/btrfs/debian-rollback
> > > 
> > > 
> > > do_snapshot() {
> > > 	if [ -d "$btrfsrollback" ]; then
> > > 		btrfs subvolume delete "$btrfsrollback"
> > > 	fi
> > > 
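> > > 	# Wait up to about 2 seconds (20 x 0.1 s) for the old snapshot
> > > 	# directory to disappear after the delete.
> > > 	i=20
> > > 	while [ "$i" -gt 0 ] && [ -d "$btrfsrollback" ]; do
> > > 		i=$(( i - 1 ))
> > > 		sleep 0.1
> > > 	done
> > > 	# Give up if the old snapshot is still there.
> > > 	if [ -d "$btrfsrollback" ]; then
> > > 		exit 100
> > > 	fi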
> > > 
> > > 	btrfs subvolume snapshot "$btrfsroot" "$btrfsrollback"
> > > 	
> > > }
> > > 
> > > do_removerollback() {
> > > 	if [ -d "$btrfsrollback" ]; then
> > > 		btrfs subvolume delete "$btrfsrollback"
> > > 	fi
> > > }
> > > 
> > > if [ "$1" = "snapshot" ]; then
> > > 	do_snapshot
> > > elif [ "$1" = "clean" ]; then
> > > 	do_removerollback
> > > else
> > > 	echo "usage: $0  snapshot|clean"
> > > fi
> > > --------------------------------------------------------------
> > > 
> > > Suggestions are welcome on how to automatically detect where the
> > > btrfs root (subvolume=/) is mounted and the name of my root subvolume
> > > (debian in my case), so that I can avoid hard-coding them in my script.
> > 
> > You can figure out where "/" is within a btrfs filesystem by recursively
> > looking up parent subvol IDs with TREE_SEARCH_V2 until you get to 5
> > (FS_ROOT), sort of like the way pwd works on traditional Unix; however,
> > root can be a bind mount, so "path from fs_root to /" is not guaranteed
> > to end at a subvol root.
> 
> Maybe a use case for a new ioctl :-) ? Snapshot a subvolume without
> mounting the root subvolume....

That would make access control mechanisms like chroot...challenging.
;)  But I hear we have a delete-by-id ioctl now, so might as well have
snap-by-id too.
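
For the question of where "/" sits inside the filesystem, here is a rough
shell sketch that does not use TREE_SEARCH_V2 at all. It swaps in a simpler
technique: reading the "root" field (field 4) of /proc/self/mountinfo for the
mount point. The caveats from the quoted text still apply: with a bind mount
the reported path need not end at a subvol root, and the mount points are
shown as seen from the current namespace. The function name fs_path_of and
its optional second argument (an alternate mountinfo file) are hypothetical.

```shell
#!/bin/sh
# fs_path_of MOUNTPOINT [MOUNTINFO_FILE]
# Print the path inside the btrfs filesystem that is mounted at MOUNTPOINT,
# e.g. "/debian". Fails (status 1) if MOUNTPOINT is not a btrfs mount.
fs_path_of() {
	# mountinfo fields: id parent major:minor root mountpoint options
	#                   [optional fields...] - fstype source superoptions
	awk -v mnt="$1" '
		$5 == mnt {
			# The number of optional fields varies; find the "-" separator.
			for (i = 7; i <= NF; i++)
				if ($i == "-")
					break
			# The last matching line wins, as with overmounts.
			path = ($(i + 1) == "btrfs") ? $4 : ""
		}
		END {
			if (path == "")
				exit 1
			print path
		}
	' "${2:-/proc/self/mountinfo}"
}

# Possible use, not executed here:
#   rootsub=$(fs_path_of /) && echo "root lives at $rootsub"
```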

> > Also, sometimes people put /var on its own subvol, so you'd need to
> > find "the set of all subvols relevant to dpkg" and that's definitely
> > not trivial in the general case.
> 
> I know that a general rule is not easy. Anyway, I would also put /boot
> and /home in dedicated subvolumes.
> If the "rollback" is done at boot, /boot should be an invariant...
> However I think that there are a lot of corner cases even here (what happens
> if the booted kernel doesn't have its modules in the root subvolume?)
> 
> It is not an easy job. It must be performed at the distribution level...
> 
> > 
> > It's not as easy to figure out if there's an existing fs_root mount
> > point (partly because namespacing mangles every path in /proc/mounts
> > and mountinfo), but if you know the btrfs device (and can access it
> > from your namespace) you can just mount it somewhere and then you do
> > know where it is.
> 
> I agree: look up the "root device" from root, then mount the
> root subvolume in a known place, where it is possible to snapshot
> the root subvolume.
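
The "mount it somewhere known, then snapshot" idea can be sketched in shell
as below. This is only an illustration under several assumptions: findmnt
from util-linux prints SOURCE for a btrfs subvolume mount as
"device[/path]", the top-level subvolume is mounted with "-o subvolid=5",
and the snapshot is named by appending "-rollback" as in the quoted script.
The function names are made up, snapshot_root must run as root, and it does
not handle the bind-mount or separate-/var cases raised earlier.

```shell
#!/bin/sh
# Split a findmnt SOURCE value such as "/dev/sda2[/debian]".
source_device() {
	# Drop everything from the first "[" onward.
	printf '%s\n' "${1%%\[*}"
}

source_fsroot() {
	case $1 in
	*\[*\])
		path=${1#*\[}
		printf '%s\n' "${path%\]}"
		;;
	*)
		# No bracket suffix: the mount shows the top of the filesystem.
		printf '/\n'
		;;
	esac
}

# Mount the top-level subvolume on a private directory, snapshot the
# subvolume that holds "/", then unmount again.
snapshot_root() {
	src=$(findmnt -n -o SOURCE /) || return 1
	dev=$(source_device "$src")
	sub=$(source_fsroot "$src")
	if [ "$sub" = "/" ]; then
		echo "root is the top-level subvolume; not handled in this sketch" >&2
		return 1
	fi
	top=$(mktemp -d) || return 1
	if mount -o subvolid=5 "$dev" "$top"; then
		btrfs subvolume snapshot "${top}${sub}" "${top}${sub}-rollback"
		status=$?
		umount "$top"
	else
		status=1
	fi
	rmdir "$top"
	return "$status"
}
```

Nothing runs when the file is sourced; calling snapshot_root is left to the
caller, for example from a DPkg::Pre-Invoke hook like the one quoted above.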
> 
> > 
> > > BR
> > > G.Baroncelli
> > > -- 
> > > gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> > > Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> > > 
