From: Kent Overstreet <kent.overstreet@linux.dev>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: Questions related to BCacheFS
Date: Sat, 18 Nov 2023 16:07:27 -0500
Message-ID: <20231118210727.6s7bi3e4lldnrpoj@moria.home.lan>
In-Reply-To: <2210413.NgBsaNRSFp@lichtvoll.de>

On Sat, Nov 18, 2023 at 09:57:50PM +0100, Martin Steigerwald wrote:
> Hi Kent.
> 
> Thanks for answering so promptly. Feel free to skip answering during
> the rest of the weekend :)
> 
> Kent Overstreet - 18.11.23, 20:50:24 CET:
> > > 10) Anything you think an article about BCacheFS should absolutely
> > > mention?
> > 
> > Would personally love to see some non-Phoronix benchmarks :)
> 
> I see. Well, the thing is, I am not really satisfied with the
> performance of the Samsung 980 Pro 2 TB NVMe SSD in this ThinkPad T14
> AMD Gen 1 under Linux, so I am not sure whether performance benchmarks
> would be meaningful on that setup. At least not without attempting a
> firmware upgrade again, if one is available, and hoping it helps this
> time. I remember not enjoying having to dig the firmware upgrade out
> of an ISO image, since Samsung does not provide it via LVFS. Also,
> benchmarking may be more in scope for a later article, if at all,
> because I think the article will become long enough just explaining
> BCacheFS :). It is challenging to get benchmarking right and obtain
> actually meaningful results, and rather than getting it wrong, I'd
> prefer to skip or delay it. But anyway: any suggestion for a specific
> benchmark?
> 
> Any advice about the Phoronix benchmarks? I bet the one I saw was run
> with some debug option enabled that would better have been off - I
> think it was CONFIG_BCACHEFS_DEBUG_TRANSACTIONS? I did not check
> whether Michael Larabel has done a new run with that turned off.
> 
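Checking that is easy when the distro ships the kernel config;
something like

  grep BCACHEFS /boot/config-$(uname -r)

(or zgrep CONFIG_BCACHEFS /proc/config.gz, where enabled) will show
which bcachefs debug options a given build has on.
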
> As far as I understand, one specific performance-related aspect of
> BCacheFS is its low latencies, owed to the frontend/backend
> architecture which in principle is based on what was already there in
> BCache. I intend to explore that concept a bit in my article.

The low latency stuff actually wasn't in bcache - that work came later.

Things like:

 - SIX locks (shared/intent/exclusive) - intent locks don't block
   readers, and we only need to take write locks for the actual btree
   node update

 - asynchronous interior btree node updates: in bcache, when we split a
   node we have to wait for writes to complete before updating the
   parent node; in bcachefs, work after IO completion is fully
   asynchronous

 - the big one that no other filesystem has: a 'btree_trans' object
   that tracks all btree locks and can be unlocked and then relocked
   when we do an operation that might block (at the cost of a potential
   transaction restart at relock() time) - we never have to block with
   btree locks held. There's a sketch of the retry pattern below.
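
To make that concrete, here's a minimal standalone sketch of the
restart pattern - illustrative only, not the actual bcachefs API; the
type and function names below are made up for the example:

  #include <stdio.h>

  #define RESTART 1   /* illustrative "transaction restart" code */

  /* Stand-in for the object that tracks every btree lock we hold: */
  struct btree_trans {
          int locks_held;
          int attempts;
  };

  static void trans_unlock(struct btree_trans *t) { t->locks_held = 0; }
  static void trans_relock(struct btree_trans *t) { t->locks_held = 1; }

  /*
   * A btree operation that would have to block (IO, memory
   * allocation): instead of sleeping with locks held, it drops all
   * its locks, does the blocking work, relocks - and signals a
   * restart, since whatever it saw under the old locks may no longer
   * be valid.
   */
  static int btree_update_step(struct btree_trans *t)
  {
          if (t->attempts++ == 0) {
                  trans_unlock(t);  /* nothing locked while we block */
                  /* ... blocking work happens here ... */
                  trans_relock(t);
                  return RESTART;
          }
          return 0;   /* success: update done with locks held */
  }

  int main(void)
  {
          struct btree_trans t = { .locks_held = 1 };

          while (btree_update_step(&t) == RESTART)
                  ;   /* retry the whole transaction from the top */

          printf("committed after %d attempt(s)\n", t.attempts);
          return 0;
  }

The point is the ordering: unlock everything first, block, relock,
retry - rather than holding btree locks across the blocking operation.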

> > I've put a ton of effort into performance; my goal is a COW filesystem
> > that can compete with XFS on performance and scalability - which is a
> > tall order! But we're getting close.
> > 
> > With the btree write buffer rewrite (still not quite merged, any day
> > now) - I'm pushing _900k_ iops, 4k random writes - through the COW write
> > path.
> > 
> > This is in my benchmarking/profiling mode, with checksums off and data
> > reads/writes to the device turned off - i.e. just showing bcachefs
> > overhead. So not real world numbers, but indicative of how well we can
> > scale.
> 
> Interesting. The only performance-related thing I have noticed so far
> is that deleting an almost 8 GiB DVD ISO image file took a bit longer
> than instant, but I was using Dolphin on Plasma, so I am not sure
> whether this tiny delay was filesystem or GUI related.

It could be that we still have work to do; there are plenty of higher
level filesystem operations that I haven't specifically benchmarked. If
you do happen to do a head to head comparison with other filesystems and
find that unlink (or anything else) is slow - please report it!
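
For a quick head-to-head, fio's 4k random write workload is a
reasonable starting point - the parameters here are only an example
(and /mnt/test stands in for wherever the filesystem is mounted):

  fio --name=randwrite --directory=/mnt/test --rw=randwrite \
      --bs=4k --size=1G --ioengine=io_uring --iodepth=32 --direct=1 \
      --runtime=60 --time_based --group_reporting

Run the identical job on each filesystem and compare IOPS and
completion latencies.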

> Also, I found that the free space reported with "df -hT" was only
> 35.8 GiB initially, and is now 36 GiB of 40 GiB instead of the 37 GiB
> right after making the filesystem. But I bet that may just be related
> to allocation behavior: some kind of chunk allocated but not freed
> again, so it can be reused later. I need to dig into this a bit
> deeper. I read about some reservation as well, but need to dig that
> up again.

That's the copygc reserve.
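
By the way, bcachefs-tools has a more detailed accounting view than
df; something along the lines of

  bcachefs fs usage /mnt

(with /mnt standing in for your mount point) should break the space
down, internal reservations included - the exact output varies between
versions.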

> I'd really love to dig a bit into what makes BCacheFS unique, also in
> comparison with BTRFS and maybe a bit with ZFS - and to explain "Why
> yet another filesystem?" to the reader :). My own hope is that
> BCacheFS will indeed improve on some of the performance issues of
> BTRFS. Also, with BCacheFS you can have cache devices, which AFAIK is
> still not implemented for BTRFS. There were VFS Hot Data Tracking +
> BTRFS patches on the BTRFS mailing list quite a while ago, but AFAIK
> they never went in.

Performance with more than a few snapshots is a big selling point vs.
btrfs - Dave Chinner did some comparisons a while back, and bcachefs
beats the pants off of btrfs in snapshot scalability :)
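
(If you want to reproduce that kind of test: bcachefs-tools can
snapshot subvolumes, roughly

  bcachefs subvolume create /mnt/subvol
  bcachefs subvolume snapshot /mnt/subvol /mnt/snap1

- the paths are placeholders, and it's worth checking the bcachefs(8)
man page for the exact syntax on your version. Create a few hundred
snapshots in a loop, then run the same workload on btrfs for
comparison.)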
