From: Kent Overstreet <kent.overstreet@linux.dev>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: Questions related to BCacheFS
Date: Sat, 18 Nov 2023 16:07:27 -0500 [thread overview]
Message-ID: <20231118210727.6s7bi3e4lldnrpoj@moria.home.lan> (raw)
In-Reply-To: <2210413.NgBsaNRSFp@lichtvoll.de>
On Sat, Nov 18, 2023 at 09:57:50PM +0100, Martin Steigerwald wrote:
> Hi Kent.
>
> Thanks for answering so promptly. Feel free to skip answering during the
> rest of the weekend :)
>
> Kent Overstreet - 18.11.23, 20:50:24 CET:
> > > 10) Anything you think an article about BCacheFS should absolutely
> > > mention?
> >
> > Would personally love to see some non-phoronix benchmarks :)
>
> I see. Well, the thing is, I am not really satisfied with the performance
> of the Samsung 980 Pro 2 TB NVMe SSD in this ThinkPad T14 AMD Gen 1 under
> Linux, so I am not sure whether performance benchmarks would be meaningful
> on that setup. At least not without attempting a firmware upgrade again
> and hoping it helps this time, if one is available. However, I remember
> not liking having to dig the firmware upgrade out of an ISO image, since
> Samsung does not provide it via LVFS.
> Also, benchmarking may be more in scope for a later article, if at all,
> because I think the article will be long enough just explaining
> BCacheFS :). It is challenging to get benchmarking right and obtain
> actually meaningful results, and rather than getting it wrong, I'd prefer
> to skip or delay it. But anyway: any suggestion for a specific benchmark?
>
> Any advice about Phoronix benchmarks? I bet the one I saw was with some
> debug option on, that may better be off. I think it has been:
> CONFIG_BCACHEFS_DEBUG_TRANSACTIONS? I did not check whether Michael
> Larabel did a new one already with that turned off.
>
> As far as I understand, one specific performance-related aspect of
> BCacheFS would be low latencies due to the frontend / backend
> architecture, which in principle is based on what was already there in
> BCache. I intend to explore that concept a bit in my article.
The low latency stuff actually wasn't in bcache - that work came later.
Things like:
- SIX locks (shared/intent/exclusive) - intent locks don't block
  readers, and we only need to take the write lock for the actual btree
  node update
- asynchronous interior btree node updates; in bcache, when we split a
  node we have to wait for writes to complete before updating the
  parent node, while in bcachefs the work after IO completion is fully
  asynchronous
- the big one that no other filesystem has: a 'btree_trans' object that
  tracks all btree locks, and can be unlocked and then relocked when we
  do an operation that might block (at the cost of a potential
  transaction restart at relock() time) - we never have to block with
  btree locks held.
> > I've put a ton of effort into performance; my goal is a COW filesystem
> > that can compete with XFS on performance and scalability - which is a
> > tall order! But we're getting close.
> >
> > With the btree write buffer rewrite (still not quite merged, any day
> > now) - I'm pushing _900k_ iops, 4k random writes - through the COW write
> > path.
> >
> > This is in my benchmarking/profiling mode, with checksums off and data
> > reads/writes to the device turned off - i.e. just showing bcachefs
> > overhead. So not real world numbers, but indicative of how well we can
> > scale.
>
> Interesting. The only thing regarding performance I have noticed so far is
> that deleting an almost 8 GiB DVD ISO image file took a bit longer than
> instant, but I was using Dolphin on Plasma, so I am not sure whether this
> tiny delay was filesystem or GUI related.
It could be that we still have work to do; there are plenty of higher
level filesystem operations that I haven't specifically benchmarked. If
you do happen to do a head-to-head comparison with other filesystems and
find that unlink (or anything else) is slow - please report it!
> Also I found that free space with "df -hT" was only 35.8 GiB initially,
> and is now 36 GiB of 40 GiB, instead of the 37 GiB right after making the
> filesystem - but I bet that may just be related to allocation behavior:
> some kind of chunk allocated but not freed again so it can be reused
> later. I need to dig into this a bit deeper. I also read about some kind
> of reservation, but need to dig that up again.
That's the copygc reserve - space held back so copygc (copying garbage
collection) always has room to move data into.
> I'd really love to dig a bit into what makes BCacheFS unique, also in
> comparison with BTRFS and maybe also a bit with ZFS - and to answer "Why
> yet another filesystem?" for the reader :). My own hope is that BCacheFS
> will indeed improve on some of the performance issues of BTRFS. Also,
> with BCacheFS you can have cache devices, which AFAIK is still not
> implemented for BTRFS. There were VFS hot data tracking patches (with a
> BTRFS part) on the BTRFS mailing list some time ago, but AFAIK they never
> went in.
Performance with more than a few snapshots is a big selling point vs.
btrfs - Dave Chinner did some comparisons a while back, and bcachefs
beats the pants off of btrfs in snapshot scalability :)
Thread overview: 15+ messages
2023-11-18 19:15 Questions related to BCacheFS Martin Steigerwald
2023-11-18 19:50 ` Kent Overstreet
2023-11-18 20:57 ` Martin Steigerwald
2023-11-18 21:07 ` Kent Overstreet [this message]
2023-11-18 23:15 ` Martin Steigerwald
2023-11-18 23:42 ` Kent Overstreet
2023-11-19 11:13 ` Martin Steigerwald
2023-11-19 16:43 ` Martin Steigerwald
2023-11-19 23:10 ` Kent Overstreet
2023-11-20 17:34 ` Martin Steigerwald
2023-12-03 16:58 ` Martin Steigerwald
2023-12-18 16:50 ` Martin Steigerwald
2023-12-28 22:29 ` deletion time of big files (was: Re: Questions related to BCacheFS) Martin Steigerwald
2023-12-29 18:48 ` Kent Overstreet
2023-12-30 10:51 ` Martin Steigerwald