linux-bcachefs.vger.kernel.org archive mirror
* Questions related to BCacheFS
@ 2023-11-18 19:15 Martin Steigerwald
  2023-11-18 19:50 ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Steigerwald @ 2023-11-18 19:15 UTC
  To: linux-bcachefs

Hi!

Awesome that BCacheFS is finally merged! Many thanks to everyone who made 
this happen. I appreciate it!

I am writing an article about BCacheFS. I am willing to provide a link 
once it is published. It will be in German.

I do have a few questions:

1) Is discard supported? fstrim says it is not. However,
/sys/fs/bcachefs/UUID/options/discard shows "1". The BCacheFS user manual,
Principles of Operation, mentions it as a device option. I am not
completely sure how these work together. Is discard auto-detected, and is
only the ioctl for fstrim missing?

2) What are the plans for scrubbing? Right now it is not yet implemented, 
right?

3) Is the documentation of mount and other options in
https://bcachefs.org/bcachefs-principles-of-operation.pdf complete? If 
not, care to elaborate what is missing?

4) What are the plans or ideas for documentation? I specially ask as there 
does not seem to be a manpage like mount.bcachefs or mkfs.bcachefs yet. 
There is no mention of bcachefs in mount manpage either. And no bcachefs 
manpage in section 5 like with btrfs or xfs. There is a bcachefs manpage 
in section 8 which for example for a complete list of mount options refers 
to above Principles of Operation user manual. And it has information on 
"bcachefs format" and some other sub commands. I bet it is still too early 
or maybe you have different plans on how to go about documentation. 
Anything you can share already regarding this?

5) Is the feature implementation status on bcachefs.org up-to-date? How 
about the one in Principles of Operation user manual? Is any of these more 
up-to-date? If anything is missing from these, care to elaborate?

6) What is the status for xxhash checksums? They are mentioned as an 
option in the output of "bcachefs format". Yet no mention of it in 
bcachefs manpage nor in Principles of Operation user manual.

7) On mounting BCacheFS without compression enabled on 6.7-rc1 (shortly
before rc2, commit 791c8ab095f71327899023223940dd52257a4173), the LZ4
compression modules lz4hc_compress and lz4_compress are loaded as well. Why?

8) Regarding bcachefs-tools. More out of curiosity, because there is already
a bcachefs-tools package in the Debian repo, albeit only version 1.2. I see a
"debian" directory, however its version number is 1.0.8-2~bpo8+1 while
compiling via make gives version 1.33. So I suppose the packaging information
is not up to date? For now I am going with "make install" from the
bcachefs-tools git repo, as the package in the Debian repo is outdated.

9) What is the preferred way to report bugs? Mailing list? Kernel bug 
tracker? Both? Anything else?

10) Anything you think an article about BCacheFS should absolutely 
mention?

There may be more at a later time. :)

Best,
-- 
Martin




* Re: Questions related to BCacheFS
  2023-11-18 19:15 Questions related to BCacheFS Martin Steigerwald
@ 2023-11-18 19:50 ` Kent Overstreet
  2023-11-18 20:57   ` Martin Steigerwald
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2023-11-18 19:50 UTC
  To: Martin Steigerwald; +Cc: linux-bcachefs

On Sat, Nov 18, 2023 at 08:15:50PM +0100, Martin Steigerwald wrote:
> Hi!
> 
> Awesome that BCacheFS is finally merged! Many thanks to everyone who made 
> this happen. I appreciate it!
> 
> I am writing an article about BCacheFS. I am willing to provide a link 
> once it is published. It will be in German.
> 
> I do have a few questions:
> 
> 1) Is discard supported? fstrim says it is not. However,
> /sys/fs/bcachefs/UUID/options/discard shows "1". The BCacheFS user manual,
> Principles of Operation, mentions it as a device option. I am not
> completely sure how these work together. Is discard auto-detected, and is
> only the ioctl for fstrim missing?

Yes, it's supported. There's no need for fstrim support because we
discard buckets as soon as they become empty.
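
To check the option from a script rather than by hand, a minimal Python
sketch could look like the following. It only assumes the sysfs layout
quoted in the question (/sys/fs/bcachefs/<UUID>/options/discard); the
function name read_discard_options and its sysfs_root parameter are
made up for illustration and are not part of bcachefs-tools.

```python
from pathlib import Path


def read_discard_options(sysfs_root="/sys/fs/bcachefs"):
    """Return {filesystem UUID: contents of options/discard}.

    Assumes the layout /sys/fs/bcachefs/<UUID>/options/discard quoted in
    the question; entries without that file are skipped.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # no bcachefs filesystem mounted, or no sysfs
    for fs_dir in sorted(root.iterdir()):
        option = fs_dir / "options" / "discard"
        if option.is_file():
            result[fs_dir.name] = option.read_text().strip()
    return result


if __name__ == "__main__":
    for uuid, value in read_discard_options().items():
        print(f"{uuid}: discard={value}")
```

A value of "1" only says that the option is enabled; it does not show
when individual buckets are actually discarded.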

> 
> 2) What are the plans for scrubbing? Right now it is not yet implemented, 
> right?

Yes, it's very much planned.

> 3) Is the documentation of mount and other options in
> https://bcachefs.org/bcachefs-principles-of-operation.pdf complete? If 
> not, care to elaborate what is missing?

The master option list is here:
https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/opts.h#n122

A few of these are hidden; OPT_FORMAT and OPT_MOUNT options are the
options you'll be looking for - and OPT_DEVICE for device specific
options.
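
For sorting through such a list, a rough Python sketch is shown below. It
assumes the option table is written as x-macro entries of the form
x(name, type, flags, ...) with the flags joined by "|". That layout, the
macro name EXAMPLE_OPTS and all option names in SAMPLE are assumptions
made up for illustration; only the flag names OPT_FORMAT, OPT_MOUNT and
OPT_DEVICE come from the answer above.

```python
import re

# Hypothetical excerpt in the style of an x-macro option table. The macro
# name and option names are invented; real entries live in opts.h.
SAMPLE = r"""
#define EXAMPLE_OPTS()                              \
    x(example_format_opt,   u16,                    \
      OPT_FORMAT,                                   \
      OPT_UINT(512, 65536), NULL, NULL)             \
    x(example_mount_opt,    u8,                     \
      OPT_FORMAT|OPT_MOUNT,                         \
      OPT_BOOL(),           NULL, NULL)             \
    x(example_device_opt,   u8,                     \
      OPT_DEVICE,                                   \
      OPT_BOOL(),           NULL, NULL)
"""

# name, then a type we ignore, then the "|"-joined flag list
ENTRY = re.compile(r"\bx\(\s*(\w+)\s*,\s*\w+\s*,\s*([A-Z0-9_|\s]+?)\s*,")


def options_by_flag(text):
    """Map each OPT_* flag to the option names that carry it."""
    text = re.sub(r"\\\s*\n", "\n", text)  # drop line continuations
    table = {}
    for name, flags in ENTRY.findall(text):
        for flag in flags.split("|"):
            table.setdefault(flag.strip(), []).append(name)
    return table


if __name__ == "__main__":
    for flag, names in options_by_flag(SAMPLE).items():
        print(flag, names)
```

Run against the real file, the regular expression would probably need
adjusting, so treat the result as a starting point for reading the header
and not as an authoritative option list.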

> 4) What are the plans or ideas for documentation? I specially ask as there 
> does not seem to be a manpage like mount.bcachefs or mkfs.bcachefs yet. 
> There is no mention of bcachefs in mount manpage either. And no bcachefs 
> manpage in section 5 like with btrfs or xfs. There is a bcachefs manpage 
> in section 8 which for example for a complete list of mount options refers 
> to above Principles of Operation user manual. And it has information on 
> "bcachefs format" and some other sub commands. I bet it is still too early 
> or maybe you have different plans on how to go about documentation. 
> Anything you can share already regarding this?

I'm entirely too busy with just writing code - I'd love to have more
time for documentation, but it's hard :) But, there are people starting
to contribute to the man pages, so I expect that will improve.

> 5) Is the feature implementation status on bcachefs.org up-to-date? How 
> about the one in Principles of Operation user manual? Is any of these more 
> up-to-date? If anything is missing from these, care to elaborate?

Reasonably up to date, yes. The main areas that still need work and
testing are snapshots and erasure coding; with snapshots it's looking
like just minor bugs are left and fleshing out features, erasure coding
is improving but still needs quite a bit of work.

> 6) What is the status for xxhash checksums? They are mentioned as an 
> option in the output of "bcachefs format". Yet no mention of it in 
> bcachefs manpage nor in Principles of Operation user manual.

I initially had concerns about whether that code was actually solid - I
think it's been resolved; I'll just want to hear some positive feedback
from people using it before I add it to the documentation.

> 7) On mounting BCacheFS without compression enabled on 6.7-rc1 (shortly
> before rc2, commit 791c8ab095f71327899023223940dd52257a4173), the LZ4
> compression modules lz4hc_compress and lz4_compress are loaded as well. Why?

We just add hard dependencies on the compression modules because a) the
crypto interface (that lets you use them as runtime dependencies)
_sucks_ and b) the lz4 modules at least are pretty small. zstd is bigger
though, so making these runtime dependencies would be a worthwhile
enhancement for anyone who's interested.
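
To see which of the two LZ4 modules named in the question are currently
loaded, a small Python sketch that parses /proc/modules could be used. The
helper names are made up for illustration. Note that modules built into
the kernel do not appear in /proc/modules, so "False" here does not prove
the code is absent.

```python
import os


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text.

    Each line starts with the module name, followed by size, use count
    and further fields that are ignored here.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def lz4_modules_loaded(proc_modules_text):
    """Report whether lz4_compress and lz4hc_compress are listed."""
    present = loaded_modules(proc_modules_text)
    return {name: name in present for name in ("lz4_compress", "lz4hc_compress")}


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            print(lz4_modules_loaded(f.read()))
```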

> 8) Regarding bcachefs-tools. More out of curiosity, because there is already
> a bcachefs-tools package in the Debian repo, albeit only version 1.2. I see a
> "debian" directory, however its version number is 1.0.8-2~bpo8+1 while
> compiling via make gives version 1.33. So I suppose the packaging information
> is not up to date? For now I am going with "make install" from the
> bcachefs-tools git repo, as the package in the Debian repo is outdated.

Yeah I'm in contact with the debian maintainer, it should be updated
soon.

> 9) What is the preferred way to report bugs? Mailing list? Kernel bug 
> tracker? Both? Anything else?

Mailing list is good, or the github bugtrackers (that really should be
linked on the website):

https://github.com/koverstreet/bcachefs/issues/
https://github.com/koverstreet/bcachefs-tools/issues/

> 10) Anything you think an article about BCacheFS should absolutely 
> mention?

Would personally love to see some non-phoronix benchmarks :)

I've put a ton of effort into performance, my goal is a COW filesystem
that can compete with XFS on performance and scalability - which is a
tall order! but we're getting close.

With the btree write buffer rewrite (still not quite merged, any day
now) - I'm pushing _900k_ iops, 4k random writes - through the COW write
path.

This is in my benchmarking/profiling mode, with checksums off and data
reads/writes to the device turned off - i.e. just showing bcachefs
overhead. So not real world numbers, but indicative of how well we can
scale.
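
To put the 900k iops figure into perspective, the equivalent data rate can
be worked out with a few lines of Python. This is plain arithmetic under
the assumption that "4k" means 4096 bytes; since data writes to the device
are turned off in this mode, it is the rate the write path would be
feeding, not a measured device throughput.

```python
def iops_to_throughput(iops, block_size_bytes):
    """Return (GB/s, GiB/s) for a given I/O rate and block size."""
    bytes_per_second = iops * block_size_bytes
    return bytes_per_second / 10**9, bytes_per_second / 2**30


gb_s, gib_s = iops_to_throughput(900_000, 4 * 1024)
print(f"{gb_s:.2f} GB/s, {gib_s:.2f} GiB/s")  # 3.69 GB/s, 3.43 GiB/s
```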

Cheers,
Kent


* Re: Questions related to BCacheFS
  2023-11-18 19:50 ` Kent Overstreet
@ 2023-11-18 20:57   ` Martin Steigerwald
  2023-11-18 21:07     ` Kent Overstreet
  2023-12-28 22:29     ` deletion time of big files (was: Re: Questions related to BCacheFS) Martin Steigerwald
  0 siblings, 2 replies; 15+ messages in thread
From: Martin Steigerwald @ 2023-11-18 20:57 UTC
  To: Kent Overstreet; +Cc: linux-bcachefs

Hi Kent.

Thanks for answering so timely. Feel free to skip answering during rest of 
the weekend :)

Kent Overstreet - 18.11.23, 20:50:24 CET:
> > 10) Anything you think an article about BCacheFS should absolutely
> > mention?
> 
> Would personally love to see some non-phoronix benchmarks :)

I see. Well, the thing is, I am not really satisfied with the performance of
the Samsung 980 Pro 2 TB NVMe SSD in this ThinkPad T14 AMD Gen 1 under Linux,
so I am not sure whether performance benchmarks would be suitable on that
setup. At least not without attempting a firmware upgrade again, if one is
available, and hoping it helps this time. However, I remember not really
liking having to dig the firmware upgrade out of an ISO image, as Samsung
does not provide it via LVFS. Also, benchmarking may be more in scope for a
later article, if at all, because I think that even just explaining BCacheFS
will make the article long enough :). It is challenging to get benchmarking
right and obtain actually meaningful results. Rather than getting it wrong,
I would skip or delay that. But anyway: Any suggestion for a specific
benchmark?

Any advice about Phoronix benchmarks? I bet the one I saw was with some 
debug option on, that may better be off. I think it has been: 
CONFIG_BCACHEFS_DEBUG_TRANSACTIONS? I did not check whether Michael 
Larabel did a new one already with that turned off.

As far as I understand one specific performance related aspect of BCacheFS 
would be low latencies due to the frontend / backend architecture which in 
principle is based on what has been there in BCache already. I am 
intending to explore a bit into that concept in my article.

> I've put a ton of effort into performance, my goal is a COW filesystem
> that can compete with XFS on performance and scalability - which is a
> tall order! but we're getting close.
> 
> With the btree write buffer rewrite (still not quite merged, any day
> now) - I'm pushing _900k_ iops, 4k random writes - through the COW write
> path.
> 
> This is in my benchmarking/profiling mode, with checksums off and data
> reads/writes to the device turned off - i.e. just showing bcachefs
> overhead. So not real world numbers, but indicative of how well we can
> scale.

Interesting. The only thing regarding performance I have noticed so far is
that deleting an almost 8 GiB large DVD ISO image file took a bit longer than
instant, but I was using Dolphin on Plasma, so I am not sure whether this tiny
delay was filesystem or GUI related.

Also I found that free space with "df -hT" was only 35.8 GiB initially, 
now 36 GiB of 40 GiB instead of the initial 37 GiB after making the 
filesystem, but I bet that may just be related to allocation behavior. 
Some kind of chunk allocated but not freed again so it can be reused 
later. But I need to dig into this a bit deeper. I read about some 
reservation as well, but need to dig that up again.

I'd really love to dig a bit into what makes BCacheFS unique, also in 
comparison with BTRFS and maybe a bit also ZFS. Also to explain: "Why yet 
another filesystem?" to the reader :). My own hope is that indeed BCacheFS 
will improve on some of the performance issues with BTRFS. And also with 
BCacheFS you can have cache devices which AFAIK is still not implemented 
for BTRFS. There were VFS Hot Data Tracking patches with a BTRFS part on the
BTRFS mailing list quite some time ago, but AFAIK they never went in.

Best,
-- 
Martin




* Re: Questions related to BCacheFS
  2023-11-18 20:57   ` Martin Steigerwald
@ 2023-11-18 21:07     ` Kent Overstreet
  2023-11-18 23:15       ` Martin Steigerwald
  2023-12-28 22:29     ` deletion time of big files (was: Re: Questions related to BCacheFS) Martin Steigerwald
  1 sibling, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2023-11-18 21:07 UTC
  To: Martin Steigerwald; +Cc: linux-bcachefs

On Sat, Nov 18, 2023 at 09:57:50PM +0100, Martin Steigerwald wrote:
> Hi Kent.
> 
> Thanks for answering so timely. Feel free to skip answering during rest of 
> the weekend :)
> 
> Kent Overstreet - 18.11.23, 20:50:24 CET:
> > > 10) Anything you think an article about BCacheFS should absolutely
> > > mention?
> > 
> > Would personally love to see some non-phoronix benchmarks :)
> 
> I see. Well, the thing is, I am not really satisfied with the performance of
> the Samsung 980 Pro 2 TB NVMe SSD in this ThinkPad T14 AMD Gen 1 under Linux,
> so I am not sure whether performance benchmarks would be suitable on that
> setup. At least not without attempting a firmware upgrade again, if one is
> available, and hoping it helps this time. However, I remember not really
> liking having to dig the firmware upgrade out of an ISO image, as Samsung
> does not provide it via LVFS. Also, benchmarking may be more in scope for a
> later article, if at all, because I think that even just explaining BCacheFS
> will make the article long enough :). It is challenging to get benchmarking
> right and obtain actually meaningful results. Rather than getting it wrong,
> I would skip or delay that. But anyway: Any suggestion for a specific
> benchmark?
> 
> Any advice about Phoronix benchmarks? I bet the one I saw was with some 
> debug option on, that may better be off. I think it has been: 
> CONFIG_BCACHEFS_DEBUG_TRANSACTIONS? I did not check whether Michael 
> Larabel did a new one already with that turned off.
> 
> As far as I understand one specific performance related aspect of BCacheFS 
> would be low latencies due to the frontend / backend architecture which in 
> principle is based on what has been there in BCache already. I am 
> intending to explore a bit into that concept in my article.

The low latency stuff actually wasn't in bcache - that work came later.

Things like
 - six locks - so we have intent locks that don't block readers, and
   only need to take write locks for the actual btree node update

 - asynchronous interior btree node updates; in bcache when we split a
   node we have to wait for writes to complete before updating the
   parent node, in bcachefs work after IO completion is fully
   asynchronous

 - the big one that no other filesystem has: a 'btree_trans' object that
   tracks all btree locks, and can be unlocked and then relocked when we
   do an operation that might block (at the cost of a potential
   transaction restart at relock() time) - we never have to block with
   btree locks held.
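
As a rough illustration of the first and last points, here is a
single-threaded toy model in Python. It is not the kernel code: the class
and function names (ToySixLock, TransactionRestart, run_transaction) are
invented for this sketch, and it only models which lock states are
compatible and how a restart loop retries an operation after locks could
not be retaken.

```python
from dataclasses import dataclass


@dataclass
class ToySixLock:
    """Toy model of shared/intent/exclusive lock states (no threading)."""
    readers: int = 0
    intent_held: bool = False
    write_held: bool = False

    def try_read(self):
        if self.write_held:          # only a write lock blocks readers
            return False
        self.readers += 1
        return True

    def try_intent(self):
        if self.intent_held:         # intent locks exclude each other
            return False
        self.intent_held = True
        return True

    def try_write(self):
        # Taken only for the actual update, by the intent holder, and only
        # once current readers have left.
        if not self.intent_held or self.write_held or self.readers:
            return False
        self.write_held = True
        return True

    def unlock_read(self):
        self.readers -= 1

    def unlock_write(self):
        self.write_held = False

    def unlock_intent(self):
        self.intent_held = False


class TransactionRestart(Exception):
    """Raised when locks dropped before blocking could not be retaken."""


def run_transaction(body, max_restarts=8):
    """Call body(attempt) until it finishes without asking for a restart."""
    for attempt in range(max_restarts):
        try:
            return body(attempt)
        except TransactionRestart:
            continue                 # start over with fresh locks
    raise RuntimeError("too many transaction restarts")
```

In this model a reader can still take the lock while an intent lock is
held, and the writer only excludes readers for the short moment of the
update, which matches the description of intent locks above.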

> > I've put a ton of effort into performance, my goal is a COW filesystem
> > that can compete with XFS on performance and scalability - which is a
> > tall order! but we're getting close.
> > 
> > With the btree write buffer rewrite (still not quite merged, any day
> > now) - I'm pushing _900k_ iops, 4k random writes - through the COW write
> > path.
> > 
> > This is in my benchmarking/profiling mode, with checksums off and data
> > reads/writes to the device turned off - i.e. just showing bcachefs
> > overhead. So not real world numbers, but indicative of how well we can
> > scale.
> 
> Interesting. The only thing regarding performance I have noticed so far is
> that deleting an almost 8 GiB large DVD ISO image file took a bit longer than
> instant, but I was using Dolphin on Plasma, so I am not sure whether this tiny
> delay was filesystem or GUI related.

It could be that we still have work to do; there are plenty of higher
level filesystem operations that I haven't specifically benchmarked. If
you do happen to do a head to head comparison with other filesystems and
find that unlink (or anything else) is slow - please report it!

> Also I found that free space with "df -hT" was only 35.8 GiB initially, 
> now 36 GiB of 40 GiB instead of the initial 37 GiB after making the 
> filesystem, but I bet that may just be related to allocation behavior. 
> Some kind of chunk allocated but not freed again so it can be reused 
> later. But I need to dig into this a bit deeper. I read about some 
> reservation as well, but need to dig that up again.

That's the copygc reserve.

> I'd really love to dig a bit into what makes BCacheFS unique, also in 
> comparison with BTRFS and maybe a bit also ZFS. Also to explain: "Why yet 
> another filesystem?" to the reader :). My own hope is that indeed BCacheFS 
> will improve on some of the performance issues with BTRFS. And also with 
> BCacheFS you can have cache devices which AFAIK is still not implemented 
> for BTRFS. There were VFS Hot Data Tracking patches with a BTRFS part on the
> BTRFS mailing list quite some time ago, but AFAIK they never went in.

Performance with more than a few snapshots is a big selling point vs.
btrfs - Dave Chinner did some comparisons a while back, bcachefs beats
the pants off of btrfs in snapshot scalability :)


* Re: Questions related to BCacheFS
  2023-11-18 21:07     ` Kent Overstreet
@ 2023-11-18 23:15       ` Martin Steigerwald
  2023-11-18 23:42         ` Kent Overstreet
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Steigerwald @ 2023-11-18 23:15 UTC
  To: Kent Overstreet; +Cc: linux-bcachefs

Thanks again, Kent.

Kent Overstreet - 18.11.23, 22:07:27 CET:
> > As far as I understand one specific performance related aspect of
> > BCacheFS would be low latencies due to the frontend / backend
> > architecture which in principle is based on what has been there in
> > BCache already. I am intending to explore a bit into that concept in
> > my article.
> 
> The low latency stuff actually wasn't in bcache - that work came later.

So the frontend / backend architecture is not that much of what makes 
BCacheFS unique? Important to know as it seems I may have misunderstood 
something here.

I may need to shift the approach to my article a bit then. Good that I 
asked early on.

-- 
Martin




* Re: Questions related to BCacheFS
  2023-11-18 23:15       ` Martin Steigerwald
@ 2023-11-18 23:42         ` Kent Overstreet
  2023-11-19 11:13           ` Martin Steigerwald
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2023-11-18 23:42 UTC
  To: Martin Steigerwald; +Cc: linux-bcachefs

On Sun, Nov 19, 2023 at 12:15:19AM +0100, Martin Steigerwald wrote:
> Thanks again, Kent.
> 
> Kent Overstreet - 18.11.23, 22:07:27 CET:
> > > As far as I understand one specific performance related aspect of
> > > BCacheFS would be low latencies due to the frontend / backend
> > > architecture which in principle is based on what has been there in
> > > BCache already. I am intending to explore a bit into that concept in
> > > my article.
> > 
> > The low latency stuff actually wasn't in bcache - that work came later.
> 
> So the frontend / backend architecture is not that much of what makes 
> BCacheFS unique? Important to know as it seems I may have misunderstood 
> something here.

The "filesystem on top of a database" is the main thing that makes
bcachefs unique - you have that right.

bcache had much of the core btree design - log structured btree nodes
with eytzinger search trees; that's how we got a high enough performance
btree to make the "filesystem on top of a database" thing practical.
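
For readers unfamiliar with the term, an Eytzinger layout stores a sorted
set of keys in breadth-first order, so that the children of slot k sit at
slots 2k and 2k+1 and a search walks forward through memory instead of
jumping around as a binary search does. The Python sketch below shows only
that general technique with made-up function names; how bcachefs builds
and uses its search trees inside a btree node is more involved and is not
shown here.

```python
def build_eytzinger(sorted_keys):
    """Place sorted keys into a 1-based breadth-first (Eytzinger) array."""
    n = len(sorted_keys)
    tree = [None] * (n + 1)          # slot 0 stays unused
    keys = iter(sorted_keys)

    def fill(k):
        if k <= n:
            fill(2 * k)              # left subtree gets the smaller keys
            tree[k] = next(keys)
            fill(2 * k + 1)          # right subtree gets the larger keys

    fill(1)
    return tree


def eytzinger_lower_bound(tree, x):
    """Return the smallest key >= x, or None if there is none."""
    n = len(tree) - 1
    best = None
    k = 1
    while k <= n:
        if tree[k] >= x:
            best = tree[k]           # candidate; look for a smaller one
            k = 2 * k
        else:
            k = 2 * k + 1
    return best


tree = build_eytzinger([1, 3, 5, 7, 9, 11, 13])
print(tree[1:])                        # [7, 3, 11, 1, 5, 9, 13]
print(eytzinger_lower_bound(tree, 6))  # 7
```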

But the btree in bcache was, from a performance POV, prototype quality -
stable, but a lot of performance corner cases unfinished.

The latency work, real iterators, and the whole transaction layer came
later :)


* Re: Questions related to BCacheFS
  2023-11-18 23:42         ` Kent Overstreet
@ 2023-11-19 11:13           ` Martin Steigerwald
  2023-11-19 16:43             ` Martin Steigerwald
  2023-11-19 23:10             ` Kent Overstreet
  0 siblings, 2 replies; 15+ messages in thread
From: Martin Steigerwald @ 2023-11-19 11:13 UTC
  To: Kent Overstreet; +Cc: linux-bcachefs

Take your time and have a great Sunday :)

Kent Overstreet - 19.11.23, 00:42:05 CET:
> On Sun, Nov 19, 2023 at 12:15:19AM +0100, Martin Steigerwald wrote:
> > Kent Overstreet - 18.11.23, 22:07:27 CET:
> > > > As far as I understand one specific performance related aspect of
> > > > BCacheFS would be low latencies due to the frontend / backend
> > > > architecture which in principle is based on what has been there in
> > > > BCache already. I am intending to explore a bit into that concept
> > > > in
> > > > my article.
> > > 
> > > The low latency stuff actually wasn't in bcache - that work came
> > > later.
> > 
> > So the frontend / backend architecture is not that much of what makes
> > BCacheFS unique? Important to know as it seems I may have
> > misunderstood
> > something here.
> 
> The "filesystem on top of a database" is the main thing that makes
> bcachefs unique - you have that right.

Phew! Seems I did not get it completely wrong then :)

> bcache had much of the core btree design - log structured btree nodes
> with eytzinger search trees; that's how we got a high enough performance
> btree to make the "filesystem on top of a database" thing practical.
> 
> But the btree in bcache was, from a performance POV, prototype quality -
> stable, but a lot of performance corner cases unfinished.
> 
> The latency work, real iterators, and the whole transaction layer came
> later :)

So it is fair to say that being based on BCache enabled that low latency 
work?

I am trying to find a balance here. The audience of the article are 
experienced sys admins working in small and large organizations, but not 
(primarily) kernel hackers. So I like to explain possible reasons to 
consider BCacheFS while also having BTRFS and ZFS and XFS, but without 
going into so much detail that one needs to be a kernel developer to 
understand it. For XFS it is easy as while there was a proof of concept 
with subvolumes and snapshots based on XFS in files on XFS, AFAIR from 
Dave Chinner, some years back, it does not have those advanced features 
(yet). For ZFS one can always argue it is not in the mainline kernel. But 
regarding BTRFS it becomes important to really explain something. I 
started to do some BCacheFS / BTRFS feature comparison chart, but I also 
like to explain on the benefits BCacheFS can have in the text and a bit 
about the background of those benefits. Of course also mentioning that
BCacheFS has been in development for more than 10 years, even without taking
all the work for BCache itself into account.

So my idea currently is to explain that the BCache BTree and/or frontend/
backend architecture, not sure how to best word it, enabled a database 
approach to a filesystem to be feasible. And that in a sense it also 
enabled the latency work. And I can mention sixlocks and all the nice 
other stuff you mentioned. However… I may not explain exactly what those 
nice things are and how they work for example. For two reasons: 1. I need 
to understand those nice features myself, 2. limit of pages I may use and 
scope of the article. Currently I understand that sixlocks for certain use
cases are the best thing since the invention of the wheel or something
like that. But not much more :)

Would something like that be accurate enough in your opinion?

I will review the user manual once again, I read about the database 
approach, and aim to find a good balance. Cause for certain I won't be 
getting 24 pages for the article :)


I got two setbacks regarding trying BCacheFS on my laptop yesterday and 
today:

1) Linux 6.7-rc1, as mentioned almost rc2, did not hibernate on the ThinkPad
T14 AMD Gen 1. It just blanked the screen and nothing else happened. So I went
back to 6.6.1 temporarily. I do not really intend to do a git bisect between
rc1 and 6.6 on a production laptop. It would be very time-consuming and
possibly dangerous. I may go back to 6.7 even without hibernation, just
using standby overnight for a while. I bet BCacheFS is compatible with
hibernation (as in writing memory contents to encrypted swap and resuming
from it)?

2) After I had booted into 6.6.1 and removed 6.7-rc1 last night, I was
greeted by the GRUB command line this morning. As I had a meeting scheduled,
my primary aim was to get things running again. I certainly did not really
get the humorous aspect of it. I thought I had copied /etc in the broken
state to another place, but I cannot find it anymore; maybe I accidentally
copied it to /root or the GRML live distro and so it is gone. I also failed
to copy the broken /boot to another place. So I am not really able to do any
forensic analysis of what might have happened. I do not recall having seen
any error messages either, but I may have missed something. I reviewed the
hook and script for the initial RAM disk in the bcachefs-tools repo, but did
not find anything in there that could have caused GRUB 2 not finding its
config file and modules anymore. Also, it booted into 6.7 and even 6.6.1
after I had executed "make install" from the bcachefs-tools directory, and I
see nothing being done with grub itself within bcachefs-tools. So this really
is quite the mystery to me. I am on Devuan Ceres (based on Debian Sid), so
maybe something else got messed up. I will review their bug trackers. I
really have no idea what went wrong here; luckily I was able to recover with
GRML. I use LVM on LUKS for BTRFS and the test BCacheFS filesystem.

Maybe I need to continue testing BCacheFS on a virtual machine, but I'd 
really love to have a BCacheFS filesystem on my laptop and actually really 
using it for something. Well I am going to leave it at that and probably 
research on this after the weekend. Now it is time to have a Sunday off.

Best,
-- 
Martin




* Re: Questions related to BCacheFS
  2023-11-19 11:13           ` Martin Steigerwald
@ 2023-11-19 16:43             ` Martin Steigerwald
  2023-11-19 23:10             ` Kent Overstreet
  1 sibling, 0 replies; 15+ messages in thread
From: Martin Steigerwald @ 2023-11-19 16:43 UTC
  To: Kent Overstreet; +Cc: linux-bcachefs

Martin Steigerwald - 19.11.23, 12:13:29 CET:
> Maybe I need to continue testing BCacheFS on a virtual machine, but I'd
> really love to have a BCacheFS filesystem on my laptop and actually
> really using it for something. Well I am going to leave it at that and
> probably research on this after the weekend. Now it is time to have a
> Sunday off.

Well, I will stay away from that 6.7-almost-rc2 kernel on this laptop for now
after having experienced and recovered from:

parent transid verify failed + level verify failed

https://lore.kernel.org/linux-btrfs/9221302.CDJkKcVGEf@lichtvoll.de/T/#u

Not sure what caused it, but for now it appears to me that it is not safe 
for me to run this kernel on this laptop.

-- 
Martin




* Re: Questions related to BCacheFS
  2023-11-19 11:13           ` Martin Steigerwald
  2023-11-19 16:43             ` Martin Steigerwald
@ 2023-11-19 23:10             ` Kent Overstreet
  2023-11-20 17:34               ` Martin Steigerwald
  2023-12-03 16:58               ` Martin Steigerwald
  1 sibling, 2 replies; 15+ messages in thread
From: Kent Overstreet @ 2023-11-19 23:10 UTC
  To: Martin Steigerwald; +Cc: linux-bcachefs

On Sun, Nov 19, 2023 at 12:13:29PM +0100, Martin Steigerwald wrote:
> Take your time and have a great Sunday :)
> 
> Kent Overstreet - 19.11.23, 00:42:05 CET:
> > On Sun, Nov 19, 2023 at 12:15:19AM +0100, Martin Steigerwald wrote:
> > > Kent Overstreet - 18.11.23, 22:07:27 CET:
> > > > > As far as I understand one specific performance related aspect of
> > > > > BCacheFS would be low latencies due to the frontend / backend
> > > > > architecture which in principle is based on what has been there in
> > > > > BCache already. I am intending to explore a bit into that concept
> > > > > in
> > > > > my article.
> > > > 
> > > > The low latency stuff actually wasn't in bcache - that work came
> > > > later.
> > > 
> > > So the frontend / backend architecture is not that much of what makes
> > > BCacheFS unique? Important to know as it seems I may have
> > > misunderstood
> > > something here.
> > 
> > The "filesystem on top of a database" is the main thing that makes
> > bcachefs unique - you have that right.
> 
> Phew! Seems I did not get it completely wrong then :)
> 
> > bcache had much of the core btree design - log structured btree nodes
> > with eytzinger search trees; that's how we got a high enough performance
> > btree to make the "filesystem on top of a database" thing practical.
> > 
> > But the btree in bcache was, from a performance POV, prototype quality -
> > stable, but a lot of performance corner cases unfinished.
> > 
> > The latency work, real iterators, and the whole transaction layer came
> > later :)
> 
> So it is fair to say that being based on BCache enabled that low latency 
> work?
> 
> I am trying to find a balance here. The audience of the article are 
> experienced sys admins working in small and large organizations, but not 
> (primarily) kernel hackers. So I like to explain possible reasons to 
> consider BCacheFS while also having BTRFS and ZFS and XFS, but without 
> going into so much detail that one needs to be a kernel developer to 
> understand it. For XFS it is easy as while there was a proof of concept 
> with subvolumes and snapshots based on XFS in files on XFS, AFAIR from 
> Dave Chinner, some years back, it does not have those advanced features 
> (yet). For ZFS one can always argue it is not in the mainline kernel. But 
> regarding BTRFS it becomes important to really explain something. I 
> started to do some BCacheFS / BTRFS feature comparison chart, but I also 
> like to explain on the benefits BCacheFS can have in the text and a bit 
> about the background of those benefits. Of course also mentioning that
> BCacheFS has been in development for more than 10 years, even without taking
> all the work for BCache itself into account.
> 
> So my idea currently is to explain that the BCache BTree and/or frontend/
> backend architecture, not sure how to best word it, enabled a database 
> approach to a filesystem to be feasible. And that in a sense it also 
> enabled the latency work. And I can mention sixlocks and all the nice 
> other stuff you mentioned. However… I may not explain exactly what those 
> nice things are and how they work for example. For two reasons: 1. I need 
> to understand those nice features myself, 2. limit of pages I may use and 
> scope of the article. Currently I understand that sixlocks for certain use
> cases are the best thing since the invention of the wheel or something
> like that. But not much more :)
> 
> Would something like that be accurate enough in your opinion?

Yeah, that all sounds reasonable :)

It would be a great project to get all this stuff documented better...
when I have more free time... :)

> I will review the user manual once again, I read about the database 
> approach, and aim to find a good balance. Cause for certain I won't be 
> getting 24 pages for the article :)
> 
> 
> I got two setbacks regarding trying BCacheFS on my laptop yesterday and 
> today:
> 
> 1) Linux 6.7-rc1, as mentioned almost rc2, did not hibernate on ThinkPad 
> T14 AMD Gen 1. It just blanked screen and nothing happened. So I went back 
> to 6.6.1 temporarily. I do not really intend to do a git bisect between 
> rc1 and 6.6 on a production laptop. It would be very time-consuming and 
> possibly be dangerous. I may go back to 6.7 even without hibernation, just 
> using standby over night for a while. I bet BCacheFS is compatible with 
> hibernation (as in writing memory contents to encrypted swap and resuming 
> from it)?

I believe so - I have not tested hibernation specifically.

> 
> 2) But after I removed 6.7-rc1 yesterday night after having booted into 
> 6.6.1 this morning I was greeted by GRUB command line. As I had a meeting 
> scheduled my primary aim was to get things running again. I certainly did 
> not really get the humorous aspect about it. I thought I had copied etc 
> in the broken state to another place, but I cannot find it anymore; maybe 
> I copied it to /root or the GRML live distro accidentally and so it is gone. 
> I also forgot to copy the broken /boot to another place. So I am not 
> really able to do any forensic analysis of what might have happened. I do 
> not recall having seen any error messages either, but I may have missed 
> something. I reviewed hook and script for initial RAM disk in bcachefs-
> tools repo, but did not find anything in there that could have caused grub 
> 2 not finding its config file and modules anymore. Also it booted into 6.7 
> and even 6.6.1 then after having executed "make install" from bcachefs-
> tools directory. Also I see nothing being done with grub itself within 
> bcachefs-tools. So this really is quite the mystery for me. I am on Devuan 
> Ceres (based on Debian Sid) so maybe something else got messed up. Will 
> review their bug trackers. I really have no idea what went wrong here, 
> luckily I was able to recover with GRML. I use LVM on LUKS for BTRFS and 
> test BCacheFS filesystem.
> 
> Maybe I need to continue testing BCacheFS on a virtual machine, but I'd 
> really love to have a BCacheFS filesystem on my laptop and actually really 
> using it for something. Well I am going to leave it at that and probably 
> research on this after the weekend. Now it is time to have a Sunday off.

The Nixos install process with bcachefs as root is pretty smooth, just
make sure to set up a separate filesystem for /boot!


* Re: Questions related to BCacheFS
  2023-11-19 23:10             ` Kent Overstreet
@ 2023-11-20 17:34               ` Martin Steigerwald
  2023-12-03 16:58               ` Martin Steigerwald
  1 sibling, 0 replies; 15+ messages in thread
From: Martin Steigerwald @ 2023-11-20 17:34 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Kent Overstreet - 20.11.23, 00:10:01 CET:
> > Maybe I need to continue testing BCacheFS on a virtual machine, but I'd
> > really love to have a BCacheFS filesystem on my laptop and actually
> > really using it for something. Well I am going to leave it at that
> > and probably research on this after the weekend. Now it is time to
> > have a Sunday off.
> 
> The Nixos install process with bcachefs as root is pretty smooth, just
> make sure to set up a separate filesystem for /boot!

Thanks for that idea.

While those BTRFS filesystems may well have been corrupted by a "thou 
shalt not resume from this hibernation image" situation, I will indeed 
wait before I try the 6.7 kernel again on this laptop. Also I will likely 
wait for an updated bcachefs-tools package. I still don't know why GRUB 
greeted me with a command prompt after having removed the 6.7 kernel 
again. My current idea on why the BTRFS corruption happened is that 
kernel 6.6.1 resumed from hibernation after I fixed up the boot loader in 
two GRML sessions, accessing / and /boot, but not doing anything to 
invalidate the hibernation image on the swap volume. Reminder to self: 
Absolutely do that next time!
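
As a rough illustration of that reminder, here is a minimal Python sketch 
that checks whether a swap area still carries a hibernation image before a 
rescue system touches the filesystems. It assumes a 4096-byte page size and 
only looks for the "S1SUSPEND" signature that the in-kernel hibernation 
code puts into the last 10 bytes of the first page; other suspend tools may 
use different signatures. The function names are made up for this example. 
If an image is found, re-running mkswap on that volume is one common way to 
invalidate it.

```python
import sys

# Signature written by the in-kernel hibernation code. Assumption: other
# suspend implementations may use different markers that are not checked here.
HIBERNATION_SIGNATURES = (b"S1SUSPEND",)
SIGNATURE_LENGTH = 10


def read_swap_signature(path, page_size=4096):
    """Return the 10 signature bytes at the end of the first page."""
    with open(path, "rb") as device:
        device.seek(page_size - SIGNATURE_LENGTH)
        return device.read(SIGNATURE_LENGTH)


def has_hibernation_image(path, page_size=4096):
    """True if the swap header carries a known hibernation signature."""
    signature = read_swap_signature(path, page_size)
    return any(signature.startswith(s) for s in HIBERNATION_SIGNATURES)


if __name__ == "__main__":
    # Usage: pass the swap device or swap file as the first argument.
    # Reading a block device usually requires root.
    if len(sys.argv) > 1:
        swap_path = sys.argv[1]
        if has_hibernation_image(swap_path):
            print(f"{swap_path}: hibernation image present, do not resume from it")
        else:
            print(f"{swap_path}: no hibernation signature found")
```

With LVM on LUKS the path to pass would be the opened swap logical volume, 
not the encrypted container.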

I had to restore the / and /home BTRFS filesystems as well as /boot, which 
broke GRUB once more:

xfsprogs-6.5.0 with grub 2.12~rc1-12: unknown filesystem

https://lore.kernel.org/linux-xfs/1889442.tdWV9SEqCh@lichtvoll.de/T/#t

So yes I am very well aware of the advantages of a separate /boot. But 
even with that GRUB was funny on me. Maybe that is why I have seen Ext4 in 
Ext3 mode for /boot. It does not gain new features.

I may investigate an alternative boot loader… it is not the first time GRUB 
broke on new filesystem features and it is not even surprising. It would be 
best to have GRUB use a very simple file system that does not gain new 
features. A bootfs filesystem just for the purposes of a boot loader to 
store a few files and be done with it.

Anyway, it will be a VM for now.

Ciao,
-- 
Martin




* Re: Questions related to BCacheFS
  2023-11-19 23:10             ` Kent Overstreet
  2023-11-20 17:34               ` Martin Steigerwald
@ 2023-12-03 16:58               ` Martin Steigerwald
  2023-12-18 16:50                 ` Martin Steigerwald
  1 sibling, 1 reply; 15+ messages in thread
From: Martin Steigerwald @ 2023-12-03 16:58 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Kent Overstreet - 20.11.23, 00:10:01 CET:
> > I got two setbacks regarding trying BCacheFS on my laptop yesterday
> > and
> > today:
> > 
> > 1) Linux 6.7-rc1, as mentioned almost rc2, did not hibernate on
> > ThinkPad T14 AMD Gen 1. It just blanked screen and nothing happened.
> > So I went back to 6.6.1 temporarily. I do not really intend to do a
> > git bisect between rc1 and 6.6 on a production laptop. It would be
> > very time-consuming and possibly be dangerous. I may go back to 6.7
> > even without hibernation, just using standby over night for a while.
> > I bet BCacheFS is compatible with hibernation (as in writing memory
> > contents to encrypted swap and resuming from it)?
> 
> I believe so - I have not tested hibernation specifically.

Today I tried with 6.7-rc4 and the issue of non-working hibernation does 
not seem to be related to BCacheFS, because the module was not loaded and 
it again did not hibernate.

-- 
Martin




* Re: Questions related to BCacheFS
  2023-12-03 16:58               ` Martin Steigerwald
@ 2023-12-18 16:50                 ` Martin Steigerwald
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Steigerwald @ 2023-12-18 16:50 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Martin Steigerwald - 03.12.23, 17:58:36 CET:
> > I believe so - I have not tested hibernation specifically.
> 
> Today I tried with 6.7-rc4 and the issue of non working hibernation does
> not seem to be related to BCacheFS, cause the module was not loaded and
> it again did not hibernate.

Finally.

6.7-rc6 does hibernate again. How awesome is that?

However, the bcachefs-tools package is broken regarding mounting via the 
mount command unless one moves mount.bcachefs out of the way :)

https://bugs.debian.org/1057295

Then it mounts okay. And also with mounted BCacheFS hibernation still 
works.

So that obstacle is gone.

Best,
-- 
Martin




* deletion time of big files (was: Re: Questions related to BCacheFS)
  2023-11-18 20:57   ` Martin Steigerwald
  2023-11-18 21:07     ` Kent Overstreet
@ 2023-12-28 22:29     ` Martin Steigerwald
  2023-12-29 18:48       ` Kent Overstreet
  1 sibling, 1 reply; 15+ messages in thread
From: Martin Steigerwald @ 2023-12-28 22:29 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Hi Kent!

Martin Steigerwald - 18.11.23, 21:57:50 CET:
> Interesting. Only thing regarding performance I noticed so far that
> deleting an almost 8 GiB large DVD ISO image file took a bit longer than
> instant, but I was using Dolphin on Plasma, so not sure whether this
> tiny delay was filesystem or GUI related.

Meanwhile I have a working BCacheFS test setup on my laptop. Currently 
with 6.7-rc7.

I can confirm on the longer than instant deletion times for almost 8 GiB 
large DVD ISO image files. It took 3,4 seconds for two of them.

It appears to me that there is some optimization potential hidden in that.
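
To rule out the GUI as the source of the delay, the unlink call can be 
timed directly. The following is only a sketch: the helper names and the 
small default file size are chosen for this example, and the temporary 
directory would have to be placed on the BCacheFS mount (via the dir 
argument of TemporaryDirectory) to measure the filesystem in question. A 
file of zeros may also behave differently from a real ISO image if 
compression is enabled.

```python
import os
import tempfile
import time


def create_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of zeros to path and sync them to disk."""
    chunk = b"\0" * chunk_size
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def time_unlink(path):
    """Return the seconds spent in the unlink call for path."""
    start = time.perf_counter()
    os.unlink(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical test size; the files discussed here were almost 8 GiB.
    size = 16 * 1024 * 1024
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "testfile.bin")
        create_file(target, size)
        print(f"unlink took {time_unlink(target):.3f} s")
```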

Best,
-- 
Martin




* Re: deletion time of big files (was: Re: Questions related to BCacheFS)
  2023-12-28 22:29     ` deletion time of big files (was: Re: Questions related to BCacheFS) Martin Steigerwald
@ 2023-12-29 18:48       ` Kent Overstreet
  2023-12-30 10:51         ` Martin Steigerwald
  0 siblings, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2023-12-29 18:48 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-bcachefs

On Thu, Dec 28, 2023 at 11:29:19PM +0100, Martin Steigerwald wrote:
> Hi Kent!
> 
> Martin Steigerwald - 18.11.23, 21:57:50 CET:
> > Interesting. Only thing regarding performance I noticed so far that
> > deleting an almost 8 GiB large DVD ISO image file took a bit longer than
> > instant, but I was using Dolphin on Plasma, so not sure whether this
> > tiny delay was filesystem or GUI related.
> 
> Meanwhile I have a working BCacheFS test setup on my laptop. Currently 
> with 6.7-rc7.
> 
> I can confirm on the longer than instant deletion times for almost 8 GiB 
> large DVD ISO image files. It took 3,4 seconds for two of them.
> 
> It appears to me that there is some optimization potential hidden in that.

Bug me again in a few weeks/months if I don't get to it; Dave's fs
metadata benchmarks were also pointing out things that need to be
optimized, this is probably related - but I'm going to be deep in
the disk space accounting rewrite for a good while.


* Re: deletion time of big files (was: Re: Questions related to BCacheFS)
  2023-12-29 18:48       ` Kent Overstreet
@ 2023-12-30 10:51         ` Martin Steigerwald
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Steigerwald @ 2023-12-30 10:51 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Kent Overstreet - 29.12.23, 19:48:55 CET:
> > I can confirm on the longer than instant deletion times for almost 8
> > GiB large DVD ISO image files. It took 3,4 seconds for two of them.
> > 
> > It appears to me that there is some optimization potential hidden in
> > that.
>
> Bug me again in a few weeks/months if I don't get to it; Dave's fs
> metadata benchmarks were also pointing out things that need to be
> optimized as well, this is probably related - but I'm going to be deep in
> the disk space accounting rewrite for a good while

Alright. All in due time. It is not that I have gazillions of such large 
files lying around or even the amount of storage capacity available. I 
intend to format an external 4TB NVMe SSD with encrypted BCacheFS. And for 
testing copy over 1 TB of data from an external 2TB NVMe SSD to it and see 
how that goes.

I am happy that BCacheFS went into the mainline kernel and I would rather 
see things done right than fast.

Have a Happy New Year.

Best,
-- 
Martin



Thread overview: 15+ messages
2023-11-18 19:15 Questions related to BCacheFS Martin Steigerwald
2023-11-18 19:50 ` Kent Overstreet
2023-11-18 20:57   ` Martin Steigerwald
2023-11-18 21:07     ` Kent Overstreet
2023-11-18 23:15       ` Martin Steigerwald
2023-11-18 23:42         ` Kent Overstreet
2023-11-19 11:13           ` Martin Steigerwald
2023-11-19 16:43             ` Martin Steigerwald
2023-11-19 23:10             ` Kent Overstreet
2023-11-20 17:34               ` Martin Steigerwald
2023-12-03 16:58               ` Martin Steigerwald
2023-12-18 16:50                 ` Martin Steigerwald
2023-12-28 22:29     ` deletion time of big files (was: Re: Questions related to BCacheFS) Martin Steigerwald
2023-12-29 18:48       ` Kent Overstreet
2023-12-30 10:51         ` Martin Steigerwald