linux-bcachefs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* content-addressed storage?
@ 2022-04-25  5:37 Wout Mertens
  2022-05-07 18:09 ` Kent Overstreet
  0 siblings, 1 reply; 2+ messages in thread
From: Wout Mertens @ 2022-04-25  5:37 UTC (permalink / raw)
  To: linux-bcachefs

Hi,

I'm idly wondering if bcachefs could support lookups and reuse of
variable-size chunks based on their hash checksum, thus allowing
deduplication.

So, instead of using a user-level storage layer like
https://github.com/systemd/casync, it would be a layer deeper, inside
bcachefs.

Would that be more optimal in memory/cpu/disk, or would it not make
much difference besides comfort?

As a quick overview, casync splits files into chunks. The chunks are
variable-sized based on their content using a rolling hash, which
makes insertions and deletions change only the affected blocks and not
the blocks following them. Then, the chunks are compressed and hashed,
and stored under their hash.
So files stored this way are lists of variable-size hashed chunks. If
two files are similar, they will be sharing a lot of the chunks
automatically.

A straightforward implementation would result in a single directory
with millions/billions of small files. Is that something bcachefs can
handle?

Inside bcachefs I'd instead imagine a btree with all the chunks as
extents, keyed by their hash, and files somehow being able to
arbitrarily point at those extents.

Thoughts? Just musing.

Wout.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: content-addressed storage?
  2022-04-25  5:37 content-addressed storage? Wout Mertens
@ 2022-05-07 18:09 ` Kent Overstreet
  0 siblings, 0 replies; 2+ messages in thread
From: Kent Overstreet @ 2022-05-07 18:09 UTC (permalink / raw)
  To: Wout Mertens; +Cc: linux-bcachefs

On Mon, Apr 25, 2022 at 07:37:51AM +0200, Wout Mertens wrote:
> Hi,
> 
> I'm idly wondering if bcachefs could support lookups and reuse of
> variable-size chunks based on their hash checksum, thus allowing
> deduplication.
> 
> So, instead of using a user-level storage layer like
> https://github.com/systemd/casync, it would be a layer deeper, inside
> bcachefs.
> 
> Would that be more optimal in memory/cpu/disk, or would it not make
> much difference besides comfort?
> 
> As a quick overview, casync splits files into chunks. The chunks are
> variable-sized based on their content using a rolling hash, which
> makes insertions and deletions change only the affected blocks and not
> the blocks following them. Then, the chunks are compressed and hashed,
> and stored under their hash.
> So files stored this way are lists of variable-size hashed chunks. If
> two files are similar, they will be sharing a lot of the chunks
> automatically.
> 
> A straightforward implementation would result in a single directory
> with millions/billions of small files. Is that something bcachefs can
> handle?
> 
> Inside bcachefs I'd instead imagine a btree with all the chunks as
> extents, keyed by their hash, and files somehow being able to
> arbitrarily point at those extents.

We'd need to try implementing it and see how well it works, but I've been
thinking along the same lines as what you're describing - it could work.

The thing people have run into with dedup in other filesystems is that doing it
inline has really high memory overhead - because accesses to your index-by-hash
btree are in completely random order, so you need to keep the whole thing in
memory.

But doing it at extent granularity instead of block granularity will help. The
next question is, what if accesses are misaligned? I've heard of some cool
rolling window algorithms that might be worth looking into. But maybe you or the
casync people have some thoughts on this? What affects extent/chunk alignment in
their system? In bcachefs extent alignment will depend on what the allocator is
doing, it's not something upper layers can generally rely on.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-05-07 18:09 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-25  5:37 content-addressed storage? Wout Mertens
2022-05-07 18:09 ` Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).