* Btrfs send bloat
@ 2019-05-19  8:11 Newbugreport
  2019-05-19 20:06 ` Andrei Borzenkov
  0 siblings, 1 reply; 10+ messages in thread
From: Newbugreport @ 2019-05-19  8:11 UTC (permalink / raw)
  To: linux-btrfs

I have 3-4 years' worth of snapshots I use for backup purposes. I keep R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I use both send | receive and send > file. This works well, but I get massive deltas when files are moved around in a GUI via Samba. Reorganize a bunch of files and the next snapshot is 50 or 100 GB. Perhaps mv or cp --reflink=always would fix the problem, but it's just not usable enough for my family.

I'd like a solution to the massive delta problem. Perhaps someone already has a solution, that would be great. If not, I need advice on a few ideas.

It seems a realistic solution to deduplicate the subvolume before each snapshot is taken, and in theory I could write a small program to do that. However, I don't know if that would work. Will Btrfs let me deduplicate between a file on the live subvolume and a file on the R-O snapshot (really the same file but at a different path)? If so, will Btrfs send with -p result in a small delta?

Failing that I could probably make changes to the send data stream, but that's suboptimal for the live volume and any backup volumes where data has been received.

Also, is it possible to access the Btrfs hash values for files so I don't have to recalculate file hashes for the whole volume myself?

Thanks in advance for any advice.


* Re: Btrfs send bloat
  2019-05-19  8:11 Btrfs send bloat Newbugreport
@ 2019-05-19 20:06 ` Andrei Borzenkov
  2019-05-20  9:20   ` David Disseldorp
  2019-05-20 10:34   ` Patrik Lundquist
  0 siblings, 2 replies; 10+ messages in thread
From: Andrei Borzenkov @ 2019-05-19 20:06 UTC (permalink / raw)
  To: Newbugreport, linux-btrfs

19.05.2019 11:11, Newbugreport wrote:
> I have 3-4 years' worth of snapshots I use for backup purposes. I keep
> R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I
> use both send | receive and send > file. This works well, but I get
> massive deltas when files are moved around in a GUI via Samba.

Did you analyze whether it is a client or a server problem? If the client
does a file copy (instead of a move, as you imply), the simplest solution
may be to use a different tool on the client. If the problem is on the
server side, it is something to discuss with the Samba folks.

> Reorganize a bunch of files and the next snapshot is 50 or 100 GB.
> Perhaps mv or cp --reflink=always would fix the problem, but it's
> just not usable enough for my family.
> 
> I'd like a solution to the massive delta problem. Perhaps someone
> already has a solution, that would be great. If not, I need advice on
> a few ideas.
> 
> It seems a realistic solution to deduplicate the subvolume before
> each snapshot is taken, and in theory I could write a small program
> to do that.

You mean that none of the half a dozen existing tools that perform
deduplication on btrfs fits your requirements?

> However, I don't know if that would work. Will Btrfs
> let me deduplicate between a file on the live subvolume and a file on
> the R-O snapshot (really the same file but at a different path)? If so,

btrfs does not care, because it does not perform any deduplication
itself. All tools find identical file ranges and then invoke a kernel
ioctl to replace the reference to a range in the destination file with a
reference to the identical range in the source file. So nothing prevents
using read-only data as the source for deduplication of read-write data.
Whether each of the existing tools supports it (or makes it easy to do),
I do not know.

> will Btrfs send with -p result in a small delta?
> 

Well, if all data is replaced by references to existing extents in some
snapshot, then the delta against that snapshot will be small.
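
One way to check this in practice is to measure the incremental stream
before and after deduplicating. The sketch below pipes `btrfs send -p`
into a byte counter instead of a file. The helper names are invented for
this example, and the snapshot paths are whatever your layout uses.

```python
import subprocess


def build_send_cmd(parent, snapshot):
    """Command line for an incremental send of `snapshot` relative to `parent`."""
    return ["btrfs", "send", "-p", parent, snapshot]


def count_bytes(stream, chunk_size=1 << 20):
    """Read a binary stream to the end and return how many bytes it held."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def incremental_send_size(parent, snapshot):
    """Run btrfs send (needs root) and return the size of the delta stream."""
    proc = subprocess.Popen(build_send_cmd(parent, snapshot),
                            stdout=subprocess.PIPE)
    size = count_bytes(proc.stdout)
    if proc.wait() != 0:
        raise RuntimeError("btrfs send failed")
    return size
```

Calling incremental_send_size() on the same pair of read-only snapshots
is only meaningful for snapshots taken after the deduplication pass,
since existing read-only snapshots are not changed by it.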

> Failing that I could probably make changes to the send data stream,
> but that's suboptimal for the live volume and any backup volumes
> where data has been received.
> 
> Also, is it possible to access the Btrfs hash values for files so I
> don't have to recalculate file hashes for the whole volume myself?
> 

Currently btrfs does not compute hashes suitable for deduplication. It
only stores CRC32C checksums. You can access the checksum tree, and at
least one tool uses it to speed up scanning, but it then computes a
second hash to avoid false positives.
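
The two-stage idea can be sketched as follows: a cheap checksum groups
candidate blocks, and a strong hash confirms them. This is only an
illustration of the technique, not how any particular tool works. It
computes zlib.crc32 itself as a stand-in (btrfs stores CRC32C, a
different polynomial, and reading the checksum tree is not shown), and
the 4 KiB block size and function names are assumptions.

```python
import hashlib
import zlib
from collections import defaultdict

BLOCK = 4096  # assumed block size for the example


def blocks(data, size=BLOCK):
    """Yield (offset, block) pairs covering `data`."""
    for off in range(0, len(data), size):
        yield off, data[off:off + size]


def find_duplicate_blocks(files):
    """files maps a name to its bytes; returns groups of (name, offset)
    whose blocks are identical according to SHA-256."""
    # Stage 1: group by the weak checksum. Collisions are possible here.
    by_weak = defaultdict(list)
    for name, data in files.items():
        for off, blk in blocks(data):
            by_weak[zlib.crc32(blk)].append((name, off, blk))

    # Stage 2: confirm candidates with a strong hash to drop false positives.
    groups = []
    for candidates in by_weak.values():
        if len(candidates) < 2:
            continue
        by_strong = defaultdict(list)
        for name, off, blk in candidates:
            by_strong[hashlib.sha256(blk).digest()].append((name, off))
        groups.extend(g for g in by_strong.values() if len(g) > 1)
    return groups
```

Only blocks that already collide on the weak checksum are hashed a second
time, which is where the scanning speed-up comes from.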

Recently a patch series was posted to add support for different hashes
(I believe SHA256 at least); these would be more useful for
deduplication once merged.



* Re: Btrfs send bloat
  2019-05-19 20:06 ` Andrei Borzenkov
@ 2019-05-20  9:20   ` David Disseldorp
  2019-05-20 10:34   ` Patrik Lundquist
  1 sibling, 0 replies; 10+ messages in thread
From: David Disseldorp @ 2019-05-20  9:20 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Newbugreport, linux-btrfs

On Sun, 19 May 2019 23:06:25 +0300, Andrei Borzenkov wrote:

> 19.05.2019 11:11, Newbugreport wrote:
> > I have 3-4 years' worth of snapshots I use for backup purposes. I keep
> > R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I
> > use both send | receive and send > file. This works well, but I get
> > massive deltas when files are moved around in a GUI via Samba.
>
> Did you analyze whether it is a client or a server problem? If the client
> does a file copy (instead of a move, as you imply), the simplest solution
> may be to use a different tool on the client. If the problem is on the
> server side, it is something to discuss with the Samba folks.

Samba supports copy offload via FSCTL_SRV_COPYCHUNK and
FSCTL_DUPLICATE_EXTENTS_TO_FILE, which can be translated to
BTRFS_IOC_CLONE_RANGE via the btrfs Samba VFS module.

Windows Explorer and Linux (cifs.ko) are capable of using these
fsctls during copy.

See https://wiki.samba.org/index.php/Server-Side_Copy for details.
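
The btrfs VFS module is enabled per share with `vfs objects = btrfs` in
smb.conf. As a rough sketch, the following Python helper lists shares
that do not have it. It treats smb.conf as a plain INI file, which is a
simplification (no includes, no backslash line continuations), and the
function name is made up for this example.

```python
import configparser


def shares_without_btrfs(smb_conf_text):
    """Return the share names whose effective `vfs objects` lacks `btrfs`."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Strip indentation so configparser does not mistake indented
    # parameters for continuation lines.
    parser.read_string(
        "\n".join(line.strip() for line in smb_conf_text.splitlines()))

    # A value in [global] acts as the default for every share.
    global_vfs = parser.get("global", "vfs objects", fallback="")

    missing = []
    for section in parser.sections():
        if section.lower() == "global":
            continue
        vfs = parser.get(section, "vfs objects", fallback=global_vfs)
        if "btrfs" not in vfs.split():
            missing.append(section)
    return missing
```

For a real server, `testparm -s` prints the configuration as Samba itself
parsed it and is the more reliable check.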

Cheers, David


* Re: Btrfs send bloat
  2019-05-19 20:06 ` Andrei Borzenkov
  2019-05-20  9:20   ` David Disseldorp
@ 2019-05-20 10:34   ` Patrik Lundquist
  2019-05-20 11:15     ` Newbugreport
  1 sibling, 1 reply; 10+ messages in thread
From: Patrik Lundquist @ 2019-05-20 10:34 UTC (permalink / raw)
  To: Newbugreport; +Cc: linux-btrfs, Andrei Borzenkov

On Mon, 20 May 2019 at 02:36, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> 19.05.2019 11:11, Newbugreport wrote:
> > I have 3-4 years' worth of snapshots I use for backup purposes. I keep
> > R-O live snapshots, two local backups, and AWS Glacier Deep Freeze. I
> > use both send | receive and send > file. This works well, but I get
> > massive deltas when files are moved around in a GUI via Samba.
>
> Did you analyze whether it is a client or a server problem? If the client
> does a file copy (instead of a move, as you imply), the simplest solution
> may be to use a different tool on the client. If the problem is on the
> server side, it is something to discuss with the Samba folks.

Also try the Btrfs module in Samba.
https://wiki.samba.org/index.php/Server-Side_Copy#Btrfs_Enhanced_Server-Side_Copy_Offload


* Re: Btrfs send bloat
  2019-05-20 10:34   ` Patrik Lundquist
@ 2019-05-20 11:15     ` Newbugreport
  2019-05-20 11:58       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 10+ messages in thread
From: Newbugreport @ 2019-05-20 11:15 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs, Andrei Borzenkov

Patrik, thank you. I've enabled the Samba module, which may help in the future. Does the GUI file manager (e.g. Nautilus) need special support?

Andrea, thank you for the link. bup is impressive but does it work well with btrfs snapshots? My live drive contains the main volume alongside many snapshots and the associated bloat from moved/deleted files. There's not room for another copy of everything, even if it's deduplicated. Perhaps I could switch one of the backup drives and the cloud to bup, but how well would bup work diffing all those snapshots when the backup drive is plugged in?



* Re: Btrfs send bloat
  2019-05-20 11:15     ` Newbugreport
@ 2019-05-20 11:58       ` Austin S. Hemmelgarn
  2019-05-20 12:14         ` Patrik Lundquist
  0 siblings, 1 reply; 10+ messages in thread
From: Austin S. Hemmelgarn @ 2019-05-20 11:58 UTC (permalink / raw)
  To: Newbugreport, Patrik Lundquist; +Cc: linux-btrfs, Andrei Borzenkov

On 2019-05-20 07:15, Newbugreport wrote:
> Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
It shouldn't (Windows' default file manager doesn't, and most stuff on 
Linux uses Samba so it shouldn't either, not sure about macOS though).

Keep in mind, however, that server-side copies only work in SMB within a
single share.  Moving files between two independent shares, even if
they're on the same server (or even the same filesystem on the same
server), will always translate to a copy+delete, because the client
system has no other way to tell the server to move the file across shares.
> 
> Andrea, thank you for the link. bup is impressive but does it work well with btrfs snapshots? My live drive contains the main volume alongside many snapshots and the associated bloat from moved/deleted files. There's not room for another copy of everything, even if it's deduplicated. Perhaps I could switch one of the backup drives and the cloud to bup, but how well would bup work diffing all those snapshots when the backup drive is plugged in?
Deduplication will almost never increase the total amount of data, and 
it absolutely won't need a second copy of everything.  The initial pass 
will probably be very slow though, as the ioctl that gets used does a 
bytewise comparison of the ranges that get passed in to make sure they 
are actually identical before it merges them.  Once the data is mostly 
deduplicated, this shouldn't be an issue for most tools as they will see 
the existing deduplicated ranges and not try to re-merge them.

* Re: Btrfs send bloat
  2019-05-20 11:58       ` Austin S. Hemmelgarn
@ 2019-05-20 12:14         ` Patrik Lundquist
  2019-05-20 12:40           ` Btrfs remote reflink with Samba David Disseldorp
  0 siblings, 1 reply; 10+ messages in thread
From: Patrik Lundquist @ 2019-05-20 12:14 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Newbugreport, linux-btrfs, Andrei Borzenkov, ddiss

On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> On 2019-05-20 07:15, Newbugreport wrote:
> > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> It shouldn't (Windows' default file manager doesn't, and most stuff on
> Linux uses Samba so it shouldn't either, not sure about macOS though).

The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
works. Maybe David Disseldorp knows? Try copying a large file and
compare used space.
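
A minimal sketch of that comparison, run on the server against the
share's filesystem: record used space, do the copy from the client, then
record it again. The helper names and the 10% tolerance are arbitrary
choices for this example, and the numbers reported for btrfs through
statvfs are estimates, so treat the result as a hint only.

```python
import os
import shutil


def used_bytes(path):
    """Used space of the filesystem holding `path`, after flushing writes."""
    os.sync()
    return shutil.disk_usage(path).used


def classify_copy(used_before, used_after, file_size, tolerance=0.1):
    """Guess whether a copy duplicated the data or shared its extents."""
    growth = used_after - used_before
    if growth >= file_size * (1 - tolerance):
        return "full copy"
    return "likely shared extents"
```

For example, call used_bytes() on the share's mount point before and
after copying a 2 GiB file and pass both values with the file size to
classify_copy().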


* Re: Btrfs remote reflink with Samba
  2019-05-20 12:14         ` Patrik Lundquist
@ 2019-05-20 12:40           ` David Disseldorp
  2019-05-20 20:33             ` Patrik Lundquist
  0 siblings, 1 reply; 10+ messages in thread
From: David Disseldorp @ 2019-05-20 12:40 UTC (permalink / raw)
  To: Patrik Lundquist
  Cc: Austin S. Hemmelgarn, Newbugreport, linux-btrfs,
	Andrei Borzenkov, Samba Technical

On Mon, 20 May 2019 14:14:48 +0200, Patrik Lundquist wrote:

> On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> >
> > On 2019-05-20 07:15, Newbugreport wrote:  
> > > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?  
> > It shouldn't (Windows' default file manager doesn't, and most stuff on
> > Linux uses Samba so it shouldn't either, not sure about macOS though).  
> 
> The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
> gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
> works. Maybe David Disseldorp knows?

libsmbclient copychunk functionality was added via:
https://git.samba.org/?p=samba.git;a=commit;h=f73bcf4934be
IIRC, it was added with the intention of being used by Nautilus.
That said, I've not tried it myself, and I don't see any reference to
splice in:
https://gitlab.gnome.org/GNOME/gvfs/blob/master/daemon/gvfsbackendsmb.c
(Perhaps I'm looking in the wrong place?).

Cheers, David


* Re: Btrfs remote reflink with Samba
  2019-05-20 12:40           ` Btrfs remote reflink with Samba David Disseldorp
@ 2019-05-20 20:33             ` Patrik Lundquist
  2019-05-20 22:50               ` Chris Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Patrik Lundquist @ 2019-05-20 20:33 UTC (permalink / raw)
  To: David Disseldorp; +Cc: Newbugreport, linux-btrfs, Samba Technical

On Mon, 20 May 2019 at 14:40, David Disseldorp <ddiss@samba.org> wrote:
>
> On Mon, 20 May 2019 14:14:48 +0200, Patrik Lundquist wrote:
>
> > On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> > >
> > > On 2019-05-20 07:15, Newbugreport wrote:
> > > > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> > > It shouldn't (Windows' default file manager doesn't, and most stuff on
> > > Linux uses Samba so it shouldn't either, not sure about macOS though).
> >
> > The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
> > gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
> > works. Maybe David Disseldorp knows?
>
> libsmbclient copychunk functionality was added via:
> https://git.samba.org/?p=samba.git;a=commit;h=f73bcf4934be
> IIRC, it was added with the intention of being used by Nautilus.
> That said, I've not tried it myself, and I don't see any reference to
> splice in:
> https://gitlab.gnome.org/GNOME/gvfs/blob/master/daemon/gvfsbackendsmb.c
> (Perhaps I'm looking in the wrong place?).

https://gitlab.gnome.org/GNOME/gvfs/issues/286 is unfortunately
blocked by https://bugzilla.samba.org/show_bug.cgi?id=11413

I don't know if Nautilus tries reflink copying on a cifs mounted Samba
share but Mr. Newbugreport can at least move around (ctrl-x, ctrl-v)
files in Nautilus within the same share without making new copies.


* Re: Btrfs remote reflink with Samba
  2019-05-20 20:33             ` Patrik Lundquist
@ 2019-05-20 22:50               ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2019-05-20 22:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Samba Technical

On Mon, May 20, 2019 at 2:35 PM Patrik Lundquist
<patrik.lundquist@gmail.com> wrote:
>
> On Mon, 20 May 2019 at 14:40, David Disseldorp <ddiss@samba.org> wrote:
> >
> > On Mon, 20 May 2019 14:14:48 +0200, Patrik Lundquist wrote:
> >
> > > On Mon, 20 May 2019 at 13:58, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> > > >
> > > > On 2019-05-20 07:15, Newbugreport wrote:
> > > > > Patrik, thank you. I've enabled the SAMBA module, which may help in the future. Does the GUI file manager (i.e. Nautilus) need special support?
> > > > It shouldn't (Windows' default file manager doesn't, and most stuff on
> > > > Linux uses Samba so it shouldn't either, not sure about macOS though).
> > >
> > > The client side needs support for FSCTL_SRV_COPYCHUNK. Nautilus uses
> > > gvfsd-smb which in turn uses the Samba libs, but I have no idea if it
> > > works. Maybe David Disseldorp knows?
> >
> > libsmbclient copychunk functionality was added via:
> > https://git.samba.org/?p=samba.git;a=commit;h=f73bcf4934be
> > IIRC, it was added with the intention of being used by Nautilus.
> > That said, I've not tried it myself, and I don't see any reference to
> > splice in:
> > https://gitlab.gnome.org/GNOME/gvfs/blob/master/daemon/gvfsbackendsmb.c
> > (Perhaps I'm looking in the wrong place?).
>
> https://gitlab.gnome.org/GNOME/gvfs/issues/286 is unfortunately
> blocked by https://bugzilla.samba.org/show_bug.cgi?id=11413
>
> I don't know if Nautilus tries reflink copying on a cifs mounted Samba
> share but Mr. Newbugreport can at least move around (ctrl-x, ctrl-v)
> files in Nautilus within the same share without making new copies.


I just did ctrl-c, ctrl-v for a file in one dir to another dir, and it
takes forever. It's clearly being copied over the network to my local
machine and then pushed back to the server. Three minutes to copy a
2GiB file.

Server side:
kernel 5.1.0-1.fc31.x86_64
samba-4.9.5-0.fc29.x86_64
smb.conf contains 'vfs objects = btrfs' for this share

Client side:
samba-client-4.10.2-1.1.fc30.x86_64
gvfs-smb-1.40.1-2.fc30.x86_64
nautilus-3.32.1-1.fc30.x86_64



-- 
Chris Murphy

