* slowness when cp respectively send/receiving on top of dm-crypt
@ 2015-11-27 17:03 Christoph Anton Mitterer
  2015-11-27 19:00 ` Henk Slager
  2015-11-28  4:55 ` Christoph Anton Mitterer
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-27 17:03 UTC (permalink / raw)
  To: linux-btrfs

Hey.

Not sure whether this is valuable input for the devs, but here's a vague
real-world report about performance:

I'm just copying (via send/receive) a large filesystem (~7TB) from one
HDD over to another.
The devices are both connected via USB3, and each of the btrfs is on
top of dm-crypt.

It's already obvious that things are slowed down compared to "normal"
circumstances, but from looking at iotop for a while (and the best disk
IO measuring tool ever: the LEDs on the USB/SATA bridge) it seems that
there are always times when basically no IO happens on the disk.

There seems to be a repeating pattern like this:
- First, there is some heavy disk IO (200-250 M/s), mostly on the btrfs
send and receive processes.
- Then there are times when send/receive seem to not do anything, and
either btrfs-transaction (this I see far less often, however, and its
IO% is far lower) or dmcrypt_write (usually at 99%) eats up all the IO
(I mean the percent value shown in iotop), while the total/actual disk
writes and reads are basically zero during that.

Kinda feels as if some large buffer gets filled first; then, when it is
full, dm-crypt starts encrypting it, during which there is no disk IO
(since everything waits for the encryption).
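
If I wanted to sanity-check that buffering hypothesis, I guess I could
watch the kernel's dirty/writeback counters while the transfer runs;
just a rough sketch, the 1s interval is arbitrary:

  watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

If Dirty grows for a while and then drains in bursts, that would match
the feeling described above.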

Not sure if this is something that could be optimised, or maybe it's
even a non-issue that happens, for example, while many small files are
read/written (the data consists of both many small files and many big
files), which may explain why the actual IO sometimes goes up to
>200 M/s or at least >150 M/s and sometimes caps at around 40-80 M/s.


Obviously, since I use dm-crypt and compression on both devices, it may
be a CPU issue, but it's an 8-core machine with an i7-3612QM CPU @
2.10GHz... not the fastest, but not the slowest either... and looking
at top/htop, it happens quite often that there is only very little CPU
utilisation, so it doesn't seem as if the CPU were the limiting factor
here.



HTH,
Chris.


* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-27 17:03 slowness when cp respectively send/receiving on top of dm-crypt Christoph Anton Mitterer
@ 2015-11-27 19:00 ` Henk Slager
  2015-11-28  4:14   ` Christoph Anton Mitterer
  2015-11-28  4:55 ` Christoph Anton Mitterer
  1 sibling, 1 reply; 12+ messages in thread
From: Henk Slager @ 2015-11-27 19:00 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs

> I'm just copying (via send/receive) a large filesystem (~7TB) from one
> HDD over to another.
> The devices are both connected via USB3, and each of the btrfs is on
> top of dm-crypt.
As far as I can guess, these are transfers between Seagate Archive 8TB
SMR drives. For the first ~250GB in a new/clean state you would get
>100MB/s write speed. However, due to SMR, you will experience large
internal disk 'rewrites', so throughput will go down to roughly 30MB/s.
On average, over the full 8TB, expect something like 50MB/s.
I think you know this:
https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg47341.html
and certainly this:
https://bugzilla.kernel.org/show_bug.cgi?id=93581

> It's already obvious that things are slowed down, compared to "normal"
> circumstances, but from looking at iotop for a while (and the best disk
> IO measuring tool ever: the LEDs on the USB/SATA bridge) it seems that
> there are always times when basically no IO happens to disk.
The USB/SATA bridges somehow add some latency to ATA read/write
commands (or might prevent command queuing, it is not clear to me in
detail), but they save you from the typical ATA errors reported in bug
93581, as you also suggested yourself.

> There seems to be a repeating schema like this:
> - First, there is some heavy disk IO (200-250 M/s), mostly on btrfs
I think that is MByte/s (and not Mbit/s), right?

> send and receive processes
> - Then there are times when send/receive seem to not do anything, but
> either btrfs-transaction (this I see far less however, and the IO% is
> far lower, while that of dmcrypt_write is usually to 99%) or
> dmcrypt_write eat up all IO (I mean the percent value shown in iotop)
> with now total/actual disk write and read being basically zero during
> that.
>
> Kinda feels as if there would be some large buffer written first, then
> when that gets full, dm-crypt starts encrypting it during which there
> is no disk-IO (since it waits for the encryption).
I must say that adding compression (the compress-force=zlib mount
option) makes the whole transfer chain tend not to pipeline. With just
dm-crypt I have not seen this on Core-i7 systems. A test between 2
modern SSDs (SATA3-connected) is likely needed to see whether there
really is a tendency for hiccups in processing/pipelining. On kernels
3.11 to 4.0 I have seen and experienced far from optimal behavior, but
with 4.3 it is quite OK, although I use a large bcache which can
mitigate HDD seeks quite well.

On the tools level, you could insert mbuffer or buffer:
... send <snapshot spec> | mbuffer -m 2G | btrfs receive ...
to help pipeline things (see the sketch below), but I am more or less
sure that the SMR disk writing is the weakest link (and it also sits at
the end of the transfer chain).

> Not sure if this is something that could be optimised or maybe it's
> even a non issue that happens for example while many small files are
> read/written (the data consists of both, many small files as well as
> many big files), which may explain why sometimes the actual IO goes up
> to large >200M/s or at least > 150M/s and sometimes it caps at around
> 40-80M/s
Indeed, typical behavior of an SMR drive.

> Obviously, since I use dm-crypt and compression on both devices, it may
> be a CPU issue, but it's a 8 core machine with i7-3612QM CPU @
> 2.10GHz... not the fastest, but not the slowest either... and looking
> at top/htop is happens quite often that there is only very little CPU
> utilisation, so it doesn't seem as if CPU would be the killing factor
> here.
Yes, your CPU should not be the bottleneck here.

* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-27 19:00 ` Henk Slager
@ 2015-11-28  4:14   ` Christoph Anton Mitterer
  2015-11-28 18:34     ` Chris Murphy
  2015-11-28 18:37     ` Henk Slager
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-28  4:14 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

On Fri, 2015-11-27 at 20:00 +0100, Henk Slager wrote:
> As far as I can guess this is transfers between Seagate Archive 8TB
> SMR drives.
Yes it is,... and I thought about SMR being the reason at first, too,
but:
- As far as I understood SMR, it shouldn't kick in when I'm doing what
is mostly streaming data. Okay, I don't know exactly how btrfs writes
its data, but when I send/receive 7TB I'd have expected that a great
deal of it is just sequential writing.

- When these disks move data from their non-shingled areas to the
shingled ones, that - or at least that's my impression - produces some
typical sounds from the mechanical movements, which I didn't hear.

- But most importantly,... if the reason were SMR, why would
dmcrypt_write always be at basically full CPU whenever no IO happens?


> I think you know this:
> https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg47341.h
> tml
> and certainly this:
> https://bugzilla.kernel.org/show_bug.cgi?id=93581
I knew the latter, but as you've mentioned, the USB/SATA bridge
probably saves me from it. Interesting that USB3 is still slow enough
not to get caught ;)

But thanks for the hint nevertheless :)



> > There seems to be a repeating schema like this:
> > - First, there is some heavy disk IO (200-250 M/s), mostly on btrfs
> I think it is MByte/s (and not Mbit/s) right?
Oh yes... it's whatever iotop shows (not sure whether they use MB or MiB).


> I must say that adding compression (compress-force=zlib mount option)
> makes the whole transferchain tend to not pipeline.
Ah? Well, if only I'd known that in advance ^^ (although I just use
compress)...
Didn't marketing tell people that compression may even speed up IO
because the CPUs are so much faster than the disks?

>  Just dm-crypt I
> have not seen it on Core-i7 systems. A test between 2 modern SSDs
> (SATA3 connected ) is likely needed to see if there really is
> tendency
> for hiccups in processing/pipelining. On kernels 3.11 to 4.0 I have
> seen and experienced far from optimal behavior, but with 4.3 it is
> quite OK, although I use large bcache which can mitigate HDD seeks
> quite well.
I remember that in much earlier times, there was something about
dm-crypt using just a single thread or so for IO... I forgot the
details though.
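
(If I remember correctly, newer cryptsetup versions (1.6.7+) expose
dm-crypt performance flags when opening a device; this is purely from
memory, so the exact option name should be checked against
cryptsetup(8), and the device/mapping names below are placeholders:

  cryptsetup open --type luks --perf-submit_from_crypt_cpus \
      /dev/sdX data_crypt

No idea whether that would help or hurt in this send/receive case.)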

> >Not sure if this is something that could be optimised or maybe it's
> > even a non issue that happens for example while many small files
> > are
> > read/written (the data consists of both, many small files as well
> > as
> > many big files), which may explain why sometimes the actual IO goes
> > up
> > to large >200M/s or at least > 150M/s and sometimes it caps at
> > around
> > 40-80M/s
> Indeed typical behavior of SMR drive
Well, it's not that I wanted to complain... I can live with those
speeds... I just thought that there may be some bad interplay between
dm-crypt and btrfs, which might be what shows up as these periods in
which nothing seems to happen except dmcrypt_write doing stuff.


Cheers,
Chris


* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-27 17:03 slowness when cp respectively send/receiving on top of dm-crypt Christoph Anton Mitterer
  2015-11-27 19:00 ` Henk Slager
@ 2015-11-28  4:55 ` Christoph Anton Mitterer
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-28  4:55 UTC (permalink / raw)
  To: linux-btrfs

Hey.

Send/receiving the master to the backup has just finished... and now -
not that I wouldn't trust btrfs, the hardware, etc. - I'm running a
complete diff --recursive --no-dereference over the snapshots on the
two disks.

The two btrfs are mounted ro (thus no write IO), and there is not
really any other IO going on in the system.


I basically see a similar up and down as before during writing:
this time, only the diff process shows up in iotop.
Sometimes I get rates of 280-300 MB/s... for several seconds, sometimes
3-4s... sometimes longer, 10-20s... then it falls down to 30-40 MB/s.

At the same time I look at which files diff is currently comparing,...
and these are all large analog image scans[0], each > 800MB.
Also, the slowdowns or speedups don't happen when diff moves on to a
new file... they can also occur while it is still comparing the same
file.
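
For the record, I just peek at the open file descriptors of the diff
process, roughly like this (assuming nothing else called "diff" is
running):

  ls -l /proc/$(pgrep -x diff)/fd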

I wouldn't assume that these are highly fragmented, since both
filesystems were freshly filled only recently, with not much further
writing since.
And AFAIU, SMR shouldn't kick in here either.


I tried to have a short look at how the logical CPUs are utilised
during the slow and the fast phases.
There is no exact pattern; sometimes (but not always) it looks as if
1-2 cores are at ~100% utilisation when it's fast... while when it's
slow, all cores are at 30-50%.
But that may also just be coincidence, as I observed the opposite
behaviour a few times...
So don't read too much into this.


Why can it sometimes be super fast and then fall back to low speed?

Cheers,
Chris.


[0] yay... the good old childhood images, from the days when there were
no digicams... ;)


* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-28  4:14   ` Christoph Anton Mitterer
@ 2015-11-28 18:34     ` Chris Murphy
  2015-11-28 18:38       ` Christoph Anton Mitterer
  2015-11-28 18:37     ` Henk Slager
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2015-11-28 18:34 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Btrfs BTRFS

On Fri, Nov 27, 2015 at 9:14 PM, Christoph Anton Mitterer
<calestyo@scientia.net> wrote:

> - But most importantly,... if the reason was SMR, why should always
> when no IO happens dmcrypt_write be at basically full CPU.

It sounds to me like maybe LUKS is configured to use an encryption
algorithm that doesn't benefit from CPU-optimized support: e.g. aes-xts
on my laptop gets 1600 MiB/s, whereas serpent-cbc gets only 68 MiB/s
and pegs the CPU. This is reported by 'cryptsetup benchmark'.

-- 
Chris Murphy

* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-28  4:14   ` Christoph Anton Mitterer
  2015-11-28 18:34     ` Chris Murphy
@ 2015-11-28 18:37     ` Henk Slager
  2015-11-29  5:31       ` Duncan
  1 sibling, 1 reply; 12+ messages in thread
From: Henk Slager @ 2015-11-28 18:37 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs

Hi Chris,

some more comments after I have done some fresh tests.

On Sat, Nov 28, 2015 at 5:14 AM, Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
> On Fri, 2015-11-27 at 20:00 +0100, Henk Slager wrote:
>> As far as I can guess this is transfers between Seagate Archive 8TB
>> SMR drives.
> Yes it is,... and I though about SMR being the reason at first, too,
> but:
> - As far as I understood SMR, it shouldn't kick in when I do what is
> mostly streaming data. Okay I don't know exactly how btrfs writes it's
> data, but when I send/receive 7TB I'd have expected that a great deal
> of it is just sequential writing.
>
> - When these disks move data from their non shingled areas to the
> shingled ones, that - or at least that's my impression - produces some
> typical sounds from the mechanical movements, which I didn't hear
>
> - But most importantly,... if the reason was SMR, why should always
> when no IO happens dmcrypt_write be at basically full CPU.
I did not know iotop; I installed it and ran it next to ksysguard,
where I display CPU load and block I/O of the various disks/objects.
I also quite often see 99% for dmcrypt_write in iotop while at the same
time there is no 'external' activity on the SMR drive. I did cp a 165G
file to the SMR drive. The source had 130 extents and sits on a plain
uncompressed btrfs raid10 fs, with a steady 150+MB/s throughput. The fs
on the SMR disk is mounted like this:
/dev/mapper/dmcrypt_smr on /mnt/smr type btrfs
(rw,noatime,compress-force=zlib,nossd,space_cache,subvolid=5,subvol=/)
If I look at the throughput graph, I also see timeframes of seconds
with no disk activity, but the disk head makes movements (I have no
LEDs).
iotop shows I/O load, not CPU load; when dmcrypt_write writes a
datablock to disk, there are many times when the SMR disk has to do
internal 'rewrites', during which there is no traffic on the SATA lanes
(so no LED flashing) until the SMR disk indicates to dmcrypt_write that
it has finished the current block and can accept new blocks. I/O load
nowadays is not CPU PIO; the DMA hardware etc. does the work.

The result of the cp action is that the destination has 1018391
extents, so a diff operation afterwards results in quite slow reads
from the SMR drive (even slower than the writes), and not the
advertised 150+MB/s sustained throughput. The fs on the SMR drive
almost exclusively has files added to it, so I assume enough
un-fragmented free space is still available.
If you did the same without compress-force=zlib (and no other
compression), you would see that btrfs can really do well (like 1
extent per GB or so), even with dm-crypt.

>> I must say that adding compression (compress-force=zlib mount option)
>> makes the whole transferchain tend to not pipeline.
> Ah? Well if I'd have known that in advance ^^ (although I just use
> compress)...
> Didn't marketing tell people that compression may even speed up IO
> because the CPUs are so much faster than the disks?
They did not tell people that it can cause this creation of a million
extents. And LZO might behave differently, and force or not also has an
impact.
I am sorry if the statement confused you; I have tried various
compression options over the last 2 years. The non-pipelining is, I
think, from kernel 3.x experience: it made a 3T-sized fs more or less
useless and caused many crashes.
But now, with kernel 4.3, I don't see anything wrong w.r.t. throughput
performance. If I write to a fast destination, it is just 100% CPU load
on all 8 CPU threads and the expected write I/O throughput. Only with
forced zlib do I get enough compression for it to make sense for my
data. For archiving, or backups of backups, I am fine with the heavy
extent creation (and thus likely also on-disk fragmentation), reduced
I/O rates, etc.

/Henk

* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-28 18:34     ` Chris Murphy
@ 2015-11-28 18:38       ` Christoph Anton Mitterer
  2015-11-28 18:55         ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-28 18:38 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On Sat, 2015-11-28 at 11:34 -0700, Chris Murphy wrote:
> It sounds to me like maybe LUKS is configured to use an encryption
> algorithm that isn't subject to CPU optimized support, e.g. aes-xts
> on
> my laptop gets 1600MiB/s where serpent-cbc gets only 68MiB/s and pegs
> the CPU. This is reported by 'cryptsetup benchmark'

hmmm...
$ /sbin/cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       910222 iterations per second
PBKDF2-sha256     590414 iterations per second
PBKDF2-sha512     399609 iterations per second
PBKDF2-ripemd160  548418 iterations per second
PBKDF2-whirlpool  179060 iterations per second
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   474,3 MiB/s  1686,2 MiB/s
 serpent-cbc   128b    69,4 MiB/s   235,3 MiB/s
 twofish-cbc   128b   144,5 MiB/s   271,6 MiB/s
     aes-cbc   256b   348,0 MiB/s  1239,4 MiB/s
 serpent-cbc   256b    68,8 MiB/s   231,5 MiB/s
 twofish-cbc   256b   146,6 MiB/s   268,9 MiB/s
     aes-xts   256b  1381,3 MiB/s  1384,3 MiB/s
 serpent-xts   256b   238,6 MiB/s   231,1 MiB/s
 twofish-xts   256b   262,9 MiB/s   266,7 MiB/s
     aes-xts   512b  1085,7 MiB/s  1078,9 MiB/s
 serpent-xts   512b   242,1 MiB/s   230,2 MiB/s
 twofish-xts   512b   266,8 MiB/s   265,9 MiB/s

I'm using aes-xts-plain64 with a 512 bit key...
that's still ~1 GiB/s.
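
For reference, I checked the open mapping roughly like this (the
mapping name is of course whatever yours is called):

  cryptsetup status data_crypt | grep -E 'cipher|keysize'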


Cheers,
Chris.


* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-28 18:38       ` Christoph Anton Mitterer
@ 2015-11-28 18:55         ` Chris Murphy
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Murphy @ 2015-11-28 18:55 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, Nov 28, 2015 at 11:38 AM, Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
> On Sat, 2015-11-28 at 11:34 -0700, Chris Murphy wrote:
>> It sounds to me like maybe LUKS is configured to use an encryption
>> algorithm that isn't subject to CPU optimized support, e.g. aes-xts
>> on
>> my laptop gets 1600MiB/s where serpent-cbc gets only 68MiB/s and pegs
>> the CPU. This is reported by 'cryptsetup benchmark'
>
> hmmm...
> $ /sbin/cryptsetup benchmark
> # Tests are approximate using memory only (no storage IO).
> PBKDF2-sha1       910222 iterations per second
> PBKDF2-sha256     590414 iterations per second
> PBKDF2-sha512     399609 iterations per second
> PBKDF2-ripemd160  548418 iterations per second
> PBKDF2-whirlpool  179060 iterations per second
> #  Algorithm | Key |  Encryption |  Decryption
>      aes-cbc   128b   474,3 MiB/s  1686,2 MiB/s
>  serpent-cbc   128b    69,4 MiB/s   235,3 MiB/s
>  twofish-cbc   128b   144,5 MiB/s   271,6 MiB/s
>      aes-cbc   256b   348,0 MiB/s  1239,4 MiB/s
>  serpent-cbc   256b    68,8 MiB/s   231,5 MiB/s
>  twofish-cbc   256b   146,6 MiB/s   268,9 MiB/s
>      aes-xts   256b  1381,3 MiB/s  1384,3 MiB/s
>  serpent-xts   256b   238,6 MiB/s   231,1 MiB/s
>  twofish-xts   256b   262,9 MiB/s   266,7 MiB/s
>      aes-xts   512b  1085,7 MiB/s  1078,9 MiB/s
>  serpent-xts   512b   242,1 MiB/s   230,2 MiB/s
>  twofish-xts   512b   266,8 MiB/s   265,9 MiB/s
>
> I'm having aes-xts-plain64 with 512 bit key...
> that's still 1 GiB/s

I'm using aes-xts-plain64 with a 512 bit keysize also, and dmcrypt
doesn't even register in iotop or top when I do a scrub of the file
system. So I'm not sure what's going on in your case. But I also don't
use compression. The apparent fragmentation is actually bogus; it's an
artifact of compression. If you look at the fragments with filefrag
-v, you'll see that these are actually contiguous extents for the most
part.


-- 
Chris Murphy

* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-28 18:37     ` Henk Slager
@ 2015-11-29  5:31       ` Duncan
  2015-11-29 19:29         ` Henk Slager
  0 siblings, 1 reply; 12+ messages in thread
From: Duncan @ 2015-11-29  5:31 UTC (permalink / raw)
  To: linux-btrfs

Henk Slager posted on Sat, 28 Nov 2015 19:37:31 +0100 as excerpted:

> I did cp a 165G file to the SMR drive. The source had 130 extents and
> just uncompressed btrfs raid10 fs, steadily 150+MB/s throughput. The fs
> on the SMR disk is mounted like this:
> /dev/mapper/dmcrypt_smr on /mnt/smr type btrfs
> (rw,noatime,compress-force=zlib,nossd,space_cache,subvolid=5,subvol=/)

Note that compress-force=zlib.  It's important below.

> The result of the cp action is that the destination is 1018391 extents

> The fs on the SMR drive is almost exclusively adding files, so assume
> enough un-fragmented free space available still.

What are you using to tell you it has 1018391 extents?  If you're using 
filefrag, it's known not to understand btrfs compression, which uses 128 
KiB (pre-compression size, I believe, tho I'm not absolutely positive) 
blocks, and as a result, to report each of those blocks as a separate 
extent.

1018391 * 1/8 MiB (aka 128 KiB) * 1/1024 GiB/MiB ~= 124.3 GiB.

Are you sure that was a 165 GiB file, or was it (~) 125 GiB?  And GiB 
(2^30, 1024^3) or GB (10^9, 1000^3), or some horrible mixture of 1024s 
and 1000s?

Because if you're using compress-force, filefrag will see each 128 KiB 
compression block as an extent, and 1018391 reported "extents" (actually 
compression blocks) should be ~ 125 GiB.

> If you would do the same without compress-force-zlib, (also no other
> compression), you will see that btrfs can really do well (like 1 extent
> per GB or so) even with dm-crypt

AFAIK there's no easy "admin-level" way to check extent usage when btrfs 
compression is used on a file.  There's developer-level btrfs-debug 
output, but nothing admin-level or user-level at all.

>> Didn't marketing tell people that compression may even speed up IO
>> because the CPUs are so much faster than the disks?

> They did not tell that it can cause this million extent creation. And
> LZO might be different and force or not also has impact.

Except it's not (that we know of, and most likely not) million plus 
extents.  It's a million plus 128-KiB-each compression blocks.  Big 
difference!

LZO should report an identical number of compression blocks, because 
btrfs uses the same 128 KiB compression block size for both.

And while compress-force won't change the reported "extents" that are 
actually compression-blocks if the file is actually compressed, just 
compress by itself may or may not actually compress the file (there's an 
algorithm used, from what the devs have said, basically it checks whether 
the first block or two compress well, and assumes the rest of the file 
will be similar, compressing or not based on the result of that attempt), 
so it's quite possible you'll get better "extent" numbers if the file 
isn't actually compressed, in which case filefrag actually gets things 
right and reports real extent numbers, vs the number of compression 
blocks if the file is compressed.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-29  5:31       ` Duncan
@ 2015-11-29 19:29         ` Henk Slager
  2015-11-30  5:02           ` Duncan
  0 siblings, 1 reply; 12+ messages in thread
From: Henk Slager @ 2015-11-29 19:29 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Sun, Nov 29, 2015 at 6:31 AM, Duncan <1i5t5.duncan@cox.net> wrote:

> What are you using to tell you it has 1018391 extents?  If you're using
> filefrag, it's known not to understand btrfs compression, which uses 128
> KiB (pre-compression size, I believe, tho I'm not absolutely positive)
> blocks, and as a result, to report each of those blocks as a separate
> extent.
Indeed filefrag; filefrag -k -v results in a 72M txt file, and almost
all 'extents' show length:128, but several are a bit bigger or much
bigger.
From just quickly browsing and random checks, it seems mostly
contiguous (at the Linux block-layer level, or at least from what
filefrag sees); it looks like it would not be so difficult to change
filefrag so that it also reports the number of discontinuities. But
maybe there is some other 'misunderstanding' between filefrag and
btrfs, in which case we would need something dedicated. filefrag
execution mostly takes a long time (minutes), even for just one,
though big, file.

> Because if you're using compress-force, filefrag will see each 128 KiB
> compression block as an extent, and 1018391 reported "extents" (actually
> compression blocks) should be ~ 125 GiB.
See also above;
the file is 176521162380 bytes ~= 164.4 GiB, and ls -lh reports 165G.

> AFAIK there's no easy "admin-level" way to check extent usage when btrfs
> compression is used on a file.  There's developer-level btrfs-debug
> output, but nothing admin-level or user-level at all.
I would be interested in tooling that gives more visibility into what
happens at the btrfs disk-block level, primarily for non-compression
use cases. If you know of any, please let us know.

> And while compress-force won't change the reported "extents" that are
> actually compression-blocks if the file is actually compressed, just
> compress by itself may or may not actually compress the file (there's an
> algorithm used, from what the devs have said, basically it checks whether
> the first block or two compress well, and assumes the rest of the file
> will be similar, compressing or not based on the result of that attempt),
This is probably a good method for live data being written, but for
static content (my data files) it turned out not to be good enough,
i.e. the total space gained (in the terabyte range) was not what I
expected/wanted.

> so it's quite possible you'll get better "extent" numbers if the file
> isn't actually compressed, in which case filefrag actually gets things
> right and reports real extent numbers, vs the number of compression
> blocks if the file is compressed.
I think you mean the case where the destination blocks of the file are
not re-compressed (and stored on disk) but are left and stored as the
unprocessed source data blocks. Indeed, when I first e.g. gzip the
file, it is then 75G and filefrag reports 8613 extents, most of them
512 KiB.

* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-29 19:29         ` Henk Slager
@ 2015-11-30  5:02           ` Duncan
  2015-11-30 18:26             ` Henk Slager
  0 siblings, 1 reply; 12+ messages in thread
From: Duncan @ 2015-11-30  5:02 UTC (permalink / raw)
  To: linux-btrfs

Henk Slager posted on Sun, 29 Nov 2015 20:29:49 +0100 as excerpted:

> On Sun, Nov 29, 2015 at 6:31 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> What are you using to tell you it has 1018391 extents?  If you're using
>> filefrag, it's known not to understand btrfs compression, which uses
>> 128 KiB (pre-compression size, I believe, tho I'm not absolutely
>> positive) blocks, and as a result, to report each of those blocks as a
>> separate extent.

> Indeed filefrag, and filefrag -k -v results in a 72M txt file, almost
> all 'extents' show length:128, but also several a bit bigger or much
> bigger.

Hmm... I wonder... Were they multiples of 128?  I'm not a coder but with 
your results it occurs to me that for sections that compress 
"negatively", that is, that end up larger when "compressed" than when 
not, perhaps even compress-force doesn't compress in that case, which, 
presuming there's several in a row, would then appear as a single extent 
to filefrag (assuming it actually is and the only reason filefrag would 
split it up in reports would be due to the compression), as it would then 
know how to map them properly.

Regardless, that's entirely new information to me, as I figured with 
compressed files it saw each 128 KiB as a separate extent, regardless.

Meanwhile, I didn't know about the -v actually showing the addresses so 
real extents could be manually calculated, either.  The manpage simply 
says "be verbose", which isn't particularly helpful, and I'd obviously 
never tried it.

> From just quickly browsing and random checks, it seems mostly contiguous
> (on linux block layer level or from what filefrag sees), it looks like
> it is not so difficult to change filefrag so that it also reports the
> amount of discontinuities. But maybe there is other 'misunderstanding'
> between filefrag and btrfs, so then we would need something dedicated.
> filefrag execution takes long time mostly (minutes), even just for 1
> though big file.

So filefrag -v's practical verbosity is new and quite useful information
as well. =:^)  In fact, given that new info and enough motivation, I
could almost certainly hack up a script to do the address comparisons
and report actual extents here (see the sketch below), tho obviously
it'd be far less efficient than writing the same in native code, which
would be the result if filefrag itself were to "learn" about btrfs
compression.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: slowness when cp respectively send/receiving on top of dm-crypt
  2015-11-30  5:02           ` Duncan
@ 2015-11-30 18:26             ` Henk Slager
  0 siblings, 0 replies; 12+ messages in thread
From: Henk Slager @ 2015-11-30 18:26 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Mon, Nov 30, 2015 at 6:02 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>> What are you using to tell you it has 1018391 extents?  If you're using
>>> filefrag, it's known not to understand btrfs compression, which uses
>>> 128 KiB (pre-compression size, I believe, tho I'm not absolutely
>>> positive) blocks, and as a result, to report each of those blocks as a
>>> separate extent.
>
>> Indeed filefrag, and filefrag -k -v results in a 72M txt file, almost
>> all 'extents' show length:128, but also several a bit bigger or much
>> bigger.
>
> Hmm... I wonder... Were they multiples of 128?  I'm not a coder but with
> your results it occurs to me that for sections that compress
> "negatively", that is, that end up larger when "compressed" than when
> not, perhaps even compress-force doesn't compress in that case, which,
> presuming there's several in a row, would then appear as a single extent
> to filefrag (assuming it actually is and the only reason filefrag would
> split it up in reports would be due to the compression), as it would then
> know how to map them properly.
They are all multiples of 128, and 128 is the smallest.
I also had these kinds of questions about btrfs compression. I just
took the risk that maybe audio/video and .tar.gz files etc. might eat
slightly more space when copied to an fs with compress-force=zlib. I
also know that a significant part of my files compresses well (factor
~2), so overall, for multi-terabyte archiving, I am OK with the
situation.

The extent and chunk handling in general (so without crypto and
compression) is what triggered me in this mail thread. Like Chris
A.M., I see some storage throughput drops where I don't actually
expect them. It's on a fast SSD, where similar files (VM images)
achieve ~540MiB/s read throughput, while for one file large parts only
reach ~45MiB/s. A partial balance did not help; a cp --reflink=never
did. All VMs have on the order of 100k filefrag-reported extents. It's
not a noticeable problem for VM running speed, and I still have older
snapshots around of the VM that shows the read-slowness. Maybe
something not reported in dmesg is wrong; I will not do a full balance,
reboot or fs check right now, at the latest early next year. My best
guesses are that the extents are scattered across many blockgroups, or
that there is some issue inside the SSD that smartctl does not show.

> Regardless, that's entirely new information to me, as I figured with
> compressed files it saw each 128 KiB as a separate extent, regardless.
>
> Meanwhile, I didn't know about the -v actually showing the addresses so
> real extents could be manually calculated, either.  The manpage simply
> says "be verbose", which isn't particularly helpful, and I'd obviously
> never tried it.
>
>> From just quickly browsing and random checks, it seems mostly contiguous
>> (on linux block layer level or from what filefrag sees), it looks like
>> it is not so difficult to change filefrag so that it also reports the
>> amount of discontinuities. But maybe there is other 'misunderstanding'
>> between filefrag and btrfs, so then we would need something dedicated.
>> filefrag execution takes long time mostly (minutes), even just for 1
>> though big file.
>
> So filefrag -v's practical verbosity is new and quite useful information
> as well. =:^)  In fact, given that new info and enough motivation, I
> could almost certainly hack up a script to do the address comparisons and
> report actual extents here, tho obviously it'd be far less efficient than
> actually writing the same in native code, the result if filefrag itself
> were to "learn" about btrfs compression.
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>