* NVMe SSD + compression - benchmarking
@ 2018-04-27 17:41 Brendan Hide
  2018-04-28  2:05 ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Brendan Hide @ 2018-04-27 17:41 UTC (permalink / raw)
  To: Btrfs BTRFS

Hey, all

I'm following up on the queries I had last week since I have installed 
the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing 
whether or not I'm doing these benchmarks correctly.

As a first test, I put together a 4.7GB .tar containing mostly 
duplicated copies of the kernel source code (rather compressible). 
Writing this to the SSD I was seeing repeatable numbers - but noted that 
the new (supposedly faster) zstd compression is noticeably slower than 
all other methods. Perhaps this is partly due to a lack of 
multi-threading? No matter - I also noticed a supposedly impossible 
stat when there is no compression, in that it seems to be faster than 
the PCI-E 2.0 bus can theoretically deliver:

compression type / write speed / read speed (in GBps)
zlib / 1.24 / 2.07
lzo / 1.17 / 2.04
zstd / 0.75 / 1.97
no / 1.42 / 2.79

The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0 
16-lane slot. lspci -vv confirms it is using 4 lanes. This means its 
peak throughput *should* be 2.0 GBps - but above you can see the average 
read benchmark is 2.79 GBps. :-/
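
(The arithmetic as I understand it: PCI-E 2.0 signals at 5 GT/s per lane 
with 8b/10b encoding, i.e. roughly 500 MB/s per lane, so a 4-lane link 
should top out around 4 x 0.5 = 2.0 GBps before protocol overhead.)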

The crude timing script I've put together does the following (a rough 
shell sketch follows the list):
- Format the SSD anew with btrfs and no custom settings
- wait 180 seconds for possible hardware TRIM to settle (possibly 
overkill since the SSD is new)
- Mount the fs using all defaults except for compression, which could be 
of zlib, lzo, zstd, or no
- sync
- Drop all caches
- Time the following
  - Copy the file to the test fs (source is a ramdisk)
  - sync
- Drop all caches
- Time the following
  - Copy back from the test fs to ramdisk
  - sync
- unmount
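
In rough shell terms it amounts to something like this (a simplified 
sketch of the steps above, not the exact script; the device node 
/dev/nvme0n1, the mount point and the tar filename are illustrative):

  #!/bin/bash
  DEV=/dev/nvme0n1            # assumed device node for the SSD
  MNT=/mnt/test               # illustrative mount point
  SRC=/mnt/ram/kernels.tar    # the 4.7GB tar, held on the ramdisk

  for C in zlib lzo zstd no; do
      mkfs.btrfs -f "$DEV"
      sleep 180                            # let any TRIM settle
      mount -o compress="$C" "$DEV" "$MNT"
      sync
      echo 3 > /proc/sys/vm/drop_caches
      time ( cp "$SRC" "$MNT/" && sync )   # write benchmark
      echo 3 > /proc/sys/vm/drop_caches
      time ( cp "$MNT/kernels.tar" /mnt/ram/readback.tar && sync )   # read benchmark
      umount "$MNT"
  done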

I can see how, with compression, it *can* be faster than 2 GBps (though 
it isn't). But I cannot see how having no compression could possibly be 
faster than 2 GBps. :-/
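
(For example, data that compresses 2:1 would only need ~1 GBps of bus 
bandwidth to deliver 2 GBps worth of file contents.)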

I can of course get more info if it'd help figure out this puzzle:

Kernel info:
Linux localhost.localdomain 4.16.3-1-vfio #1 SMP PREEMPT Sun Apr 22 
12:35:45 SAST 2018 x86_64 GNU/Linux
^ Close to the regular ArchLinux kernel - but with vfio, and compiled 
with -march=native. See https://aur.archlinux.org/pkgbase/linux-vfio/

CPU model:
model name    : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Motherboard model:
Product Name: Z68MA-G45 (MS-7676)

lspci output for the slot:
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe 
SSD Controller SM961/PM961
^ The disk id sans serial is Samsung_SSD_960_EVO_1TB

dmidecode output for the slot:
Handle 0x001E, DMI type 9, 17 bytes
System Slot Information
         Designation: J8B4
         Type: x16 PCI Express
         Current Usage: In Use
         Length: Long
         ID: 4
         Characteristics:
                 3.3 V is provided
                 Opening is shared
                 PME signal is supported
         Bus Address: 0000:02:01.1


* Re: NVMe SSD + compression - benchmarking
  2018-04-27 17:41 NVMe SSD + compression - benchmarking Brendan Hide
@ 2018-04-28  2:05 ` Qu Wenruo
  2018-04-28  7:30   ` Brendan Hide
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2018-04-28  2:05 UTC (permalink / raw)
  To: Brendan Hide, Btrfs BTRFS





On 2018年04月28日 01:41, Brendan Hide wrote:
> Hey, all
> 
> I'm following up on the queries I had last week since I have installed
> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
> whether or not I'm doing these benchmarks correctly.
> 
> As a first test, I put together a 4.7GB .tar containing mostly
> duplicated copies of the kernel source code (rather compressible).
> Writing this to the SSD I was seeing repeatable numbers - but noted that
> the new (supposedly faster) zstd compression is noticeably slower than
> all other methods. Perhaps this is partly due to a lack of
> multi-threading? No matter - I also noticed a supposedly impossible
> stat when there is no compression, in that it seems to be faster than
> the PCI-E 2.0 bus can theoretically deliver:

I'd say the test method is more like real-world usage than a benchmark.
Moreover, copying the kernel source is not that good for compression, as
most of the files are smaller than 128K, which means they can't take
much advantage of the multi-threaded compression that splits data into
128K chunks.

And the kernel source consists of many small files, and btrfs is
really slow for metadata-heavy workloads.

I'd recommend starting with a simpler workload, then going step by step
towards more complex ones.

A large-file sequential write with a large block size would be a nice
starting point, as it can take full advantage of multi-threaded
compression.
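
For example, something like this (the mount point and source file are
only illustrative; conv=fdatasync makes dd flush the file to disk before
it reports a speed):
$ dd if=/path/to/large-test-file of=/mnt/btrfs/bigfile bs=1M conv=fdatasync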


Another piece of advice: if you really want super-fast storage and
there is plenty of memory, the brd module will be your best friend.
On modern mainstream hardware, brd can provide performance over
1GiB/s:
$ sudo modprobe brd rd_nr=1 rd_size=2097152
$ LANG=C dd if=/dev/zero  bs=1M of=/dev/ram0  count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s
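
(When timing a real device or filesystem rather than brd, adding
conv=fdatasync or oflag=direct to the dd command avoids measuring the
page cache instead of the storage.)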

Thanks,
Qu

> 
> compression type / write speed / read speed (in GBps)
> zlib / 1.24 / 2.07
> lzo / 1.17 / 2.04
> zstd / 0.75 / 1.97
> no / 1.42 / 2.79
> 
> The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
> 16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
> peak throughput *should* be 2.0 GBps - but above you can see the average
> read benchmark is 2.79 GBps. :-/
> 
> The crude timing script I've put together does the following:
> - Format the SSD anew with btrfs and no custom settings
> - wait 180 seconds for possible hardware TRIM to settle (possibly
> overkill since the SSD is new)
> - Mount the fs using all defaults except for compression, which could be
> of zlib, lzo, zstd, or no
> - sync
> - Drop all caches
> - Time the following
>  - Copy the file to the test fs (source is a ramdisk)
>  - sync
> - Drop all caches
> - Time the following
>  - Copy back from the test fs to ramdisk
>  - sync
> - unmount
> 
> I can see how, with compression, it *can* be faster than 2 GBps (though
> it isn't). But I cannot see how having no compression could possibly be
> faster than 2 GBps. :-/
> 
> I can of course get more info if it'd help figure out this puzzle:
> 
> Kernel info:
> Linux localhost.localdomain 4.16.3-1-vfio #1 SMP PREEMPT Sun Apr 22
> 12:35:45 SAST 2018 x86_64 GNU/Linux
> ^ Close to the regular ArchLinux kernel - but with vfio, and compiled
> with -march=native. See https://aur.archlinux.org/pkgbase/linux-vfio/
> 
> CPU model:
> model name    : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> 
> Motherboard model:
> Product Name: Z68MA-G45 (MS-7676)
> 
> lspci output for the slot:
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
> SSD Controller SM961/PM961
> ^ The disk id sans serial is Samsung_SSD_960_EVO_1TB
> 
> dmidecode output for the slot:
> Handle 0x001E, DMI type 9, 17 bytes
> System Slot Information
>         Designation: J8B4
>         Type: x16 PCI Express
>         Current Usage: In Use
>         Length: Long
>         ID: 4
>         Characteristics:
>                 3.3 V is provided
>                 Opening is shared
>                 PME signal is supported
>         Bus Address: 0000:02:01.1




* Re: NVMe SSD + compression - benchmarking
  2018-04-28  2:05 ` Qu Wenruo
@ 2018-04-28  7:30   ` Brendan Hide
  2018-04-29  8:28     ` Duncan
  0 siblings, 1 reply; 4+ messages in thread
From: Brendan Hide @ 2018-04-28  7:30 UTC (permalink / raw)
  To: Qu Wenruo, Btrfs BTRFS


On 04/28/2018 04:05 AM, Qu Wenruo wrote:
> 
> 
> On 2018年04月28日 01:41, Brendan Hide wrote:
>> Hey, all
>>
>> I'm following up on the queries I had last week since I have installed
>> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
>> whether or not I'm doing these benchmarks correctly.
>>
>> As a first test, I put together a 4.7GB .tar containing mostly
>> duplicated copies of the kernel source code (rather compressible).
>> Writing this to the SSD I was seeing repeatable numbers - but noted that
>> the new (supposedly faster) zstd compression is noticeably slower than
>> all other methods. Perhaps this is partly due to a lack of
>> multi-threading? No matter - I also noticed a supposedly impossible
>> stat when there is no compression, in that it seems to be faster than
>> the PCI-E 2.0 bus can theoretically deliver:
> 
> I'd say the test method is more like real-world usage than a benchmark.
> Moreover, copying the kernel source is not that good for compression, as
> most of the files are smaller than 128K, which means they can't take
> much advantage of the multi-threaded compression that splits data into
> 128K chunks.
> 
> And the kernel source consists of many small files, and btrfs is
> really slow for metadata-heavy workloads.
> 
> I'd recommend starting with a simpler workload, then going step by step
> towards more complex ones.
> 
> A large-file sequential write with a large block size would be a nice
> starting point, as it can take full advantage of multi-threaded
> compression.

Thanks, Qu

I did also test the unpacked folder tree, where I realised it is 
intense / far from a regular use case. It gives far slower results, with 
zlib being the slowest. The source's average file size is near 13KiB. 
However, in the test whose results I gave below, the .tar is a single 
large (4.7GB) file - I'm not unpacking it at all.

Average results from source tree:
compression type / write speed / read speed
no / 0.29 GBps / 0.20 GBps
lzo / 0.21 GBps / 0.17 GBps
zstd / 0.13 GBps / 0.14 GBps
zlib / 0.06 GBps / 0.10 GBps

Average results from .tar:
compression type / write speed / read speed
no / 1.42 GBps / 2.79 GBps
lzo / 1.17 GBps / 2.04 GBps
zstd / 0.75 GBps / 1.97 GBps
zlib / 1.24 GBps / 2.07 GBps

> Another piece of advice: if you really want super-fast storage and
> there is plenty of memory, the brd module will be your best friend.
> On modern mainstream hardware, brd can provide performance over
> 1GiB/s:
> $ sudo modprobe brd rd_nr=1 rd_size=2097152
> $ LANG=C dd if=/dev/zero  bs=1M of=/dev/ram0  count=2048
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s

My real worry is that I'm currently reading at 2.79GB/s (see result 
above and below) without compression when my hardware *should* limit it 
to 2.0GB/s. This tells me either `sync` is not working or my benchmark 
method is flawed.
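
One check I could try (assuming I understand direct I/O correctly) is to
read the file back with dd using iflag=direct, which bypasses the page
cache entirely, e.g. (path illustrative):

$ dd if=/mnt/test/kernels.tar of=/dev/null bs=1M iflag=direct

If that still reports well over 2 GB/s, caching is not the explanation.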

> Thanks,
> Qu
> 
>>
>> compression type / write speed / read speed (in GBps)
>> zlib / 1.24 / 2.07
>> lzo / 1.17 / 2.04
>> zstd / 0.75 / 1.97
>> no / 1.42 / 2.79
>>
>> The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
>> 16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
>> peak throughput *should* be 2.0 GBps - but above you can see the average
>> read benchmark is 2.79 GBps. :-/
>>
>> The crude timing script I've put together does the following:
>> - Format the SSD anew with btrfs and no custom settings
>> - wait 180 seconds for possible hardware TRIM to settle (possibly
>> overkill since the SSD is new)
>> - Mount the fs using all defaults except for compression, which could be
>> of zlib, lzo, zstd, or no
>> - sync
>> - Drop all caches
>> - Time the following
>>   - Copy the file to the test fs (source is a ramdisk)
>>   - sync
>> - Drop all caches
>> - Time the following
>>   - Copy back from the test fs to ramdisk
>>   - sync
>> - unmount
>>
>> I can see how, with compression, it *can* be faster than 2 GBps (though
>> it isn't). But I cannot see how having no compression could possibly be
>> faster than 2 GBps. :-/
>>
> 


* Re: NVMe SSD + compression - benchmarking
  2018-04-28  7:30   ` Brendan Hide
@ 2018-04-29  8:28     ` Duncan
  0 siblings, 0 replies; 4+ messages in thread
From: Duncan @ 2018-04-29  8:28 UTC (permalink / raw)
  To: linux-btrfs

Brendan Hide posted on Sat, 28 Apr 2018 09:30:30 +0200 as excerpted:

> My real worry is that I'm currently reading at 2.79GB/s (see result
> above and below) without compression when my hardware *should* limit it
> to 2.0GB/s. This tells me either `sync` is not working or my benchmark
> method is flawed.

No answer, but a couple of additional questions/suggestions:

* Tarfile:  Just to be sure, you're using an uncompressed tarfile, not a 
compressed tarball (tgz/tbz2/etc), correct?

* How do hdparm -t and -T compare?  They're read-only and bypass the 
filesystem, so they should at least give you something to compare the 2.79 
GB/s to, both from-raw-device (-t) and cached/memory-only (-T).  See the 
hdparm(8) manpage for the details.
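
  (Something like "hdparm -Tt /dev/nvme0n1", run as root - the device 
  name here is assumed.)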

* And of course try the compressed tarball too, since it should be easy 
enough and should give you compressible vs. incompressible numbers for 
sanity checking.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



end of thread

Thread overview: 4+ messages
2018-04-27 17:41 NVMe SSD + compression - benchmarking Brendan Hide
2018-04-28  2:05 ` Qu Wenruo
2018-04-28  7:30   ` Brendan Hide
2018-04-29  8:28     ` Duncan
