* NVMe SSD + compression - benchmarking
@ 2018-04-27 17:41 Brendan Hide
2018-04-28 2:05 ` Qu Wenruo
0 siblings, 1 reply; 4+ messages in thread
From: Brendan Hide @ 2018-04-27 17:41 UTC (permalink / raw)
To: Btrfs BTRFS
Hey, all
I'm following up on the queries I had last week since I have installed
the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
whether or not I'm doing these benchmarks correctly.
As a first test, I put together a 4.7GB .tar containing mostly
duplicated copies of the kernel source code (rather compressible).
Writing this to the SSD, I was seeing repeatable numbers - but I noted that
the new (supposedly faster) zstd compression is noticeably slower than
all the other methods. Perhaps this is partly due to a lack of
multi-threading? No matter - I also noticed a supposedly impossible
stat when there is no compression, in that it seems to be faster than
the PCI-E 2.0 bus can theoretically deliver:
compression type / write speed / read speed (in GBps)
zlib / 1.24 / 2.07
lzo / 1.17 / 2.04
zstd / 0.75 / 1.97
no / 1.42 / 2.79
The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
peak throughput *should* be 2.0 GBps - but above you can see the average
read benchmark is 2.79 GBps. :-/
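That 2.0 GBps ceiling follows from the link parameters; a quick sanity check of the arithmetic (my numbers, derived from the PCI-E 2.0 spec, not from the thread):

```shell
# PCI-E 2.0 signals at 5 GT/s per lane with 8b/10b encoding, so only 8 of
# every 10 bits carry payload: 5000 * 8/10 = 4000 Mbit/s usable per lane.
per_lane_mbps=$(( 5000 * 8 / 10 ))
lanes=4
total_mbps=$(( per_lane_mbps * lanes ))
echo "$(( total_mbps / 8 )) MB/s"   # 4 lanes -> 2000 MB/s, i.e. ~2.0 GB/s
```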
The crude timing script I've put together does the following:
- Format the SSD anew with btrfs and no custom settings
- Wait 180 seconds for any hardware TRIM to settle (possibly overkill,
  since the SSD is new)
- Mount the fs using all defaults except for compression, which is one
  of zlib, lzo, zstd, or no
- sync
- Drop all caches
- Time the following:
  - Copy the file to the test fs (source is a ramdisk)
  - sync
- Drop all caches
- Time the following:
  - Copy back from the test fs to ramdisk
  - sync
- unmount
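The steps above can be sketched as a small shell function. This is a sketch only: the device node, source file, and mount point are placeholder assumptions, the read-back destination stands in for the ramdisk, and a real run reformats the target device.

```shell
# Sketch of the timing loop above. All three arguments are placeholders the
# caller must supply, and a run REFORMATS the given device -- e.g.:
#   bench /dev/nvme0n1 /mnt/ram/test.tar /mnt/bench
bench() {
    dev=${1-}; src=${2-}; mnt=${3-}
    if [ -z "$dev" ] || [ -z "$src" ] || [ -z "$mnt" ]; then
        echo "usage: bench <dev> <src-file> <mountpoint>"
        return 1
    fi
    for comp in zlib lzo zstd no; do
        mkfs.btrfs -f "$dev"                  # fresh fs, no custom settings
        sleep 180                             # let device-side TRIM settle
        if [ "$comp" = no ]; then
            mount "$dev" "$mnt"               # defaults, no compression
        else
            mount -o "compress=$comp" "$dev" "$mnt"
        fi
        sync
        echo 3 > /proc/sys/vm/drop_caches
        time sh -c "cp '$src' '$mnt/' && sync"            # timed write + flush
        echo 3 > /proc/sys/vm/drop_caches
        time sh -c "cp '$mnt/${src##*/}' /tmp/ && sync"   # timed read-back
        umount "$mnt"
    done
}
```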
I can see how, with compression, it *can* be faster than 2 GBps (though
it isn't). But I cannot see how having no compression could possibly be
faster than 2 GBps. :-/
I can of course get more info if it'd help figure out this puzzle:
Kernel info:
Linux localhost.localdomain 4.16.3-1-vfio #1 SMP PREEMPT Sun Apr 22
12:35:45 SAST 2018 x86_64 GNU/Linux
^ Close to the regular Arch Linux kernel, but with vfio and compiled
with -march=native. See https://aur.archlinux.org/pkgbase/linux-vfio/
CPU model:
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Motherboard model:
Product Name: Z68MA-G45 (MS-7676)
lspci output for the slot:
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
SSD Controller SM961/PM961
^ The disk id sans serial is Samsung_SSD_960_EVO_1TB
dmidecode output for the slot:
Handle 0x001E, DMI type 9, 17 bytes
System Slot Information
    Designation: J8B4
    Type: x16 PCI Express
    Current Usage: In Use
    Length: Long
    ID: 4
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0000:02:01.1
* Re: NVMe SSD + compression - benchmarking
2018-04-27 17:41 NVMe SSD + compression - benchmarking Brendan Hide
@ 2018-04-28 2:05 ` Qu Wenruo
2018-04-28 7:30 ` Brendan Hide
0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2018-04-28 2:05 UTC (permalink / raw)
To: Brendan Hide, Btrfs BTRFS
On 2018-04-28 01:41, Brendan Hide wrote:
> Hey, all
>
> I'm following up on the queries I had last week since I have installed
> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
> whether or not I'm doing these benchmarks correctly.
>
> As a first test, I put together a 4.7GB .tar containing mostly
> duplicated copies of the kernel source code (rather compressible).
> Writing this to the SSD, I was seeing repeatable numbers - but I noted
> that the new (supposedly faster) zstd compression is noticeably slower
> than all the other methods. Perhaps this is partly due to a lack of
> multi-threading? No matter - I also noticed a supposedly impossible
> stat when there is no compression, in that it seems to be faster than
> the PCI-E 2.0 bus can theoretically deliver:
I'd say the test method is more like real-world usage than a benchmark.
Moreover, copying the kernel source is not a good fit for compression,
as most of the files are smaller than 128K, which means they can't take
much advantage of the 128K-based multi-thread splitting.
The kernel source also consists of many small files, and btrfs is
really slow for metadata-heavy workloads.
I'd recommend starting with a simpler workload, then going step by step
towards more complex ones.
Large-file sequential writes with a large block size would be a nice
starting point, as they can take full advantage of multi-threaded compression.
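A minimal sketch of such a sequential-write test (the paths and sizes are placeholders, and /dev/zero is trivially compressible, so real data would be needed for a meaningful compression comparison):

```shell
# Large-file sequential write with a 1M block size; conv=fsync makes dd
# flush to disk before reporting, so the timing includes the writeback.
# MNT defaults to a throwaway temp dir here -- point it at the btrfs mount
# for a real run, and raise COUNT (e.g. 4096 for a 4 GiB file).
mnt=${MNT:-$(mktemp -d)}
count=${COUNT:-64}
dd if=/dev/zero of="$mnt/big.bin" bs=1M count="$count" conv=fsync
```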
Another piece of advice: if you really want super-fast storage and
there is plenty of memory, the brd module will be your best friend.
On modern mainstream hardware, brd can provide throughput well over
1 GiB/s:
$ sudo modprobe brd rd_nr=1 rd_size=2097152
$ LANG=C dd if=/dev/zero bs=1M of=/dev/ram0 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s
Thanks,
Qu
* Re: NVMe SSD + compression - benchmarking
2018-04-28 2:05 ` Qu Wenruo
@ 2018-04-28 7:30 ` Brendan Hide
2018-04-29 8:28 ` Duncan
0 siblings, 1 reply; 4+ messages in thread
From: Brendan Hide @ 2018-04-28 7:30 UTC (permalink / raw)
To: Qu Wenruo, Btrfs BTRFS
On 04/28/2018 04:05 AM, Qu Wenruo wrote:
>
>
> On 2018-04-28 01:41, Brendan Hide wrote:
>> Hey, all
>>
>> I'm following up on the queries I had last week since I have installed
>> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
>> whether or not I'm doing these benchmarks correctly.
>>
>> As a first test, I put together a 4.7GB .tar containing mostly
>> duplicated copies of the kernel source code (rather compressible).
>> Writing this to the SSD I was seeing repeatable numbers - but noted that
>> the new (supposedly faster) zstd compression is noticeably slower than
>> all other methods. Perhaps this is partly due to lack of
>> multi-threading? No matter, I did also notice a supposedly impossible
>> stat when there is no compression, in that it seems to be faster than
>> the PCI-E 2.0 bus theoretically can deliver:
>
> I'd say the test method is more like real-world usage than a benchmark.
> Moreover, copying the kernel source is not a good fit for compression,
> as most of the files are smaller than 128K, which means they can't take
> much advantage of the 128K-based multi-thread splitting.
>
> The kernel source also consists of many small files, and btrfs is
> really slow for metadata-heavy workloads.
>
> I'd recommend starting with a simpler workload, then going step by step
> towards more complex ones.
>
> Large-file sequential writes with a large block size would be a nice
> starting point, as they can take full advantage of multi-threaded
> compression.
Thanks, Qu
I did also test the source tree itself, where I realised it is an
intense workload, far from a regular use-case. It gives far slower
results, with zlib being the slowest. The source's average file size is
close to 13KiB. However, in the test whose results I give below, the
.tar is a single large (4.7GB) file - I'm not unpacking it at all.
Average results from source tree:
compression type / write speed / read speed
no / 0.29 GBps / 0.20 GBps
lzo / 0.21 GBps / 0.17 GBps
zstd / 0.13 GBps / 0.14 GBps
zlib / 0.06 GBps / 0.10 GBps
Average results from .tar:
compression type / write speed / read speed
no / 1.42 GBps / 2.79 GBps
lzo / 1.17 GBps / 2.04 GBps
zstd / 0.75 GBps / 1.97 GBps
zlib / 1.24 GBps / 2.07 GBps
> Another piece of advice: if you really want super-fast storage and
> there is plenty of memory, the brd module will be your best friend.
> On modern mainstream hardware, brd can provide throughput well over
> 1 GiB/s:
> $ sudo modprobe brd rd_nr=1 rd_size=2097152
> $ LANG=C dd if=/dev/zero bs=1M of=/dev/ram0 count=2048
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s
My real worry is that I'm currently reading at 2.79 GB/s (see results
above and below) without compression, when my hardware *should* limit it
to 2.0 GB/s. This tells me either `sync` is not working or my benchmark
method is flawed.
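A quick back-of-the-envelope check of the discrepancy (my arithmetic, assuming the 4.7 GB file and the 2.0 GB/s bus ceiling):

```shell
# At the PCI-E 2.0 x4 ceiling of 2.0 GB/s, reading a 4.7 GB file needs at
# least 2.35 s; a measured 2.79 GB/s implies about 1.68 s, so roughly 0.7 s
# worth of data must have come from RAM rather than the device -- i.e. some
# of the file apparently never left the page cache despite drop_caches.
awk 'BEGIN {
    size_gb = 4.7
    printf "floor at 2.0 GB/s:    %.2f s\n", size_gb / 2.0
    printf "implied at 2.79 GB/s: %.2f s\n", size_gb / 2.79
}'
```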
* Re: NVMe SSD + compression - benchmarking
2018-04-28 7:30 ` Brendan Hide
@ 2018-04-29 8:28 ` Duncan
0 siblings, 0 replies; 4+ messages in thread
From: Duncan @ 2018-04-29 8:28 UTC (permalink / raw)
To: linux-btrfs
Brendan Hide posted on Sat, 28 Apr 2018 09:30:30 +0200 as excerpted:
> My real worry is that I'm currently reading at 2.79 GB/s (see results
> above and below) without compression, when my hardware *should* limit it
> to 2.0 GB/s. This tells me either `sync` is not working or my benchmark
> method is flawed.
No answers, but a couple of additional questions/suggestions:
* Tarfile: Just to be sure, you're using an uncompressed tarfile, not a
compressed one (tgz/tbz2/etc.), correct?
* How do hdparm -t and -T compare? They're read-only and bypass the
filesystem, so they should at least give you something to compare the 2.79
GB/s to, both from the raw device (-t) and cached/memory-only (-T). See
the hdparm(8) manpage for the details.
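The two hdparm runs would look something like this (the device node is a placeholder for the SSD, and both timings need root):

```shell
# -T times reads from the page cache only (an upper bound set by RAM);
# -t times buffered reads from the device, bypassing the filesystem, and
# should sit below the ~2.0 GB/s bus ceiling. The path is a placeholder.
dev=${DEV:-/dev/nvme0n1}
if [ -b "$dev" ] && command -v hdparm >/dev/null; then
    hdparm -T "$dev"    # cached reads: page-cache/RAM bandwidth
    hdparm -t "$dev"    # buffered device reads: raw throughput
else
    echo "no block device at $dev"
fi
```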
* And of course try the compressed tarball too, since it should be easy
enough and should give you compressible vs. incompressible numbers for
sanity checking.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman