* NVMe SSD + compression - benchmarking
From: Brendan Hide @ 2018-04-27 17:41 UTC
To: Btrfs BTRFS

Hey, all

I'm following up on the queries I had last week since I have installed
the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
whether or not I'm doing these benchmarks correctly.

As a first test, I put together a 4.7GB .tar containing mostly
duplicated copies of the kernel source code (rather compressible).
Writing this to the SSD I was seeing repeatable numbers - but noted
that the new (supposedly faster) zstd compression is noticeably slower
than all other methods. Perhaps this is partly due to lack of
multi-threading? No matter, I did also notice a supposedly impossible
stat when there is no compression, in that it seems to be faster than
the PCI-E 2.0 bus theoretically can deliver:

compression type / write speed / read speed (in GBps)
zlib / 1.24 / 2.07
lzo  / 1.17 / 2.04
zstd / 0.75 / 1.97
no   / 1.42 / 2.79

The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
peak throughput *should* be 2.0 GBps - but above you can see the
average read benchmark is 2.79 GBps. :-/

The crude timing script I've put together does the following (a rough
sketch follows after this message):
- Format the SSD anew with btrfs and no custom settings
- Wait 180 seconds for possible hardware TRIM to settle (possibly
  overkill since the SSD is new)
- Mount the fs using all defaults except for compression, which can be
  zlib, lzo, zstd, or no
- sync
- Drop all caches
- Time the following:
  - Copy the file to the test fs (source is a ramdisk)
  - sync
- Drop all caches
- Time the following:
  - Copy back from the test fs to ramdisk
  - sync
- Unmount

I can see how, with compression, it *can* be faster than 2 GBps (though
it isn't). But I cannot see how having no compression could possibly be
faster than 2 GBps. :-/

I can of course get more info if it'd help figure out this puzzle:

Kernel info:
Linux localhost.localdomain 4.16.3-1-vfio #1 SMP PREEMPT Sun Apr 22
12:35:45 SAST 2018 x86_64 GNU/Linux
^ Close to the regular ArchLinux kernel - but with vfio, and compiled
with -march=native. See https://aur.archlinux.org/pkgbase/linux-vfio/

CPU model:
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Motherboard model:
Product Name: Z68MA-G45 (MS-7676)

lspci output for the slot:
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd
NVMe SSD Controller SM961/PM961
^ The disk id sans serial is Samsung_SSD_960_EVO_1TB

dmidecode output for the slot:
Handle 0x001E, DMI type 9, 17 bytes
System Slot Information
        Designation: J8B4
        Type: x16 PCI Express
        Current Usage: In Use
        Length: Long
        ID: 4
        Characteristics:
                3.3 V is provided
                Opening is shared
                PME signal is supported
        Bus Address: 0000:02:01.1
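For context on the 2.0 GBps figure: PCI-E 2.0 delivers roughly 500 MB/s
per lane after 8b/10b encoding overhead, so a 4-lane link tops out at
about 4 x 500 MB/s = 2 GB/s.

A minimal sketch of the timing procedure described above, written as a
shell script. The device node, mount point, and ramdisk paths are
assumptions for illustration, not taken from the original script, and
it must run as root:

#!/bin/bash
# Sketch only - device, mount point, and source paths are assumptions.
DEV=/dev/nvme0n1          # assumed NVMe device node
MNT=/mnt/test             # assumed mount point
SRC=/dev/shm/kernel.tar   # assumed 4.7GB tar on a ramdisk

for COMPRESS in zlib lzo zstd no; do
    mkfs.btrfs -f "$DEV"
    sleep 180                                 # let any background TRIM settle
    if [ "$COMPRESS" = no ]; then
        mount "$DEV" "$MNT"                   # defaults, no compression
    else
        mount -o compress="$COMPRESS" "$DEV" "$MNT"
    fi
    sync
    echo 3 > /proc/sys/vm/drop_caches
    echo "== $COMPRESS write =="
    time ( cp "$SRC" "$MNT/" && sync )        # timed write: copy + sync
    echo 3 > /proc/sys/vm/drop_caches
    echo "== $COMPRESS read =="
    time ( cp "$MNT/$(basename "$SRC")" /dev/shm/readback.tar && sync )  # timed read back to ramdisk
    rm -f /dev/shm/readback.tar
    umount "$MNT"
done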
* Re: NVMe SSD + compression - benchmarking
From: Qu Wenruo @ 2018-04-28 2:05 UTC
To: Brendan Hide, Btrfs BTRFS

On 2018年04月28日 01:41, Brendan Hide wrote:
> Hey, all
>
> I'm following up on the queries I had last week since I have installed
> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
> whether or not I'm doing these benchmarks correctly.
>
> As a first test, I put together a 4.7GB .tar containing mostly
> duplicated copies of the kernel source code (rather compressible).
> Writing this to the SSD I was seeing repeatable numbers - but noted
> that the new (supposedly faster) zstd compression is noticeably slower
> than all other methods. Perhaps this is partly due to lack of
> multi-threading? No matter, I did also notice a supposedly impossible
> stat when there is no compression, in that it seems to be faster than
> the PCI-E 2.0 bus theoretically can deliver:

I'd say the test method is more like real-world usage than a benchmark.

Moreover, copying the kernel source is not that good a fit for
compression, as most of the files are smaller than 128K, which means
they can't take much advantage of the multi-thread split that works in
128K chunks.

And the kernel source consists of many small files, and btrfs is really
slow for metadata-heavy workloads.

I'd recommend starting with a simpler workload, then going step by step
towards more complex workloads.

A large-file sequential write with a large block size would be a nice
starting point, as it can take full advantage of multi-threaded
compression.

Another suggestion: if you really want super-fast storage, and there is
plenty of memory, the brd module will be your best friend. On modern
mainstream hardware, brd can provide performance over 1GiB/s:

$ sudo modprobe brd rd_nr=1 rd_size=2097152
$ LANG=C dd if=/dev/zero bs=1M of=/dev/ram0 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s

Thanks,
Qu

>
> compression type / write speed / read speed (in GBps)
> zlib / 1.24 / 2.07
> lzo  / 1.17 / 2.04
> zstd / 0.75 / 1.97
> no   / 1.42 / 2.79
>
> The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
> 16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
> peak throughput *should* be 2.0 GBps - but above you can see the
> average read benchmark is 2.79 GBps. :-/
>
> The crude timing script I've put together does the following:
> - Format the SSD anew with btrfs and no custom settings
> - Wait 180 seconds for possible hardware TRIM to settle (possibly
>   overkill since the SSD is new)
> - Mount the fs using all defaults except for compression, which can be
>   zlib, lzo, zstd, or no
> - sync
> - Drop all caches
> - Time the following:
>   - Copy the file to the test fs (source is a ramdisk)
>   - sync
> - Drop all caches
> - Time the following:
>   - Copy back from the test fs to ramdisk
>   - sync
> - Unmount
>
> I can see how, with compression, it *can* be faster than 2 GBps (though
> it isn't). But I cannot see how having no compression could possibly be
> faster than 2 GBps. :-/
>
> I can of course get more info if it'd help figure out this puzzle:
>
> Kernel info:
> Linux localhost.localdomain 4.16.3-1-vfio #1 SMP PREEMPT Sun Apr 22
> 12:35:45 SAST 2018 x86_64 GNU/Linux
> ^ Close to the regular ArchLinux kernel - but with vfio, and compiled
> with -march=native. See https://aur.archlinux.org/pkgbase/linux-vfio/
>
> CPU model:
> model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>
> Motherboard model:
> Product Name: Z68MA-G45 (MS-7676)
>
> lspci output for the slot:
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd
> NVMe SSD Controller SM961/PM961
> ^ The disk id sans serial is Samsung_SSD_960_EVO_1TB
>
> dmidecode output for the slot:
> Handle 0x001E, DMI type 9, 17 bytes
> System Slot Information
>         Designation: J8B4
>         Type: x16 PCI Express
>         Current Usage: In Use
>         Length: Long
>         ID: 4
>         Characteristics:
>                 3.3 V is provided
>                 Opening is shared
>                 PME signal is supported
>         Bus Address: 0000:02:01.1
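A concrete version of the large-file sequential test Qu suggests,
reusing the tar that is already on the ramdisk, could look like the
following; the device node, mount point, and file names here are
assumptions:

$ sudo mount -o compress=zstd /dev/nvme0n1 /mnt/test       # assumed device and mount point
$ sudo dd if=/dev/shm/kernel.tar of=/mnt/test/bigfile bs=1M conv=fsync  # sequential write, flushed before dd exits
$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
$ sudo dd if=/mnt/test/bigfile of=/dev/null bs=1M          # sequential read

dd prints its own throughput figure when it finishes, so no separate
timing wrapper is needed for this kind of check.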
* Re: NVMe SSD + compression - benchmarking
From: Brendan Hide @ 2018-04-28 7:30 UTC
To: Qu Wenruo, Btrfs BTRFS

On 04/28/2018 04:05 AM, Qu Wenruo wrote:
>
> On 2018年04月28日 01:41, Brendan Hide wrote:
>> Hey, all
>>
>> I'm following up on the queries I had last week since I have installed
>> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
>> whether or not I'm doing these benchmarks correctly.
>>
>> As a first test, I put together a 4.7GB .tar containing mostly
>> duplicated copies of the kernel source code (rather compressible).
>> Writing this to the SSD I was seeing repeatable numbers - but noted
>> that the new (supposedly faster) zstd compression is noticeably slower
>> than all other methods. Perhaps this is partly due to lack of
>> multi-threading? No matter, I did also notice a supposedly impossible
>> stat when there is no compression, in that it seems to be faster than
>> the PCI-E 2.0 bus theoretically can deliver:
>
> I'd say the test method is more like real-world usage than a benchmark.
>
> Moreover, copying the kernel source is not that good a fit for
> compression, as most of the files are smaller than 128K, which means
> they can't take much advantage of the multi-thread split that works in
> 128K chunks.
>
> And the kernel source consists of many small files, and btrfs is really
> slow for metadata-heavy workloads.
>
> I'd recommend starting with a simpler workload, then going step by step
> towards more complex workloads.
>
> A large-file sequential write with a large block size would be a nice
> starting point, as it can take full advantage of multi-threaded
> compression.

Thanks, Qu

I did also test the folder tree, where I realised it is intense / far
from a regular use-case. It gives far slower results, with zlib being
the slowest. The source's average file size is near 13KiB.

However, in this test where I gave some results below, the .tar is a
large (4.7GB) singular file - I'm not unpacking it at all.

Average results from source tree:
compression type / write speed / read speed
no   / 0.29 GBps / 0.20 GBps
lzo  / 0.21 GBps / 0.17 GBps
zstd / 0.13 GBps / 0.14 GBps
zlib / 0.06 GBps / 0.10 GBps

Average results from .tar:
compression type / write speed / read speed
no   / 1.42 GBps / 2.79 GBps
lzo  / 1.17 GBps / 2.04 GBps
zstd / 0.75 GBps / 1.97 GBps
zlib / 1.24 GBps / 2.07 GBps

> Another suggestion: if you really want super-fast storage, and there is
> plenty of memory, the brd module will be your best friend. On modern
> mainstream hardware, brd can provide performance over 1GiB/s:
> $ sudo modprobe brd rd_nr=1 rd_size=2097152
> $ LANG=C dd if=/dev/zero bs=1M of=/dev/ram0 count=2048
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s

My real worry is that I'm currently reading at 2.79GB/s (see result
above and below) without compression, when my hardware *should* limit
it to 2.0GB/s. This tells me either `sync` is not working or my
benchmark method is flawed.

> Thanks,
> Qu
>
>>
>> compression type / write speed / read speed (in GBps)
>> zlib / 1.24 / 2.07
>> lzo  / 1.17 / 2.04
>> zstd / 0.75 / 1.97
>> no   / 1.42 / 2.79
>>
>> The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
>> 16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
>> peak throughput *should* be 2.0 GBps - but above you can see the
>> average read benchmark is 2.79 GBps. :-/
>>
>> The crude timing script I've put together does the following:
>> - Format the SSD anew with btrfs and no custom settings
>> - Wait 180 seconds for possible hardware TRIM to settle (possibly
>>   overkill since the SSD is new)
>> - Mount the fs using all defaults except for compression, which can be
>>   zlib, lzo, zstd, or no
>> - sync
>> - Drop all caches
>> - Time the following:
>>   - Copy the file to the test fs (source is a ramdisk)
>>   - sync
>> - Drop all caches
>> - Time the following:
>>   - Copy back from the test fs to ramdisk
>>   - sync
>> - Unmount
>>
>> I can see how, with compression, it *can* be faster than 2 GBps (though
>> it isn't). But I cannot see how having no compression could possibly be
>> faster than 2 GBps. :-/
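One way to check whether the page cache is inflating that read figure
is to bypass it entirely with O_DIRECT, and to confirm the caches
really were dropped before the timed copy. A rough sketch, assuming the
test file lives at /mnt/test/kernel.tar:

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
$ grep -E '^(Cached|Buffers)' /proc/meminfo                   # confirm the cache really shrank
$ dd if=/mnt/test/kernel.tar of=/dev/null bs=1M iflag=direct  # O_DIRECT read, page cache bypassed

If the O_DIRECT read stays at or below the roughly 2 GBps the slot can
deliver while the scripted copy still reports 2.79 GBps, the extra
throughput is likely coming from cached data or from how the copy is
being timed, not from the device itself.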
* Re: NVMe SSD + compression - benchmarking
From: Duncan @ 2018-04-29 8:28 UTC
To: linux-btrfs

Brendan Hide posted on Sat, 28 Apr 2018 09:30:30 +0200 as excerpted:

> My real worry is that I'm currently reading at 2.79GB/s (see result
> above and below) without compression when my hardware *should* limit
> it to 2.0GB/s. This tells me either `sync` is not working or my
> benchmark method is flawed.

No answer, but a couple of additional questions/suggestions:

* Tarfile: Just to be sure, you're using an uncompressed tarfile, not a
compressed tarfile (tgz/tbz2/etc), correct?

* How do hdparm -t and -T compare? That's read-only and bypasses the
filesystem, so it should at least give you something to compare the
2.79 GB/s to, both from the raw device (-t) and cached/memory-only (-T).
See the hdparm(8) manpage for the details.

* And of course try the compressed tarball too, since it should be easy
enough and should give you compressible vs. incompressible numbers for
sanity checking.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
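For reference, the two hdparm checks Duncan mentions would look
something like this; the device name is an assumption:

$ sudo hdparm -t /dev/nvme0n1   # timed device reads, bypassing the filesystem and any previously cached data
$ sudo hdparm -T /dev/nvme0n1   # timed cache reads, essentially a memory/cache bandwidth figure

If -t also reports well above 2 GBps, the assumption about the slot's
limit is wrong; if it stays near 2 GBps, the earlier 2.79 GBps figure
is likely a caching or timing artifact.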