From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Brendan Hide <brendan@swiftspirit.co.za>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: NVMe SSD + compression - benchmarking
Date: Sat, 28 Apr 2018 10:05:57 +0800	[thread overview]
Message-ID: <e80e34ff-3d1c-99da-e622-4576c2dc8329@gmx.com> (raw)
In-Reply-To: <5be6d7f3-c905-dc7b-56e0-4c4be60ea952@swiftspirit.co.za>


On 2018-04-28 01:41, Brendan Hide wrote:
> Hey, all
> 
> I'm following up on the queries I had last week since I have installed
> the NVMe SSD into the PCI-e adapter. I'm having difficulty knowing
> whether or not I'm doing these benchmarks correctly.
> 
> As a first test, I put together a 4.7GB .tar containing mostly
> duplicated copies of the kernel source code (rather compressible).
> Writing this to the SSD I was seeing repeatable numbers - but noted that
> the new (supposedly faster) zstd compression is noticeably slower than
> all other methods. Perhaps this is partly due to lack of
> multi-threading? No matter, I did also notice a supposedly impossible
> stat when there is no compression, in that it seems to be faster than
> the PCI-E 2.0 bus theoretically can deliver:

I'd say the test method is more like real-world usage than a benchmark.
Moreover, copying the kernel source is not that good for testing
compression, as most of the files are smaller than 128K, which means
they can't take much advantage of the multi-threaded compression that
splits work into 128K chunks.

And the kernel source consists of many small files, while btrfs is
really slow for metadata-heavy workloads.

I'd recommend starting with a simpler workload, then going step by step
towards more complex ones.

A large-file sequential write with a large block size would be a nice
starting point, as it can take full advantage of multi-threaded
compression.
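
For example, a minimal sketch with dd (the source file and mount point
are only placeholders for wherever the filesystem under test is mounted;
zeroes from /dev/zero compress trivially, so a pre-generated compressible
file is more representative):

$ LANG=C dd if=/path/to/large_testfile of=/mnt/test/testfile bs=1M conv=fsync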


Another piece of advice: if you really want super-fast storage and
there is plenty of memory, the brd module will be your best friend.
On modern mainstream hardware, brd can provide throughput well over
1GiB/s:
$ sudo modprobe brd rd_nr=1 rd_size=2097152
$ LANG=C dd if=/dev/zero  bs=1M of=/dev/ram0  count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.45593 s, 1.5 GB/s
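
If you go that route, a minimal sketch of putting btrfs on top of the
ramdisk (the mount point is only an example):

$ sudo mkfs.btrfs -f /dev/ram0
$ sudo mount -o compress=zstd /dev/ram0 /mnt/test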

Thanks,
Qu

> 
> compression type / write speed / read speed (in GBps)
> zlib / 1.24 / 2.07
> lzo / 1.17 / 2.04
> zstd / 0.75 / 1.97
> no / 1.42 / 2.79
> 
> The SSD is PCI-E 3.0 4-lane capable and is connected to a PCI-E 2.0
> 16-lane slot. lspci -vv confirms it is using 4 lanes. This means its
> peak throughput *should* be 2.0 GBps - but above you can see the average
> read benchmark is 2.79GBps. :-/
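
(For reference, the 2.0 GBps ceiling is just lane arithmetic: PCI-E 2.0
runs at 5 GT/s per lane with 8b/10b encoding, i.e. roughly 500 MB/s per
lane, so 4 lanes top out around 2 GB/s before protocol overhead.)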
> 
> The crude timing script I've put together does the following:
> - Format the SSD anew with btrfs and no custom settings
> - wait 180 seconds for possible hardware TRIM to settle (possibly
> overkill since the SSD is new)
> - Mount the fs using all defaults except for compression, which could be
> one of zlib, lzo, zstd, or no
> - sync
> - Drop all caches
> - Time the following
>  - Copy the file to the test fs (source is a ramdisk)
>  - sync
> - Drop all caches
> - Time the following
>  - Copy back from the test fs to ramdisk
>  - sync
> - unmount
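
A rough sketch of those steps as shell commands (run as root; the
device, mount point and ramdisk paths are only placeholders):

mkfs.btrfs -f /dev/nvme0n1
sleep 180                                 # let any device-side TRIM settle
mount -o compress=zstd /dev/nvme0n1 /mnt/test   # or zlib / lzo, or no option
sync
echo 3 > /proc/sys/vm/drop_caches
time sh -c 'cp /dev/shm/test.tar /mnt/test/ && sync'
echo 3 > /proc/sys/vm/drop_caches
time sh -c 'cp /mnt/test/test.tar /dev/shm/ && sync'
umount /mnt/test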
> 
> I can see how, with compression, it *can* be faster than 2 GBps (though
> it isn't). But I cannot see how having no compression could possibly be
> faster than 2 GBps. :-/
> 
> I can of course get more info if it'd help figure out this puzzle:
> 
> Kernel info:
> Linux localhost.localdomain 4.16.3-1-vfio #1 SMP PREEMPT Sun Apr 22
> 12:35:45 SAST 2018 x86_64 GNU/Linux
> ^ Close to the regular ArchLinux kernel - but with vfio, and compiled
> with -march=native. See https://aur.archlinux.org/pkgbase/linux-vfio/
> 
> CPU model:
> model name    : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> 
> Motherboard model:
> Product Name: Z68MA-G45 (MS-7676)
> 
> lspci output for the slot:
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
> SSD Controller SM961/PM961
> ^ The disk id sans serial is Samsung_SSD_960_EVO_1TB
> 
> dmidecode output for the slot:
> Handle 0x001E, DMI type 9, 17 bytes
> System Slot Information
>         Designation: J8B4
>         Type: x16 PCI Express
>         Current Usage: In Use
>         Length: Long
>         ID: 4
>         Characteristics:
>                 3.3 V is provided
>                 Opening is shared
>                 PME signal is supported
>         Bus Address: 0000:02:01.1


Thread overview: 4+ messages
2018-04-27 17:41 NVMe SSD + compression - benchmarking Brendan Hide
2018-04-28  2:05 ` Qu Wenruo [this message]
2018-04-28  7:30   ` Brendan Hide
2018-04-29  8:28     ` Duncan
