archive mirror
 help / color / mirror / Atom feed
From: Zdenek Kabelac <>
To: LVM general discussion and development <>,
	Dale Stephenson <>
Subject: Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume
Date: Thu, 14 Sep 2017 11:00:46 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Dne 14.9.2017 v 00:39 Dale Stephenson napsal(a):
>> On Sep 13, 2017, at 4:19 PM, Zdenek Kabelac <> wrote:
>> Dne 13.9.2017 v 17:33 Dale Stephenson napsal(a):
>>> Distribution: centos-release-7-3.1611.el7.centos.x86_64
>>> Kernel: Linux 3.10.0-514.26.2.el7.x86_64
>>> LVM: 2.02.166(2)-RHEL7 (2016-11-16)
>>> Volume group consisted of an 8-drive SSD (500G drives) array, plus an additional SSD of the same size.  The array had 64 k stripes.
>>> Thin pool had -Zn option and 512k chunksize (full stripe), size 3T with metadata volume 16G.  data was entirely on the 8-drive raid, metadata was entirely on the 9th drive.
>>> Virtual volume “thin” was 300 GB.  I also filled it with dd so that it would be fully provisioned before the test.
>>> Volume “thick” was also 300GB, just an ordinary volume also entirely on the 8-drive array.
>>> Four tests were run directlyagainst each volume using fio-2.2.8, random read, random write, sequential read, sequential write.  Single thread, 4k blocksize, 90s run time.
>> Hi
>> Can you please provide output of:
>> lvs -a -o+stripes,stripesize,seg_pe_ranges
>> so we can see how is your stripe placed on devices ?
> Sure, thank you for your help:
> # lvs -a -o+stripes,stripesize,seg_pe_ranges
>    LV               VG     Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert #Str Stripe PE Ranges
>    [lvol0_pmspare]  volgr0 ewi-------  16.00g                                                            1     0  /dev/md127:867328-871423
>    thick            volgr0 -wi-a----- 300.00g                                                            1     0  /dev/md127:790528-867327
>    thin             volgr0 Vwi-a-t--- 300.00g thinpool        100.00                                     0     0
>    thinpool         volgr0 twi-aot---   3.00t                 9.77   0.13                                1     0  thinpool_tdata:0-786431
>    [thinpool_tdata] volgr0 Twi-ao----   3.00t                                                            1     0  /dev/md127:0-786431
>    [thinpool_tmeta] volgr0 ewi-ao----  16.00g                                                            1     0  /dev/sdb4:0-4095
> md127 is an 8-drive RAID 0
> As you can see, there’s no lvm striping; I rely on the software RAID underneath for that.  Both thick and thin lvols are on the same PV.
>> SSD typically do needs ideally write 512K chunks.
> I could create the md to use 512k chunks for RAID 0, but I wouldn’t expect that to have any impact on a single threaded test using 4k request size.  Is there a hidden relationship that I’m unaware of?

Yep - it seems the setup in this case is the best fit.

If you can reevaluate different setups you may possibly get much higher 

My guess would be - the best targeting layout should be probably striping no 
more then 2-3 disks and use bigger striping block.

And then just 'join' 'smaller' arrays together in lvm2 in 1 big LV.

>> (something like  'lvcreate -LXXX -i8 -I512k vgname’)
> Would making lvm stripe on top of an md that already stripes confer any performance benefit in general, or for small (4k) requests in particular?

Rule #1 - try to avoid 'over-combining' things together.
  - measure performance from 'bottom'  upward in your device stack.
If the underlying devices gives poor speed - you can't make it better by any 
super0smart disk-layout on top of it.

>> Wouldn't be 'faster' to just concatenate 8 disks together instead of striping - or stripe only across 2 disk - and then you concatenate 4 such striped areas…
> For sustained throughput I would expect striping of 8 disks to blow away concatenation — however, for small requests I wouldn’t expect any advantage.  On a non-redundant array, I would expect a single threaded test using 4k requests is going to end up reading/writing data from exactly one disk regardless of whether the underlying drives are concatenated or stripes.
It always depends which kind of load you expect the most.

I suspect spreading 4K blocks across 8 SSD is likely very far away from ideal 

Any SSD is typically very bad with 4K blocks -  it you want to 'spread' the 
load on mores SSDs  do not use less the 64K stripe chunks per SSD - this gives 
you (8*64)  512K stripe size.

As for thin-pool chunksize -  if you plan to use lots of snapshots - keep the 
value lowest possible - 64K  or 128K thin-pool chunksize.

But I'd still suggest to reevaluate/benchmark setup where you will use much 
lower number of SSD for load spreading - and use bigger strip chunks per each 
device.  This should nicely improve performance in case of 'bigger' writes
and not that much slow things down with  4K loads....

> What is the best choice for handling 4k request sizes?

Possibly NVMe can do a better job here.



  reply	other threads:[~2017-09-14  9:00 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-13 15:33 [linux-lvm] Performance penalty for 4k requests on thin provisioned volume Dale Stephenson
2017-09-13 20:19 ` Zdenek Kabelac
2017-09-13 22:39   ` Dale Stephenson
2017-09-14  9:00     ` Zdenek Kabelac [this message]
2017-09-14  9:37       ` Zdenek Kabelac
2017-09-14 10:52         ` Gionatan Danti
2017-09-14 10:57         ` Gionatan Danti
2017-09-14 11:13           ` Zdenek Kabelac
2017-09-14 14:32             ` Dale Stephenson
2017-09-14 15:25       ` Dale Stephenson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).