linux-lvm.redhat.com archive mirror
From: Zdenek Kabelac <zkabelac@redhat.com>
To: LVM general discussion and development <linux-lvm@redhat.com>,
	Dale Stephenson <dalestephenson@mac.com>
Subject: Re: [linux-lvm] Performance penalty for 4k requests on thin provisioned volume
Date: Thu, 14 Sep 2017 11:00:46 +0200
Message-ID: <bb529703-849c-93ff-f40c-1de8a6f49ce6@redhat.com>
In-Reply-To: <42E7ED35-B32E-4C02-976A-7A9E5380EEA8@mac.com>

On 14.9.2017 at 00:39, Dale Stephenson wrote:
> 
>> On Sep 13, 2017, at 4:19 PM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>>
> >> On 13.9.2017 at 17:33, Dale Stephenson wrote:
>>> Distribution: centos-release-7-3.1611.el7.centos.x86_64
>>> Kernel: Linux 3.10.0-514.26.2.el7.x86_64
>>> LVM: 2.02.166(2)-RHEL7 (2016-11-16)
> >>> Volume group consisted of an 8-drive SSD (500G drives) array, plus an additional SSD of the same size.  The array had 64k stripes.
> >>> Thin pool had -Zn option and 512k chunksize (full stripe), size 3T with metadata volume 16G.  Data was entirely on the 8-drive raid, metadata was entirely on the 9th drive.
> >>> Virtual volume “thin” was 300 GB.  I also filled it with dd so that it would be fully provisioned before the test.
> >>> Volume “thick” was also 300GB, just an ordinary volume also entirely on the 8-drive array.
> >>> Four tests were run directly against each volume using fio-2.2.8, random read, random write, sequential read, sequential write.  Single thread, 4k blocksize, 90s run time.
>>
>> Hi
>>
>> Can you please provide output of:
>>
>> lvs -a -o+stripes,stripesize,seg_pe_ranges
>>
> >> so we can see how your stripe is placed on the devices?
> 
> Sure, thank you for your help:
> # lvs -a -o+stripes,stripesize,seg_pe_ranges
>    LV               VG     Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert #Str Stripe PE Ranges
>    [lvol0_pmspare]  volgr0 ewi-------  16.00g                                                            1     0  /dev/md127:867328-871423
>    thick            volgr0 -wi-a----- 300.00g                                                            1     0  /dev/md127:790528-867327
>    thin             volgr0 Vwi-a-t--- 300.00g thinpool        100.00                                     0     0
>    thinpool         volgr0 twi-aot---   3.00t                 9.77   0.13                                1     0  thinpool_tdata:0-786431
>    [thinpool_tdata] volgr0 Twi-ao----   3.00t                                                            1     0  /dev/md127:0-786431
>    [thinpool_tmeta] volgr0 ewi-ao----  16.00g                                                            1     0  /dev/sdb4:0-4095
> 
> md127 is an 8-drive RAID 0
> 
> As you can see, there’s no lvm striping; I rely on the software RAID underneath for that.  Both thick and thin lvols are on the same PV.
>>
> >> SSDs ideally need writes in 512K chunks.
> 
> I could create the md to use 512k chunks for RAID 0, but I wouldn’t expect that to have any impact on a single threaded test using 4k request size.  Is there a hidden relationship that I’m unaware of?


Yep - it seems the setup in this case is the best fit.

If you can re-evaluate different setups, you may get much higher
throughput.

My guess would be that the best target layout is probably striping across no
more than 2-3 disks and using a bigger stripe chunk.

Then just 'join' the 'smaller' arrays together in lvm2 into one big LV.
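
A minimal dry-run sketch of that layout, assuming the eight SSDs are /dev/sdc to /dev/sdj and the new VG/LV are called vgfast/data (all hypothetical names). It pairs the disks into four 2-disk RAID 0 arrays with a 512K chunk per disk, then concatenates them in lvm2 as one linear LV. The commands are only printed unless RUN=1 is set, and running them for real destroys the data on those disks.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, not executed, unless RUN=1.
# All device, array, VG and LV names are hypothetical placeholders.
# WARNING: with RUN=1 this destroys all data on the listed disks.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

plan() {
    # Pair the eight SSDs into four 2-disk RAID 0 arrays, each with a
    # 512 KiB chunk per disk (mdadm --chunk is given in KiB).
    set -- sdc sdd sde sdf sdg sdh sdi sdj
    n=1
    while [ $# -ge 2 ]; do
        run mdadm --create "/dev/md$n" --level=0 --raid-devices=2 \
            --chunk=512 "/dev/$1" "/dev/$2"
        shift 2
        n=$((n + 1))
    done
    # Use the four small arrays as PVs and concatenate them into one
    # linear LV (linear is the default lvcreate segment type).
    run pvcreate /dev/md1 /dev/md2 /dev/md3 /dev/md4
    run vgcreate vgfast /dev/md1 /dev/md2 /dev/md3 /dev/md4
    run lvcreate -n data -l 100%FREE vgfast
}

plan
```

Because the LV is linear, each request is served by one 2-disk array, and only requests larger than one 512K chunk spread across both disks of that array.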


> 
> >> (something like 'lvcreate -LXXX -i8 -I512k vgname')
>>
> Would making lvm stripe on top of an md that already stripes confer any performance benefit in general, or for small (4k) requests in particular?

Rule #1 - try to avoid 'over-combining' things.
  - Measure performance from the 'bottom' upward in your device stack.
If the underlying devices give poor speed, you can't make it better with any
super-smart disk layout on top of them.
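
Measuring from the bottom up could look like the dry-run sketch below: the same single-threaded 4k random-read fio job run against each layer in turn. /dev/sdc stands in for one member disk of md127 and is a hypothetical name; volgr0, thick and thin come from the lvs output above. Only read-only jobs are used (--readonly), so RUN=1 should not modify the devices.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, not executed, unless RUN=1.
# /dev/sdc is a hypothetical member disk of md127.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

bench() {
    # Same job at every layer: raw SSD, md RAID 0, thick LV, thin LV.
    for dev in /dev/sdc /dev/md127 /dev/volgr0/thick /dev/volgr0/thin; do
        run fio --name="randread-$(basename "$dev")" --filename="$dev" \
            --readonly --rw=randread --bs=4k --iodepth=1 --numjobs=1 \
            --direct=1 --ioengine=libaio --runtime=90 --time_based
    done
}

bench
```

If the raw SSD already shows low 4k IOPS, no layout higher up in the stack can recover it.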


> 
> >> Wouldn't it be 'faster' to just concatenate 8 disks together instead of striping, or to stripe across only 2 disks and then concatenate 4 such striped areas…
>>
> For sustained throughput I would expect striping of 8 disks to blow away concatenation; however, for small requests I wouldn’t expect any advantage.  On a non-redundant array, I would expect a single threaded test using 4k requests to end up reading/writing data from exactly one disk regardless of whether the underlying drives are concatenated or striped.

It always depends on which kind of load you expect the most.

I suspect spreading 4K blocks across 8 SSDs is very far from an ideal
layout.

Any SSD typically performs poorly with 4K blocks. If you want to 'spread' the
load over more SSDs, do not use less than 64K stripe chunks per SSD; with 8
SSDs this gives you an (8*64K) 512K stripe size.
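
The stripe arithmetic, and the quoted point that a single 4k request lands on exactly one disk, can be checked with a few lines of shell. This assumes a simple RAID 0 model where chunks are dealt round-robin over equal-size members; md's zone handling for unequal members and any per-member data offset are ignored, and the function names are hypothetical.

```shell
#!/bin/sh
# Simple RAID 0 model (assumption): chunks are dealt round-robin over
# equal-size member disks; md zones and data offsets are ignored.

# Full stripe size in KiB. Args: disks chunk_kib
full_stripe_kib() {
    echo $(( $1 * $2 ))
}

# Index of the member disk holding a byte offset.
# Args: offset_bytes disks chunk_kib
member_for_offset() {
    echo $(( ($1 / ($3 * 1024)) % $2 ))
}

# Number of consecutive chunks (hence disks, up to the disk count)
# a request touches. Args: offset_bytes length_bytes chunk_kib
chunks_touched() {
    c=$(( $3 * 1024 ))
    echo $(( ($1 + $2 - 1) / $c - $1 / $c + 1 ))
}

echo "8 disks x 64K chunks: $(full_stripe_kib 8 64)K full stripe"
echo "4K read at 400K offset: disk $(member_for_offset 409600 8 64), $(chunks_touched 409600 4096 64) chunk(s)"
echo "512K write at offset 0: $(chunks_touched 0 524288 64) chunk(s)"
```

A 4K-aligned 4K request never crosses a 64K chunk boundary, so it is served by a single SSD whatever the disk count; only requests larger than one chunk are spread over several disks.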

As for the thin-pool chunksize: if you plan to use lots of snapshots, keep the
value as low as possible, i.e. a 64K or 128K thin-pool chunksize.
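
A dry-run sketch of a snapshot-friendly thin pool with a 64K chunksize. volgr0 is the VG from the lvs output above; pool64k, thin64k and thin64k_snap are hypothetical names, and the sizes simply mirror the original setup, so the VG would need that much free space for RUN=1 to succeed.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, not executed, unless RUN=1.
# pool64k, thin64k and thin64k_snap are hypothetical names.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

plan() {
    # Thin pool with a 64K chunk size and zeroing disabled (-Zn) as in
    # the original setup; smaller chunks mean less data is copied when a
    # chunk shared with a snapshot is first overwritten.
    run lvcreate --type thin-pool -L 3T --chunksize 64k -Zn -n pool64k volgr0
    # A 300G thin volume in that pool, plus a thin snapshot of it.
    run lvcreate -V 300G -T volgr0/pool64k -n thin64k
    run lvcreate -s -n thin64k_snap volgr0/thin64k
}

plan
```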

But I'd still suggest re-evaluating/benchmarking a setup where you use a much
lower number of SSDs for load spreading, with bigger stripe chunks per
device. This should nicely improve performance for 'bigger' writes
without slowing things down much for 4K loads...


> What is the best choice for handling 4k request sizes?

Possibly NVMe can do a better job here.

Regards

Zdenek


Thread overview: 10+ messages
2017-09-13 15:33 [linux-lvm] Performance penalty for 4k requests on thin provisioned volume Dale Stephenson
2017-09-13 20:19 ` Zdenek Kabelac
2017-09-13 22:39   ` Dale Stephenson
2017-09-14  9:00     ` Zdenek Kabelac [this message]
2017-09-14  9:37       ` Zdenek Kabelac
2017-09-14 10:52         ` Gionatan Danti
2017-09-14 10:57         ` Gionatan Danti
2017-09-14 11:13           ` Zdenek Kabelac
2017-09-14 14:32             ` Dale Stephenson
2017-09-14 15:25       ` Dale Stephenson