From: Zdenek Kabelac <zkabelac@redhat.com>
To: device-mapper development <dm-devel@redhat.com>
Cc: sandeen@redhat.com, Daniel Browning <db@kavod.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: Re: A thin-p over 256 GiB fails with I/O errors with non-power-of-two chunk
Date: Tue, 22 Jan 2013 12:10:20 +0100	[thread overview]
Message-ID: <50FE739C.5090200@redhat.com> (raw)
In-Reply-To: <20130121184954.GA18892@redhat.com>

On 21.1.2013 19:49, Mike Snitzer wrote:
> On Fri, Jan 18 2013 at  5:19am -0500,
> Daniel Browning <db@kavod.com> wrote:
>
>> Why do I get the following error, and what should I do about it? When I
>> create a raid0 md with a non-power-of-two chunk size (e.g. 1152K instead of
>> 512K), then create a thinly-provisioned volume that is over 256 GiB, I get
>> the following dmesg error when I try to create a file system on it:
>>
>>      "make_request bug: can't convert block across chunks or bigger than 1152k 4384 127"
>>
>> This bubbles up to mkfs.xfs as
>>
>>      "libxfs_device_zero write failed: Input/output error"
>>
>> What I find interesting is that it seems to require all three conditions
>> (chunk size, thin-p, and >256 GiB) in order to fail. Without those, it seems
>> to work fine:
>>
>>      * Power-of-two chunk (e.g. 512K), thin-p vol, >256 GiB? Works.
>>      * Non-power-of-two chunk (e.g. 1152K), thin-p vol, <256 GiB? Works.
>>      * Non-power-of-two chunk (e.g. 1152K), regular vol, >256 GiB? Works.
>>      * Non-power-of-two chunk (e.g. 1152K), thin-p vol, >256 GiB? FAIL.
>>
>> Attached is a self-contained test case to reproduce the error, version
>> numbers, and an strace. Thank you in advance,
>> --
>> Daniel Browning
>> Kavod Technologies
>>
>> Appendix A. Self-contained reproduce script
>> ===========================================================
>> dd if=/dev/zero of=loop0.img bs=1G count=150; losetup /dev/loop0 loop0.img
>> dd if=/dev/zero of=loop1.img bs=1G count=150; losetup /dev/loop1 loop1.img
>> mdadm --create /dev/md99 --verbose --level=0 --raid-devices=2 \
>>        --chunk=1152K /dev/loop0 /dev/loop1
>> pvcreate /dev/md99
>> vgcreate test_vg /dev/md99
>> lvcreate --size 257G --type thin-pool --thinpool test_thin_pool test_vg
>> lvcreate --virtualsize 257G --thin test_vg/test_thin_pool --name test_lv
>> mkfs.xfs /dev/test_vg/test_lv
>>
>> # That is where the error occurs. Next is cleanup.
>> lvremove -f /dev/test_vg/test_lv
>> lvremove -f /dev/mapper/test_vg-test_thin_pool
>> vgremove -f test_vg
>> pvremove /dev/md99
>> mdadm --stop /dev/md99
>> mdadm --zero-superblock /dev/loop0 /dev/loop1
>> losetup -d /dev/loop0 /dev/loop1
>> rm loop*.img
>
> Limits of the raid0 device (/dev/md99):
> cat /sys/block/md99/queue/minimum_io_size
> 1179648
> cat /sys/block/md99/queue/optimal_io_size
> 2359296
>
> Limits of the thin-pool device (/dev/test_vg/test_thin_pool):
> cat /sys/block/dm-9/queue/minimum_io_size
> 512
> cat /sys/block/dm-9/queue/optimal_io_size
> 262144
>
> Limits of the thin-device device (/dev/test_vg/test_lv):
> cat /sys/block/dm-10/queue/minimum_io_size
> 512
> cat /sys/block/dm-10/queue/optimal_io_size
> 262144
>
> I notice that lvcreate is not using a thin-pool chunksize that matches
> the raid0's chunksize (just uses the lvm2 default of 256K).
>
> Switching the thin-pool lvcreate to use --chunksize 1152K at least
> enables me to format the filesystem.
>
> And both the thin-pool and thin device have an optimal_io_size that
> matches the chunk_size of the underlying raid volume:
>
> cat /sys/block/dm-9/queue/optimal_io_size
> 1179648
> cat /sys/block/dm-10/queue/optimal_io_size
> 1179648
>
> I'm still investigating the limits issue when --chunksize 1152K isn't
> used for the thin-pool lvcreate.

Just a comment on the selection of the thin chunk size here -

There are a couple of aspects to it: by default (unless changed via
lvm.conf {allocation/thin_pool_chunk_size}) lvm2 targets a 64K chunk size
and scales it up so that the thin metadata fits within 128MB
(compiled in as DEFAULT_THIN_POOL_OPTIMAL_SIZE).
So lvm2 here scaled from 64k up to 256k in the multi-TB case.
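The scaling heuristic described above can be sketched roughly like this. ASSUMPTION for illustration: about 64 bytes of thin metadata per chunk mapping; the real lvm2 estimate lives in its source and may differ:

```python
# Rough sketch of lvm2's thin-pool chunk-size scaling as described above.
# ASSUMPTION: ~64 bytes of metadata per chunk mapping (illustrative only).
def scale_chunk_size(pool_bytes, chunk=64 * 1024,
                     metadata_limit=128 * 1024 ** 2):
    # Double the chunk size until the estimated metadata fits the limit.
    while (pool_bytes // chunk) * 64 > metadata_limit:
        chunk *= 2
    return chunk

print(scale_chunk_size(257 * 1024 ** 3) // 1024)  # -> 256 (the 257G pool above)
```

Under these assumptions the 257G pool lands on 256K chunks (matching the default Mike observed), while a pool of 256 GiB or less would stay at 128K, which, unlike 256K, happens to divide 1152K evenly. That might be why the failure only appears above 256 GiB, though this is only a guess from the sketch.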

lvcreate currently doesn't take the geometry of the underlying PV(s) into
account during its allocation (somewhat of a chicken-and-egg problem) - yet
there are possible ways to put this into the equation. Though it might not
actually be what the user wants, since for snapshots a smaller chunk size is
more usable (>1MB is quite a lot here IMHO) - but it is probably worth some
thought.

Zdenek

Thread overview: 5+ messages
2013-01-18 10:19 A thin-p over 256 GiB fails with I/O errors with non-power-of-two chunk Daniel Browning
2013-01-21 18:49 ` Mike Snitzer
2013-01-22 11:10   ` Zdenek Kabelac [this message]
2013-01-22 13:51     ` Mike Snitzer
2013-01-23 22:16       ` [PATCH] dm thin: fix queue limits stacking when data device has compulsory merge_bvec_fn Mike Snitzer
