From: John Hamilton
Date: Wed, 16 May 2018 09:43:30 -0500
Subject: Re: [linux-lvm] inconsistency between thin pool metadata mapped_blocks and lvs output
To: Marian Csontos
Cc: LVM general discussion and development

So it turns out simply running lvconvert --repair fixed the issue, and
lvs is now reporting the correct utilization.
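
The repair itself was just the standard pool repair invocation, run
with the pool deactivated; roughly the following, using the VG and
pool names from the lvs output quoted below:

# lvchange -an myvg/my-pool
# lvconvert --repair myvg/my-pool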

On Fri, May 11, 2018 at 12:09 PM John Hamilton <john.l.hamilton@gmail.com> wrote:
> Thanks for the response.
>
> >Is this everything?
>
> Yes, that is everything in the metadata xml dump. I just removed all
> of the *_mapping entries for brevity. For the lvs output I removed
> other logical volumes that aren't related to this pool.
>
> >Is this a pool used by docker, which does not (did not) use LVM to
> manage thin-volumes?
>
> It's not docker, but it is an application called serviced that uses
> docker's library for managing the volumes.
>
> >LVM just queries DM, and displays whatever that provides
>
> Yeah, it looks like the dmsetup status output matches lvs:
>
> myvg-my--pool: 0 5242880000 thin-pool 70 207941/4145152 29018611/40960000 - rw discard_passdown queue_if_no_space -
> myvg-my--pool_tdata: 0 4194304000 linear
> myvg-my--pool_tdata: 4194304000 1048576000 linear
> myvg-my--pool_tmeta: 0 33161216 linear
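
(As a rough cross-check: with the 64 KiB chunk size implied by
data_block_size="128" sectors in the metadata dump, the data pair in
the first line works out to about

  29018611 used chunks  x 64 KiB ~= 1.73 TiB
  40960000 total chunks x 64 KiB ~= 2.44 TiB

i.e. roughly 71% used, which is in the same ballpark as the Data%
that lvs reports for the pool.)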
>
> >What is kernel/lvm version?
>
> # uname -r
> 3.10.0-693.21.1.el7.x86_64
>
> # lvm version
> LVM version:     2.02.171(2)-RHEL7 (2017-05-03)
> Library version: 1.02.140-RHEL7 (2017-05-03)
> Driver version:  4.35.0
> Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd
>
> >Is thin_check_executable configured in lvm.conf?
>
> Yes.
>
> I also just found out that they apparently ran thin_check recently and
> got a message about a corrupt superblock, but didn't repair it. They
> were still able to re-activate the pool, though. We'll run a repair as
> soon as we get a chance and see if that fixes it.
>
> Thanks,
>
> John
>
> On Fri, May 11, 2018 at 3:54 AM Marian Csontos <mcsontos@redhat.com> wrote:
>> On 05/11/2018 10:21 AM, Joe Thornber wrote:
>> > On Thu, May 10, 2018 at 07:30:09PM +0000, John Hamilton wrote:
>> >> I saw something today that I don't understand and I'm hoping
>> >> somebody can help. We had a ~2.5TB thin pool that was showing 69%
>> >> data utilization in lvs:
>> >>
>> >> # lvs -a
>> >>   LV              VG   Attr       LSize  Pool Origin Data%  Meta% Move Log Cpy%Sync Convert
>> >>   my-pool         myvg twi-aotz--  2.44t              69.04  4.90
>> >>   [my-pool_tdata] myvg Twi-ao----  2.44t
>> >>   [my-pool_tmeta] myvg ewi-ao---- 15.81g
>>
>> Is this everything? Is this a pool used by docker, which does not (did
>> not) use LVM to manage thin-volumes?
>>
>> >> However, when I dump the thin pool metadata and look at the
>> >> mapped_blocks for the 2 devices in the pool, I can only account for
>> >> about 950GB. Here are the superblock and device entries from the
>> >> metadata xml. There are no other devices listed in the metadata:
>> >>
>> >> <superblock uuid="" time="34" transaction="68" flags="0" version="2"
>> >>   data_block_size="128" nr_data_blocks="0">
>> >>   <device dev_id="1" mapped_blocks="258767" transaction="0"
>> >>     creation_time="0" snap_time="14">
>> >>   <device dev_id="8" mapped_blocks="15616093" transaction="27"
>> >>     creation_time="15" snap_time="34">
>> >>
>> >> That first device looks like it has about 16GB allocated to it and
>> >> the second device about 950GB. So, I would expect lvs to show
>> >> somewhere between 950G and 966G. Is something wrong, or am I
>> >> misunderstanding how to read the metadata dump? Where is the other
>> >> 700 or so GB that lvs is showing used?
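
(Checking that arithmetic: data_block_size="128" is in 512-byte
sectors, so each mapped block is 64 KiB, which gives

  dev_id 1:   258767 x 64 KiB ~=  15.8 GiB
  dev_id 8: 15616093 x 64 KiB ~= 953   GiB

that is, the ~16GB and ~950GB figures above.)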
>> >
>> > The non-zero snap_time suggests that you're using snapshots. In which
>> > case it could just be that there is common data shared between
>> > volumes that is getting counted more than once.
>> >
>> > You can confirm this using the thin_ls tool and specifying a format
>> > line that includes EXCLUSIVE_BLOCKS or SHARED_BLOCKS. LVM doesn't
>> > take shared blocks into account because it has to scan all the
>> > metadata to calculate what's shared.
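
For reference, a sketch of the thin_ls invocation against this pool's
metadata LV (device names taken from the dmsetup output earlier in the
thread; check the exact option and field names against thin_ls --help
for the installed thin-provisioning-tools version). On a live pool, a
metadata snapshot is reserved first so the metadata can be read
consistently:

# dmsetup message myvg-my--pool 0 reserve_metadata_snap
# thin_ls --metadata-snap \
    --format "DEV,MAPPED_BLOCKS,EXCLUSIVE_BLOCKS,SHARED_BLOCKS" \
    /dev/mapper/myvg-my--pool_tmeta
# dmsetup message myvg-my--pool 0 release_metadata_snap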

>> LVM just queries DM, and displays whatever that provides. You can see
>> that in the `dmsetup status` output: there are two pairs of
>> '/'-separated entries; the first is metadata usage
>> (USED_BLOCKS/ALL_BLOCKS), the second data usage
>> (USED_CHUNKS/ALL_CHUNKS).
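
A quick way to turn those two pairs into percentages straight from
dmsetup (a throwaway one-liner tied to the thin-pool status format
shown above, where the pairs are fields 6 and 7; the sample output is
computed from the numbers quoted in this thread):

# dmsetup status myvg-my--pool | awk '{ split($6, m, "/"); split($7, d, "/");
      printf "meta: %.1f%%  data: %.1f%%\n", 100*m[1]/m[2], 100*d[1]/d[2] }'
meta: 5.0%  data: 70.8%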

>> So the error lies somewhere between dmsetup and the kernel.
>>
>> What is the kernel/lvm version?
>> Is thin_check_executable configured in lvm.conf?
>>
>> -- Martian