* [linux-lvm] Repair thin pool
@ 2016-02-05  1:21 Mars
  2016-02-05 11:44 ` M.H. Tsai
  2016-02-06 14:10 ` M.H. Tsai
  0 siblings, 2 replies; 16+ messages in thread
From: Mars @ 2016-02-05  1:21 UTC (permalink / raw)
  To: linux-lvm


Hi there,

We're using CentOS 7.0 with LVM 2.02.105 and ran into the following problem:
after a power outage in the datacenter room, the thin provisioning
volumes came up in a bad state:

[root@storage ~]# lvs -a
  dm_report_object: report function failed for field data_percent
  LV                              VG               Attr       LSize   Pool        Origin           Data%  Move Log Cpy%Sync Convert
  DailyBuild                      vgg145155121036c Vwi-d-tz--   5.00t pool_nas
  dat                             vgg145155121036c Vwi-d-tz--  10.00t pool_nas
  lvol0                           vgg145155121036c -wi-a-----  15.36g
  [lvol3_pmspare]                 vgg145155121036c ewi-------  15.27g
  market                          vgg145155121036c Vwi-d-tz--   3.00t pool_nas
  pool_nas                        vgg145155121036c twi-a-tz--  14.90t                                0.00
  [pool_nas_tdata]                vgg145155121036c Twi-ao----  14.90t
  [pool_nas_tmeta]                vgg145155121036c ewi-ao----  15.27g
  share                           vgg145155121036c Vwi-d-tz--  10.00t pool_nas


The thin pool "pool_nas" and the regular LV "lvol0" are active, but the thin
provisioned volumes cannot be activated, even with the command "lvchange -ay
thin_volume_name".

To recover, we tried the following approaches, based on these mail conversations:
http://www.spinics.net/lists/lvm/msg22629.html and
http://comments.gmane.org/gmane.linux.lvm.general/14828.

1. We ran "lvconvert --repair vgg145155121036c/pool_nas".
The output is below, and the thin volumes still cannot be activated:
WARNING: If everything works, remove "vgg145155121036c/pool_nas_tmeta0".
WARNING: Use pvmove command to move "vgg145155121036c/pool_nas_tmeta" on
the best fitting PV.

2. We tried these manual repair steps:
2a: deactivate the thin pool.
2b: create a temporary LV "metabak".
2c: swap the thin pool's metadata LV: "lvconvert --thinpool
vgg145155121036c/pool_nas --poolmetadata metabak -y" (the command only goes
through with the "-y" option).
2d: activate the temporary LV "metabak" and create another, larger LV "metabak1".
2e: repair the metadata: "thin_restore -i /dev/vgg145155121036c/metabak -o
/dev/vgg145155121036c/metabak1", which ended in a segmentation fault.

So, is there any other way to recover this, or did we get some of the steps wrong?

Thank you very much.
Mars


* Re: [linux-lvm] Repair thin pool
  2016-02-05  1:21 [linux-lvm] Repair thin pool Mars
@ 2016-02-05 11:44 ` M.H. Tsai
  2016-02-05 15:17   ` Zdenek Kabelac
  2016-02-08  8:56   ` Joe Thornber
  2016-02-06 14:10 ` M.H. Tsai
  1 sibling, 2 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-05 11:44 UTC (permalink / raw)
  To: LVM general discussion and development

Hi,

It seems that your steps are wrong.  You should run thin_repair before
swapping the pool metadata.
Also, thin_restore expects XML (text) input, not binary metadata,
so a segmentation fault is not surprising...

"lvconvert --repair ... " is a command wrapping "thin_repair +
swapping metadata"  into a single step.
If it doesn't work, then you might need to dump the metadata manually,
to check if there's serious corruption in mapping trees or not....
(I recommend to use the newest thin-provisioning-tools to get better result)

1. Activate the pool metadata (it's okay if the command fails; we just
want the hidden metadata LV to be activated):
lvchange -ay vgg1/pool_nas

2. Dump the metadata, then check the output XML:
thin_dump /dev/mapper/vgg1-pool_nas_tmeta -o thin_dump.xml -r
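
For reference, a healthy dump looks roughly like the sketch below (all values
here are illustrative, not from your pool); if devices or their mappings are
missing, or thin_dump aborts, the mapping trees are likely damaged:

<superblock uuid="" time="1" transaction="2" data_block_size="128"
            nr_data_blocks="1024">
  <device dev_id="1" mapped_blocks="100" transaction="0"
          creation_time="0" snap_time="0">
    <range_mapping origin_begin="0" data_begin="0" length="100" time="0"/>
  </device>
</superblock>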

I have experience repairing many seriously corrupted thin pools. If
the physical medium is okay, I think that most cases are repairable.
I also wrote some extensions to thin-provisioning-tools (not yet
published; the code still needs some refinement...) that might
help.


Ming-Hung Tsai


2016-02-05 9:21 GMT+08:00 Mars <kirapangzi@gmail.com>:
>
> Hi there,
>
> We're using Centos 7.0 with lvm 2.02.105 and met a problem as underlying:
> After a electricity powerdown in the datacenter room, thin provision volumes came up with wrong states:
>
> [root@storage ~]# lvs -a
>   dm_report_object: report function failed for field data_percent
>   LV                              VG               Attr       LSize   Pool        Origin           Data%  Move Log Cpy%Sync Convert
>   DailyBuild                      vgg145155121036c Vwi-d-tz--   5.00t pool_nas
>   dat                             vgg145155121036c Vwi-d-tz--  10.00t pool_nas
>   lvol0                           vgg145155121036c -wi-a-----  15.36g
>   [lvol3_pmspare]                 vgg145155121036c ewi-------  15.27g
>   market                          vgg145155121036c Vwi-d-tz--   3.00t pool_nas
>   pool_nas                        vgg145155121036c twi-a-tz--  14.90t                                0.00
>   [pool_nas_tdata]                vgg145155121036c Twi-ao----  14.90t
>   [pool_nas_tmeta]                vgg145155121036c ewi-ao----  15.27g
>   share                           vgg145155121036c Vwi-d-tz--  10.00t pool_nas
>
>
>  the thin pool "pool_nas" and general lv "lvol0" are active, but thin provision volumes cannot be actived even with cmd "lvchange -ay thin_volume_name".
>
> To recover it, we tried following ways refer to these mail conversations: http://www.spinics.net/lists/lvm/msg22629.html and http://comments.gmane.org/gmane.linux.lvm.general/14828.
>
> 1, USE: "lvconvert --repair vgg145155121036c/pool_nas"
> output as below and thin volumes still cannot be active.
> WARNING: If everything works, remove "vgg145155121036c/pool_nas_tmeta0".
> WARNING: Use pvmove command to move "vgg145155121036c/pool_nas_tmeta" on the best fitting PV.
>
> 2, USE manual repair steps:
> 2a: inactive thin pool.
> 2b: create a temp lv "metabak".
> 2c: swap the thin pool's metadata lv: "lvconvert --thinpool vgg145155121036c/pool_nas --poolmetadata metabak -y", only with "-y" option can submit the command.
> 2d: active temp lv "metabak" and create another bigger lv "metabak1".
> 2e: repair metadata: "thin_restore -i /dev/vgg145155121036c/metabak-o /dev/vgg145155121036c/metabak1", and got segment fault.
>
> So, is there any other way to recover this or some steps we do wrong?
>
> Thank you very much.
> Mars
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


* Re: [linux-lvm] Repair thin pool
  2016-02-05 11:44 ` M.H. Tsai
@ 2016-02-05 15:17   ` Zdenek Kabelac
  2016-02-05 16:12     ` M.H. Tsai
  2016-02-08  8:56   ` Joe Thornber
  1 sibling, 1 reply; 16+ messages in thread
From: Zdenek Kabelac @ 2016-02-05 15:17 UTC (permalink / raw)
  To: LVM general discussion and development

On 5.2.2016 at 12:44, M.H. Tsai wrote:
> Hi,
>
> Seems that your steps are wrong.  You should run thin_repair before
> swapping the pool metadata.

Nope - actually they were correct.

> Also, thin_restore is for XML(text) input, not for binary metadata
> input, so it's normal to get segmentation fault...
>
> "lvconvert --repair ... " is a command wrapping "thin_repair +
> swapping metadata"  into a single step.
> If it doesn't work, then you might need to dump the metadata manually,
> to check if there's serious corruption in mapping trees or not....
> (I recommend to use the newest thin-provisioning-tools to get better result)
>
> 1. active the pool metadata (It's okay if the command failed. We just
> want to activate the hidden metadata LV)
> lvchange -ay vgg1/pool_nas
>
> 2. dump the metadata, then checkout the output XML
> thin_dump /dev/mapper/vgg1-pool_nas_tmeta -o thin_dump.xml -r

This is where it actually goes wrong.

You should not try to access 'live' metadata (unless you take a thin-pool
metadata snapshot of it).
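
(For reference, a metadata snapshot can be taken roughly like this - a sketch
only; the device names are illustrative and depend on how the pool was
activated:)

  # take a read-only snapshot of the metadata roots inside the live pool
  dmsetup message vgg1-pool_nas-tpool 0 reserve_metadata_snap
  # dump from that snapshot instead of the live trees
  thin_dump --metadata-snap /dev/mapper/vgg1-pool_nas_tmeta -o thin_dump.xml
  # drop the snapshot when done
  dmsetup message vgg1-pool_nas-tpool 0 release_metadata_snap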

By using thin_dump on a live, changing volume you often get 'corruptions'
listed which do not actually exist.

That said - if your thin-pool got 'blocked' for whatever reason
(deadlock?) - reading such data, which cannot change anymore, could provide
the best-guess data you can get - so in some cases it depends on the use-case
(i.e. your disk is dying and it may not come up at all after a reboot)...


> I have experience in repairing many seriously corrupted thin pools. If
> the physical medium is okay, I think that most cases are repairable.
> I also wrote some extension to thin-provisioning-tools (not yet
> published. the code still need some refinement...), maybe it could
> help.

You should only repair data when you are sure it is not changing in the
background.

That's why --repair currently requires the thin-pool to be offline.
It should do all the 'swap' operations in the proper order.

Zdenek


* Re: [linux-lvm] Repair thin pool
  2016-02-05 15:17   ` Zdenek Kabelac
@ 2016-02-05 16:12     ` M.H. Tsai
  2016-02-05 17:28       ` Zdenek Kabelac
  0 siblings, 1 reply; 16+ messages in thread
From: M.H. Tsai @ 2016-02-05 16:12 UTC (permalink / raw)
  To: LVM general discussion and development

2016-02-05 23:17 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> On 5.2.2016 at 12:44, M.H. Tsai wrote:
>>
>> Hi,
>>
>> Seems that your steps are wrong.  You should run thin_repair before
>> swapping the pool metadata.
>
> Nope - actually they were correct.
>
>> Also, thin_restore is for XML(text) input, not for binary metadata
>> input, so it's normal to get segmentation fault...
>>
>> "lvconvert --repair ... " is a command wrapping "thin_repair +
>> swapping metadata"  into a single step.
>> If it doesn't work, then you might need to dump the metadata manually,
>> to check if there's serious corruption in mapping trees or not....
>> (I recommend to use the newest thin-provisioning-tools to get better
>> result)
>>
>> 1. active the pool metadata (It's okay if the command failed. We just
>> want to activate the hidden metadata LV)
>> lvchange -ay vgg1/pool_nas
>>
>> 2. dump the metadata, then checkout the output XML
>> thin_dump /dev/mapper/vgg1-pool_nas_tmeta -o thin_dump.xml -r
>
> Here is actually what goes wrong.
>
> You should not try to access 'life' metadata (unless you take thin-pool
> snapshot of them)
>
> So by using thin-dump on life changed volume you often get 'corruptions'
> listed which actually do not exist.
>
> That said - if your thin-pool got 'blocked' for whatever reason
> (deadlock?) - reading such data which cannot be changed anymore could
> provide the 'best' guess data you could get - so in some cases it depends on
> use-case
> (i.e. you disk is dying and it may not run at all after reboot)...
>
> You should always repair data where you are sure they are not changing in
> background.
>
> That's why --repair requires currently offline state of thin-pool.
> It should do all 'swap' operations in proper order.
>
> Zdenek

Yes, we should repair the metadata when the pool is offline, but LVM
cannot activate a hidden metadata LV. So the easiest way is to activate
the entire pool. Maybe we need an option to force-activate a hidden
volume, like "lvchange -ay vgg1/pool_nas_tmeta -ff"; it would be useful
for repairing metadata. Otherwise, we have to use dmsetup to create the
device manually.
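
A sketch of that dmsetup route, for the case where nothing can be activated
(all numbers and device names below are made up for illustration; the real
start sector and length must be taken from the VG metadata / lvs output):

  # locate the physical extents backing the hidden _tmeta LV
  lvs -a -o lv_name,devices,seg_pe_ranges,seg_size --units s vgg145155121036c
  # build a linear table over those sectors and expose it read-only,
  # e.g. 32022528 sectors of /dev/sdb1 starting at sector 2048
  echo "0 32022528 linear /dev/sdb1 2048" | dmsetup create tmeta_manual --readonly
  # now the tools can be pointed at /dev/mapper/tmeta_manual
  thin_check /dev/mapper/tmeta_manual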

In my experience, if the metadata has serious problems, the pool
device usually cannot be created at all, so the metadata is not being
accessed by the kernel... just a coincidence.


Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
  2016-02-05 16:12     ` M.H. Tsai
@ 2016-02-05 17:28       ` Zdenek Kabelac
  2016-02-06 13:14         ` M.H. Tsai
  0 siblings, 1 reply; 16+ messages in thread
From: Zdenek Kabelac @ 2016-02-05 17:28 UTC (permalink / raw)
  To: LVM general discussion and development

On 5.2.2016 at 17:12, M.H. Tsai wrote:
> 2016-02-05 23:17 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>> On 5.2.2016 at 12:44, M.H. Tsai wrote:
>>>
>>> Hi,
>>>
>>> Seems that your steps are wrong.  You should run thin_repair before
>>> swapping the pool metadata.
>>
>> Nope - actually they were correct.
>>
>>> Also, thin_restore is for XML(text) input, not for binary metadata
>>> input, so it's normal to get segmentation fault...
>>>
>>> "lvconvert --repair ... " is a command wrapping "thin_repair +
>>> swapping metadata"  into a single step.
>>> If it doesn't work, then you might need to dump the metadata manually,
>>> to check if there's serious corruption in mapping trees or not....
>>> (I recommend to use the newest thin-provisioning-tools to get better
>>> result)
>>>
>>> 1. active the pool metadata (It's okay if the command failed. We just
>>> want to activate the hidden metadata LV)
>>> lvchange -ay vgg1/pool_nas
>>>
>>> 2. dump the metadata, then checkout the output XML
>>> thin_dump /dev/mapper/vgg1-pool_nas_tmeta -o thin_dump.xml -r
>>
>> Here is actually what goes wrong.
>>
>> You should not try to access 'life' metadata (unless you take thin-pool
>> snapshot of them)
>>
>> So by using thin-dump on life changed volume you often get 'corruptions'
>> listed which actually do not exist.
>>
>> That said - if your thin-pool got 'blocked' for whatever reason
>> (deadlock?) - reading such data which cannot be changed anymore could
>> provide the 'best' guess data you could get - so in some cases it depends on
>> use-case
>> (i.e. you disk is dying and it may not run at all after reboot)...
>>
>> You should always repair data where you are sure they are not changing in
>> background.
>>
>> That's why --repair requires currently offline state of thin-pool.
>> It should do all 'swap' operations in proper order.
>>
>> Zdenek
>
> Yes, we should repair the metadata when the pool is offline, but LVM
> cannot activate a hidden metadata LV. So the easiest way is activating
> the entire pool. Maybe we need some option to force activate a hidden
> volume, like "lvchange -ay vgg1/pool_nas_tmeta -ff". It's useful for
> repairing metadata. Otherwise, we should use dmsetup to manually
> create the device.

But that's actually what the described 'swap' is for.

You 'replace/swap' the existing metadata LV with some selected LV in the VG.

Then you activate this LV - and you may do whatever you need to do
(so you have the content of the _tmeta LV accessible through your tmp_created_LV).

lvm2 currently doesn't support activation of 'subLVs', as it makes activation
of the whole tree of LVs much more complicated (clvmd support restrictions).

So ATM we take only the top-level LV lock in the cluster (and yes - there is
still an unresolved bug for thin-pool/thinLV - a user may 'try' to activate
different thin LVs from a single thin-pool on multiple nodes - so for now
there is just one piece of advice - don't do that - until we provide a fix for this).


>
> In my experience, if the metadata had serious problem, then the pool
> device usually cannot be created, so the metadata is not accessed by
> kernel... Just a coincidence.

So once you e.g. 'repair' the metadata from the swapped-out LV to some other LV,
you can swap the 'fixed' metadata back in (and of course there should, and someday
will, be further validation between kernel metadata and lvm2 metadata:
device IDs, transaction IDs, device sizes...).
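
(Today that cross-check can at least be done by hand - device names are
illustrative:)

  # LVM's idea of the pool transaction id
  lvs -o lv_name,transaction_id vgg1/pool_nas
  # the kernel's idea: the first field after "thin-pool" in the status line
  dmsetup status vgg1-pool_nas-tpool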

This way you may even make the metadata smaller if you need to (e.g. if you
selected too large a metadata area initially and don't want to waste space on this LV).

Zdenek


* Re: [linux-lvm] Repair thin pool
  2016-02-05 17:28       ` Zdenek Kabelac
@ 2016-02-06 13:14         ` M.H. Tsai
  0 siblings, 0 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-06 13:14 UTC (permalink / raw)
  To: LVM general discussion and development

2016-02-06 1:28 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> But that's actually what described 'swap' is for.
>
> You 'replace/swap'  existing metadata LV with some selected LV in VG.
>
> Then you activate this LV - and you may do whatever you need to do.
> (so you have content of  _tmeta LV  accessible through your  tmp_created_LV)

I forgot that we can use swapping to make _tmeta visible. The steps on
this page are correct:
http://www.spinics.net/lists/lvm/msg22629.html
The only typo in that post is that thin_repair should be used instead
of thin_restore:
http://permalink.gmane.org/gmane.linux.lvm.general/14829
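
For completeness, the corrected sequence looks roughly like the sketch below
(LV names and sizes are illustrative only; the pool should be inactive while
doing this):

  # swap a scratch LV in as the pool metadata;
  # the old (broken) metadata then becomes visible as "metaswap"
  lvcreate -an -Zn -n metaswap -L 16g vgg145155121036c
  lvconvert --thinpool vgg145155121036c/pool_nas --poolmetadata metaswap -y
  lvchange -ay vgg145155121036c/metaswap

  # repair the binary metadata into a fresh LV (thin_repair, not thin_restore)
  lvcreate -n metafixed -L 16g vgg145155121036c
  thin_repair -i /dev/vgg145155121036c/metaswap -o /dev/vgg145155121036c/metafixed

  # swap the repaired metadata back into the pool and try activating it
  lvchange -an vgg145155121036c/metaswap vgg145155121036c/metafixed
  lvconvert --thinpool vgg145155121036c/pool_nas --poolmetadata metafixed -y
  lvchange -ay vgg145155121036c/pool_nas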

> lvm2 currently doesn't support activation of 'subLVs'  as it makes
> activation of the whole tree of LVs much more complicated (clvmd support
> restrictions)
>
> So ATM we take only top-level LV lock in cluster (and yes - there is still
> unresolved bug for thin-pool/thinLV - when user may 'try' to activate
> different thin LVs from a single thin-pool on multiple nodes - so for now -
> there is just one advice - don't do that - until we provide a fix for this.

I haven't tried clvm. Thanks for pointing that out.


Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
  2016-02-05  1:21 [linux-lvm] Repair thin pool Mars
  2016-02-05 11:44 ` M.H. Tsai
@ 2016-02-06 14:10 ` M.H. Tsai
  1 sibling, 0 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-06 14:10 UTC (permalink / raw)
  To: LVM general discussion and development

Hi,

Let's review your question again. You have already run "lvconvert --repair",
so the volume pool_nas_tmeta0 now holds the original (pre-repair) metadata
(assuming you didn't swap the metadata again). You can run thin_check and
thin_dump on pool_nas_tmeta0 to find out why thin_repair didn't work.

thin_check /dev/mapper/vgg145155121036c-pool_nas_tmeta0 > thin_check.log 2>&1
thin_dump /dev/mapper/vgg145155121036c-pool_nas_tmeta0 -o thin_dump.xml -r


Ming-Hung Tsai

2016-02-05 9:21 GMT+08:00 Mars <kirapangzi@gmail.com>:
> Hi there,
>
> We're using Centos 7.0 with lvm 2.02.105 and met a problem as underlying:
> After a electricity powerdown in the datacenter room, thin provision volumes
> came up with wrong states:
>
> [root@storage ~]# lvs -a
>   dm_report_object: report function failed for field data_percent
>   LV                              VG               Attr       LSize   Pool
> Origin           Data%  Move Log Cpy%Sync Convert
>   DailyBuild                      vgg145155121036c Vwi-d-tz--   5.00t
> pool_nas
>   dat                             vgg145155121036c Vwi-d-tz--  10.00t
> pool_nas
>   lvol0                           vgg145155121036c -wi-a-----  15.36g
>   [lvol3_pmspare]                 vgg145155121036c ewi-------  15.27g
>   market                          vgg145155121036c Vwi-d-tz--   3.00t
> pool_nas
>   pool_nas                        vgg145155121036c twi-a-tz--  14.90t
> 0.00
>   [pool_nas_tdata]                vgg145155121036c Twi-ao----  14.90t
>   [pool_nas_tmeta]                vgg145155121036c ewi-ao----  15.27g
>   share                           vgg145155121036c Vwi-d-tz--  10.00t
> pool_nas
>
>
>  the thin pool "pool_nas" and general lv "lvol0" are active, but thin
> provision volumes cannot be actived even with cmd "lvchange -ay
> thin_volume_name".
>
> To recover it, we tried following ways refer to these mail conversations:
> http://www.spinics.net/lists/lvm/msg22629.html and
> http://comments.gmane.org/gmane.linux.lvm.general/14828.
>
> 1, USE: "lvconvert --repair vgg145155121036c/pool_nas"
> output as below and thin volumes still cannot be active.
> WARNING: If everything works, remove "vgg145155121036c/pool_nas_tmeta0".
> WARNING: Use pvmove command to move "vgg145155121036c/pool_nas_tmeta" on the
> best fitting PV.
>
> 2, USE manual repair steps:
> 2a: inactive thin pool.
> 2b: create a temp lv "metabak".
> 2c: swap the thin pool's metadata lv: "lvconvert --thinpool
> vgg145155121036c/pool_nas --poolmetadata metabak -y", only with "-y" option
> can submit the command.
> 2d: active temp lv "metabak" and create another bigger lv "metabak1".
> 2e: repair metadata: "thin_restore -i /dev/vgg145155121036c/metabak-o
> /dev/vgg145155121036c/metabak1", and got segment fault.
>
> So, is there any other way to recover this or some steps we do wrong?
>
> Thank you very much.
> Mars


* Re: [linux-lvm] Repair thin pool
  2016-02-05 11:44 ` M.H. Tsai
  2016-02-05 15:17   ` Zdenek Kabelac
@ 2016-02-08  8:56   ` Joe Thornber
  2016-02-08 18:03     ` M.H. Tsai
  1 sibling, 1 reply; 16+ messages in thread
From: Joe Thornber @ 2016-02-08  8:56 UTC (permalink / raw)
  To: LVM general discussion and development

On Fri, Feb 05, 2016 at 07:44:46PM +0800, M.H. Tsai wrote:
> I also wrote some extension to thin-provisioning-tools (not yet
> published. the code still need some refinement...), maybe it could
> help.

I'd definitely like to see what you changed please.

- Joe


* Re: [linux-lvm] Repair thin pool
  2016-02-08  8:56   ` Joe Thornber
@ 2016-02-08 18:03     ` M.H. Tsai
  2016-02-10 10:32       ` Joe Thornber
  0 siblings, 1 reply; 16+ messages in thread
From: M.H. Tsai @ 2016-02-08 18:03 UTC (permalink / raw)
  To: LVM general discussion and development


2016-02-08 16:56 GMT+08:00 Joe Thornber <thornber@redhat.com>:
> On Fri, Feb 05, 2016 at 07:44:46PM +0800, M.H. Tsai wrote:
>> I also wrote some extension to thin-provisioning-tools (not yet
>> published. the code still need some refinement...), maybe it could
>> help.
>
> I'd definitely like to see what you changed please.
>
> - Joe

I wrote some tools to do "semi-auto" repair, called thin_ll_dump and
thin_ll_restore (low-level dump & restore), which can find orphan nodes
and reconstruct the metadata from those orphan nodes. They can cope with
cases where the top-level data mapping tree or some higher-level nodes are
broken, complementing the repair feature of thin_repair.

Users are required to have some knowledge of dm-thin metadata before using
these tools (you need to specify which orphan nodes to use), but I think they
are useful for system administrators. Most thin-pool corruption cases I have
encountered (caused by power loss, broken disks, RAID corruption, etc.) cannot
be handled by the current thin-provisioning-tools -- thin_repair is fully
automatic, but it just skips broken nodes. However, those missing mappings
can often be found in orphan nodes.

Also, I wrote another tool called thin_scan, which shows the entire metadata
layout and scans for broken nodes (an enhanced version of thin_show_block in
the low_level_examine_metadata branch -- I didn't notice that before... maybe
the name thin_show_block sounds clearer?)

What do you think about these features? Are they worth merging upstream?


Thanks,
Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
  2016-02-08 18:03     ` M.H. Tsai
@ 2016-02-10 10:32       ` Joe Thornber
  2016-02-14  8:54         ` M.H. Tsai
  0 siblings, 1 reply; 16+ messages in thread
From: Joe Thornber @ 2016-02-10 10:32 UTC (permalink / raw)
  To: LVM general discussion and development

On Tue, Feb 09, 2016 at 02:03:39AM +0800, M.H. Tsai wrote:
> 2016-02-08 16:56 GMT+08:00 Joe Thornber <thornber@redhat.com>:
> > On Fri, Feb 05, 2016 at 07:44:46PM +0800, M.H. Tsai wrote:
> >> I also wrote some extension to thin-provisioning-tools (not yet
> >> published. the code still need some refinement...), maybe it could
> >> help.
> >
> > I'd definitely like to see what you changed please.
> >
> > - Joe
> 
> I wrote some tools to do "semi-auto" repair, called thin_ll_dump and
> thin_ll_restore (low-level dump & restore), that can find orphan nodes
> and reconstruct the metadata using orphan nodes. It could cope the cases
> that the top-level data mapping tree or some higher-level nodes were
> broken, to complement the repairing feature of thin_repair.
> 
> Although that users are required to have knowledge about dm-thin metadata
> before using these tools (you need to specify which orphan node to use), I
> think that these tools are useful for system administrators. Most thin-pool
> corruption cases I experienced (caused by power lost, broken disks, RAID
> corruption, etc.) cannot be handled by the current thin-provisioning-tools
> --  thin_repair is fully automatic, but it just skips broken nodes.
> However, those missing mappings could be found in orphan nodes.
> 
> Also, I wrote another tool called thin_scan, to show the entire metadata
> layout and scan broken nodes. (which is an enhanced version of
> thin_show_block in branch low_level_examine_metadata -- I didn't notice
> that before... maybe the name thin_show_block sounds more clear?)
> 
> What do you think about these features? Are they worth to be merged to the
> upstream?

Yep, I definitely want these for upstream.  Send me what you've got,
whatever state it's in; I'll happily spend a couple of weeks tidying
this.

- Joe


* Re: [linux-lvm] Repair thin pool
  2016-02-10 10:32       ` Joe Thornber
@ 2016-02-14  8:54         ` M.H. Tsai
  0 siblings, 0 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-14  8:54 UTC (permalink / raw)
  To: LVM general discussion and development

2016-02-10 18:32 GMT+08:00 Joe Thornber <thornber@redhat.com>:
> Yep, I definitely want these for upstream.  Send me what you've got,
> whatever state it's in; I'll happily spend a couple of weeks tidying
> this.
>
> - Joe

The feature is complete and workable, but the code is based on v0.4.1.
I need a few days to clean up and rebase. Please wait.

syntax:
thin_ll_dump /dev/mapper/corrupted_tmeta [-o thin_ll_dump.xml]
thin_ll_restore -i edited_thin_ll_dump.xml -E /dev/mapper/corrupted_tmeta -o /dev/mapper/fixed_tmeta


Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
  2016-02-21 15:41 ` M.H. Tsai
@ 2016-02-23 12:12   ` M.H. Tsai
  0 siblings, 0 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-23 12:12 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Mars

The original post asked what to do if the superblock is broken (his superblock
was accidentally wiped). Since I don't have time to update the program at the
moment, here's my workaround:

1. Partially rebuild the superblock

  (1) Obtain pool parameter from LVM

       ./sbin/lvm lvs vg1/tp1 -o transaction_id,chunksize,lv_size --units s

      sample output:
       Tran Chunk LSize
       3545  128S 7999381504S

      The number of data blocks is $((7999381504/128)) = 62495168

  (2) Create input.xml with pool parameters obtained from LVM:

       <superblock uuid="" time="0" transaction="3545"
                   data_block_size="128" nr_data_blocks="62495168">
       </superblock>

  (3) Run thin_restore to generate a temporary metadata with correct superblock

       dd if=/dev/zero of=/tmp/test.bin bs=1M count=16
       thin_restore -i input.xml -o /tmp/test.bin

      The size of /tmp/test.bin depends on your pool size.

  (4) Copy the partially-rebuilt superblock (4KB) to your broken metadata.
      (<src_metadata>).

      dd if=/tmp/test.bin of=<src_metadata> bs=4k count=1 conv=notrunc

2. Run thin_ll_dump and thin_ll_restore
    https://www.redhat.com/archives/linux-lvm/2016-February/msg00038.html

   Example: assume that we found data-mapping-root=2303
     and device-details-root=277313

   ./pdata_tools thin_ll_dump <src_metadata> --data-mapping-root=2303 \
              --device-details-root 277313 -o thin_ll_dump.txt

   ./pdata_tools thin_ll_restore -E <src_metadata> -i thin_ll_dump.txt \
                                 -o <dst_metadata>

   Note that <dst_metadata> should be sufficiently large, especially when you
   have snapshots, since the mapping trees reconstructed by thintools
   do not share blocks.

3. Fix superblock's time field

  (1) Run thin_dump on the repaired metadata

      thin_dump <dst_metadata> -o thin_dump.txt

  (2) Find the maximum time value in the data mapping trees
      (the device with the maximum snap_time might have been removed, so find
       the maximum time in the data mapping trees, not in the device details tree)

      grep "time=\"[0-9]*\"" thin_dump.txt -o | uniq | sort | uniq | tail

      (I run uniq twice to avoid sorting too much data)

      sample output:
        ...
        time="1785"
        time="1786"
        time="1787"

      so the maximum time is 1787.
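
      (If the time values ever span different digit counts, a numeric sort is
       safer; a variant that strips everything but the digits and takes the
       numeric maximum:)

      grep -o 'time="[0-9]*"' thin_dump.txt | tr -dc '0-9\n' | sort -n | tail -1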

  (3) Edit the "time" value of the <superblock> tag in thin_dump's output

     <superblock uuid="" time="1787" ... >
        ...

  (4) Run thin_restore to get the final metadata

      thin_restore -i thin_dump.txt -o <dst_metadata>


Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
       [not found] <CAGU4k=0B-uXUvc0gNyYH3eF62tNWoGBWVpqg826YH6Xo1Gp4Aw@mail.gmail.com>
  2016-02-18 14:22 ` M.H. Tsai
@ 2016-02-21 15:41 ` M.H. Tsai
  2016-02-23 12:12   ` M.H. Tsai
  1 sibling, 1 reply; 16+ messages in thread
From: M.H. Tsai @ 2016-02-21 15:41 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Mars

Hi,

I updated the program with some bug fixes. Please download it again.
https://www.dropbox.com/s/6g8gm1hndxp3rpd/pdata_tools?dl=0

Here's a quick guide to manually repairing the metadata if thin_repair doesn't help.

1. Run thin_scan to do some basic checking

  ./pdata_tools thin_scan <metadata> [-o <output.xml>]

  The output contains information about:
  (1) each metadata block's type, properties, and integrity
  (2) metadata utilization, so you can ignore the unused tail of the metadata
      (usually, the last utilized block is an index_block)

  Example output:
    <single_block type="superblock" location="0" ref_count="4294967295" \
                  is_valid="1"/>
    <range_block type="bitmap_block" location_begin="1" blocknr_begin="1" \
                 length="3" ref_count="4294967295" is_valid="1"/>
    ...
    <single_block type="index_block" location="26268" blocknr="26268" \
                  ref_count="4294967295" is_valid="1"/>


2. Check data mapping tree and device details tree

  If you don't know how to use thin_debug or the superblock layout,
  you can use thin_ll_dump to obtain the tree roots:

  ./pdata_tools thin_ll_dump <metadata> [-o <output.xml>] \
                                        [--end <last_utilized_block+1>]

  Example output:
    <superblock blocknr="0" data_mapping_root="25036" \
                device_details_root="25772">
      ...
    </superblock>
    <orphans>
      ...
    </orphans>

  According to thin_scan's output, we know that the data_mapping_root and
  device_details_root point to wrong locations. That's why thin_dump doesn't
  work.

  <range_block type="btree_leaf" location_begin="25031" blocknr_begin="25031" \
               length="7" ref_count="4" is_valid="1" value_size="8"/>
  ...
  <range_block type="btree_leaf" location_begin="25772" blocknr_begin="25772" \
               length="2" ref_count="4" is_valid="1" value_size="4"/>


3. Find the correct data mapping root and device details root

  (1) If you are using LVM, run lvs to find the thin device ids (see the lvs
      command below, after the example). The device id is the key into both
      the data mapping tree and the device details tree. Try to find the nodes
      whose key ranges contain the device ids (see thin_scan's output).
  (2) For the device details tree, if you have fewer than 127 thin volumes, the
      tree root is also a leaf. Check the nodes with value_size="24".

  Example:
  (1) data_mapping_root = 22917 or 25316
    (see thin_ll_dump's output)
    <node blocknr="22917" flags="2" key_begin="1" key_end="105" \
          nr_entries="74"/>
    <node blocknr="25316" flags="2" key_begin="1" key_end="105" \
          nr_entries="74"/>

  (2) device_details_root = 26263 or 26267
    (see thin_scan's output)
    <single_block type="btree_leaf" location="26263" blocknr="26263" \
                  ref_count="4294967295" is_valid="1" value_size="24"/>
    <single_block type="btree_leaf" location="26267" blocknr="26267" \
                  ref_count="4294967295" is_valid="1" value_size="24"/>

  Currently, thin_ll_dump only lists orphan nodes with value_size==8,
  so the orphan device-details leaves won't be listed.
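
  For step (1), the device ids can be listed with:

  lvs -o lv_name,thin_id vgg145155121036c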


4. Run thin_ll_dump with correct root information:

  ./pdata_tools thin_ll_dump <metadata_file> --device-details-root=<blocknr> \
                --data-mapping-root=<blocknr> [-o thin_ll_dump.xml] \
                         [--end=<last_utilized_block+1>]

  Example:
  ./bin/pdata_tools thin_ll_dump server_meta.bin --device-details-root=26263 \
                    --data-mapping-root=22917 -o thin_ll_dump2.xml --end=26269

  If the roots are correct, then the number of orphans should be less
than before.


5. Run thin_ll_restore to recover the metadata

  ./bin/pdata_tools thin_ll_restore -i <edited thin_ll_dump.xml> \
                    -E <source metadata> -o <output metadata>

  Example (restore to /dev/loop0):
  ./bin/pdata_tools thin_ll_restore -i thin_ll_dump.xml -E server_meta.bin \
                    -o /dev/loop0
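
  To actually get the repaired metadata back into the pool, the result still
  has to be swapped in as the pool's tmeta. A sketch (LV names and sizes are
  illustrative; restore onto a spare LV in the same VG instead of a loop
  device, then swap):

  lvcreate -n metafixed -L 16g vgg145155121036c
  ./bin/pdata_tools thin_ll_restore -i thin_ll_dump.xml -E server_meta.bin \
                    -o /dev/vgg145155121036c/metafixed
  lvchange -an vgg145155121036c/metafixed
  lvconvert --thinpool vgg145155121036c/pool_nas --poolmetadata metafixed -y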


Advanced use of thin_ll_restore
===============================

1. Handle the case where the root is broken and you can only find some internal
   or leaf nodes.

  Example: all the mappings reachable from block#1234 and block#5678 will
  be dumped into device#1.
  <superblock blocknr="0" data_mapping_root="22917" device_details_root="26263">
    <device dev_id="1">
      <node blocknr="1234"/>
      <node blocknr="5678"/>
      ...
    </device>
  </superblock>

2. Create a new device

  If the device_id cannot be found in the device details tree,
  then thin_ll_restore will create a new device with default device_details values.


Please let me know if you have any questions.


Ming-Hung Tsai

2016-02-18 15:17 GMT+08:00 Mars <kirapangzi@gmail.com>:
> Hi,
> We have tried your tools, here's the result:
>
> ...
>
> The output file has nearly 20000 lines and you can find it in the attachment.
>
> Thank you very much.
> Mars


* Re: [linux-lvm] Repair thin pool
       [not found] <CAGU4k=0B-uXUvc0gNyYH3eF62tNWoGBWVpqg826YH6Xo1Gp4Aw@mail.gmail.com>
@ 2016-02-18 14:22 ` M.H. Tsai
  2016-02-21 15:41 ` M.H. Tsai
  1 sibling, 0 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-18 14:22 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: kirapangzi

2016-02-18 15:17 GMT+08:00 Mars <kirapangzi@gmail.com>:
> Hi,
> We have tried your tools, here's the result:
>
> <superblock>
>   <device dev_id="7050">
>   </device>
>   <device dev_id="7051">
>   </device>
> ...
> </superblock>
> <orphans>
>   <node blocknr="22496" flags="2" key_begin="0" key_end="128" nr_entries="126"/>
>   <node blocknr="17422" flags="2" key_begin="0" key_end="128" nr_entries="126"/>
>   <node blocknr="23751" flags="2" key_begin="0" key_end="2175" nr_entries="126"/>
> ...
>   <node blocknr="26257" flags="2" key_begin="7972758" key_end="50331647" nr_entries="242"/>
> </orphans>
>
> The output file has nearly 20000 lines and you can find it in the attachment.

That looks strange. How many thin volumes do you have? The dumped top-level
mapping tree contains 208 keys, so the top-level mapping tree might
point to a wrong location. Also, no mapped values were output; I'm not sure
whether that is a bug...

1. Please run lvs to show the device ids:
lvs -o lv_name,thin_id
Then try to find the orphan nodes whose key ranges contain the device
ids. Those could be the real top-level tree nodes.

2. What's your pool chunk size?
lvs vgg145155121036c/pool_nas -o chunksize

3. Could you please provide your RAW metadata for me to debug? I want
to know why the output went wrong...
You don't need to dump the entire 16GB metadata:

(1) Please run thin_scan to know the metadata utilization (do not rely
on the metadata space map)

./pdata_tools thin_scan /dev/mapper/vgg145155121036c-pool_nas_tmeta0

You don't need to wait for it to finish scanning. Press Ctrl-C to stop the
program once it has been stuck for a few minutes. The last line printed is
the last utilized metadata block. For example:

...
<single_block type="btree_leaf" location="234518" blocknr="234518"
ref_count="0" is_valid="1" value_size="4"/>
<single_block type="btree_leaf" location="234519" blocknr="234519"
ref_count="0" is_valid="1" value_size="32"/>
<single_block type="index_block" location="234520" blocknr="234520"
ref_count="0" is_valid="1"/>
(the program gets stuck here; break it)

Then block#234520 is the last utilized metadata block. Usually it is
an index_block.

(2) dump & compress the used part. Send me the file if you can.
dd if=/dev/mapper/vgg145155121036c-pool_nas_tmeta0 of=tmeta.bin bs=4K count=$((234520+1))
tar -czvf tmeta.tar.gz tmeta.bin


Thanks,
Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
  2016-02-17  2:48 Mars
@ 2016-02-17  9:29 ` M.H. Tsai
  0 siblings, 0 replies; 16+ messages in thread
From: M.H. Tsai @ 2016-02-17  9:29 UTC (permalink / raw)
  To: LVM general discussion and development


2016-02-17 10:48 GMT+08:00 Mars <kirapangzi@gmail.com>:
> Hi,
>
> Thank you very much for giving us so many advices.
>
> Here are some progresses based on you guys mail conversations:
>
> 1,check metadata device:
>
> [root@stor14 home]# thin_check /dev/mapper/vgg145155121036c-pool_nas_tmeta0
> examining superblock
> examining devices tree
> examining mapping tree
>
> 2,dump metadata info:
>
> [root@stor14 home]# thin_dump /dev/mapper/vgg145155121036c-pool_nas_tmeta0
> -o nas_thin_dump.xml -r
> [root@stor14 home]# cat nas_thin_dump.xml
> <superblock uuid="" time="1787" transaction="3545" data_block_size="128"
> nr_data_blocks="249980672">
> </superblock>
>
> Compared with other normal pools, it seems like all device nodes and mapping
> info in the metadata lv have lost.

Two possibilities: The device details tree was broken, or worse, the data
mapping tree was broken.

> Is there happened to be 'orphan nodes'? and could you give us your semi-auto
> repair tools so we can repair it?

Sorry, the code is not finished. Please try my binary first (static binary
compiled on Ubuntu 14.04):
https://www.dropbox.com/s/6g8gm1hndxp3rpd/pdata_tools?dl=0

Please provide the output of thin_ll_dump:
./pdata_tools thin_ll_dump /dev/mapper/vgg145155121036c-pool_nas_tmeta0 -o nas_thin_ll_dump.xml
(It takes some minutes, since it scans through the entire metadata (16 GB!).
I'll improve that later.)


Ming-Hung Tsai


* Re: [linux-lvm] Repair thin pool
@ 2016-02-17  2:48 Mars
  2016-02-17  9:29 ` M.H. Tsai
  0 siblings, 1 reply; 16+ messages in thread
From: Mars @ 2016-02-17  2:48 UTC (permalink / raw)
  To: linux-lvm


2016-02-10 18:32 GMT+08:00 Joe Thornber <thornber redhat com>:

> Yep, I definitely want these for upstream.  Send me what you've got,
> whatever state it's in; I'll happily spend a couple of weeks tidying
> this.
>
> - Joe

The feature was completed & workable, but the code is based on v0.4.1.
I need some days to clean up & rebase. Please wait.

syntax:
thin_ll_dump /dev/mapper/corrupted_tmeta [-o thin_ll_dump.xml]
thin_ll_restore -i edited_thin_ll_dump.xml -E /dev/mapper/corrupted_tmeta -o /dev/mapper/fixed_tmeta

Ming-Hung Tsai

-------------

Hi,

Thank you very much for giving us so much advice.


Here is some progress based on your mail conversation:

1. Check the metadata device:

[root@stor14 home]# thin_check /dev/mapper/vgg145155121036c-pool_nas_tmeta0
examining superblock
examining devices tree
examining mapping tree

2. Dump the metadata info:

[root@stor14 home]# thin_dump
/dev/mapper/vgg145155121036c-pool_nas_tmeta0 -o nas_thin_dump.xml -r
[root@stor14 home]# cat nas_thin_dump.xml
<superblock uuid="" time="1787" transaction="3545"
data_block_size="128" nr_data_blocks="249980672">
</superblock>

Compared with other, healthy pools, it seems that all the device nodes and
mapping info in the metadata LV have been lost.

Could this be a case of 'orphan nodes'? And could you give us your
semi-auto repair tools so we can try a repair?


Thank you very much!

Mars



Thread overview: 16+ messages
2016-02-05  1:21 [linux-lvm] Repair thin pool Mars
2016-02-05 11:44 ` M.H. Tsai
2016-02-05 15:17   ` Zdenek Kabelac
2016-02-05 16:12     ` M.H. Tsai
2016-02-05 17:28       ` Zdenek Kabelac
2016-02-06 13:14         ` M.H. Tsai
2016-02-08  8:56   ` Joe Thornber
2016-02-08 18:03     ` M.H. Tsai
2016-02-10 10:32       ` Joe Thornber
2016-02-14  8:54         ` M.H. Tsai
2016-02-06 14:10 ` M.H. Tsai
2016-02-17  2:48 Mars
2016-02-17  9:29 ` M.H. Tsai
     [not found] <CAGU4k=0B-uXUvc0gNyYH3eF62tNWoGBWVpqg826YH6Xo1Gp4Aw@mail.gmail.com>
2016-02-18 14:22 ` M.H. Tsai
2016-02-21 15:41 ` M.H. Tsai
2016-02-23 12:12   ` M.H. Tsai
