* [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2015-12-23  9:50 UTC
  To: LVM general discussion and development

Hi All,

I'm running LVM2 2.02.138 on Ubuntu 14.04.  When I tried to expand a
thinpool, I found that lvextend doesn't expand the top-level dm-linear
device of the thinpool. The following are the steps to reproduce:

# lvcreate vg1 --type thin-pool --thinpool tp1 --size 1g --poolmetadataspare=n -Zn
# lvcreate vg1 --type thin --thinpool tp1 --virtualsize 100m --name lvol0
# lvextend vg1/tp1 --size +100m

After running lvextend, the tables of vg1-tp1_tdata and vg1-tp1-tpool
are expanded, but the dm-linear table of vg1-tp1 remains unchanged.
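
The mismatch shows up in the device-mapper tables (a sketch -- the sector
counts here are illustrative):

# dmsetup table vg1-tp1_tdata   # linear, expanded to 2097152 + 204800 sectors
# dmsetup table vg1-tp1-tpool   # thin-pool, data size matches tdata
# dmsetup table vg1-tp1         # linear over -tpool, still 2097152 sectors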

I think that the function _lv_update_and_reload() erroneously operates
on the holder of tp1, that is, lvol0. This might be caused by
commit fa64823, hence the subsequent actions run on the lock_lv. The
verbose output also shows that tree_action() is running on lvol0,
not tp1.

Creating PRELOAD tree for vg1/lvol0.
Creating SUSPEND tree for vg1/lvol0.
Creating ACTIVATE tree for vg1/lvol0.
Creating CLEAN tree for vg1/lvol0.

Is that a bug?


Thanks,
Ming-Hung Tsai

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2015-12-24  9:04 UTC
  To: LVM general discussion and development

On 23.12.2015 at 10:50, M.H. Tsai wrote:
> Hi All,
>
> I'm running LVM2 2.02.138 on Ubuntu 14.04.  When I tried to expand a
> thinpool, I found that lvextend doesn't expand the top-level dm-linear
> device of the thinpool. The following are the steps to reproduce:
>
> # lvcreate vg1 --type thin-pool --thinpool tp1 --size 1g --poolmetadataspare=n -Zn
> # lvcreate vg1 --type thin --thinpool tp1 --virtualsize 100m --name lvol0
> # lvextend vg1/tp1 --size +100m
>
> After running lvextend, the tables of vg1-tp1_tdata and vg1-tp1-tpool
> are expanded, but the dm-linear table of vg1-tp1 remains unchanged.
>
> I think that the function _lv_update_and_reload() erroneously operates
> on the holder of tp1, that is, lvol0. This might be caused by
> commit fa64823, hence the subsequent actions run on the lock_lv. The
> verbose output also shows that tree_action() is running on lvol0,
> not tp1.
>
> Creating PRELOAD tree for vg1/lvol0.
> Creating SUSPEND tree for vg1/lvol0.
> Creating ACTIVATE tree for vg1/lvol0.
> Creating CLEAN tree for vg1/lvol0.
>
> Is that a bug?
>
>
> Thanks,
> Ming-Hung Tsai

Hi

Please check with commit cd8e95d9337207a8f87a6f68dc9b1db7e3828bbf included 
(2.02.139).

It's a known issue; however, the size of the top-level 'fake' pool device is
not really important - no one should actually be using it, and the size could
have been artificial.

In fact, I do plan to rework this 'pool' device 'faking' to avoid the need
for this 'extra' device - but it's 'a little bit' complex - so it will take
some time. (I even had a fix to correct the size of the fake device - but then
I realized it would actually be much better without it.)

So do not worry about the size of this device - the only device which
matters is -tpool.
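
For illustration, the stack looks roughly like this (a sketch of
'dmsetup ls --tree' output - the minor numbers are made up):

vg1-lvol0 (253:4)
 `-vg1-tp1-tpool (253:2)
    |-vg1-tp1_tdata (253:1)
    `-vg1-tp1_tmeta (253:0)
vg1-tp1 (253:3)                <- the top-level 'fake' linear device
 `-vg1-tp1-tpool (253:2)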

Regards

Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2015-12-25  2:27 UTC
  To: LVM general discussion and development

Hi,

Sorry, what's the purpose of commit cd8e95d9337207a8f87a6f68dc9b1db7e3828bbf?

Another issue is that the current code suspends the first thin volume
(just the first one!) while extending a thinpool, which is unnecessary
and also harms IO performance.

Also, if the top-level "fake" pool device is unimportant, why not remove
it, or simply make tp1 a thin-pool target? It seems that commit 00a45ca4
wants to do this: it removes the -tpool layer while activating a new
thinpool. But if there are thin volumes, the -tpool layer comes back
again. This makes me confused -- do we really need the layers?


Thanks,
Ming-Hung Tsai

2015-12-24 17:04 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> On 23.12.2015 at 10:50, M.H. Tsai wrote:
>
>> After running lvextend, the tables of vg1-tp1_tdata and vg1-tp1-tpool
>> are expanded, but the dm-linear table of vg1-tp1 remains unchanged.
>>
>> I think that the function _lv_update_and_reload() erroneously operates
>> on the holder of tp1, that is, lvol0. This might be caused by
>> commit fa64823, hence the subsequent actions run on the lock_lv. The
>> verbose output also shows that tree_action() is running on lvol0,
>> not tp1.
>>
>> Creating PRELOAD tree for vg1/lvol0.
>> Creating SUSPEND tree for vg1/lvol0.
>> Creating ACTIVATE tree for vg1/lvol0.
>> Creating CLEAN tree for vg1/lvol0.
>>
>> Is that a bug?
>>
>>
>> Thanks,
>> Ming-Hung Tsai
>
>
> Hi
>
> Please check with commit cd8e95d9337207a8f87a6f68dc9b1db7e3828bbf included
> (2.02.139).
>
> It's a known issue; however, the size of the top-level 'fake' pool device is
> not really important - no one should actually be using it, and the size could
> have been artificial.
>
> In fact, I do plan to rework this 'pool' device 'faking' to avoid the need
> for this 'extra' device - but it's 'a little bit' complex - so it will take
> some time. (I even had a fix to correct the size of the fake device - but then
> I realized it would actually be much better without it.)
>
> So do not worry about the size of this device - the only device which
> matters is -tpool.
>
> Regards
>
> Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2015-12-25 18:37 UTC
  To: LVM general discussion and development

On 25.12.2015 at 03:27, M.H. Tsai wrote:
> Hi,
>
> Sorry, what's the purpose of commit cd8e95d9337207a8f87a6f68dc9b1db7e3828bbf?
>

Ahh, my mistake - that one is related to a rename...

> Another issue is that the current code suspends the first thin volume
> (just the first one!) while extending a thinpool, which is unnecessary
> and also harms IO performance.
>
> Also, if the top-level "fake" pool device is unimportant, why not remove
> it, or simply make tp1 a thin-pool target? It seems that commit 00a45ca4
> wants to do this: it removes the -tpool layer while activating a new
> thinpool. But if there are thin volumes, the -tpool layer comes back
> again. This makes me confused -- do we really need the layers?
>

It's not so simple - since the user may activate a thin-pool without activating
any thin-volume, and keep the thin-pool active while thin-volumes are activated
and deactivated - so it's a different case whether the user activates the
thin-pool explicitly or the thin-pool is activated as a thin volume dependency.
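
For instance (a sketch of the two activation paths):

lvchange -ay vg1/tp1      # thin-pool activated explicitly, no thin LVs active
lvchange -ay vg1/lvol0    # thin-pool pulled in as a dependency of the thin LV
lvchange -an vg1/lvol0    # thin LV deactivated - the thin-pool stays active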

Also, the thin-pool LV has its own cluster lock, which is quite complicated to
explain - but for now, the thin-pool size is unimportant, while its existence is
mandatory :)

There is no simple way to avoid the current 'layer' logic, but as said,
I already have some plan for how to go without the 'extra' layer 'fake' LV -
it will just take some time to implement.

Regards

Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2015-12-27  9:19 UTC
  To: LVM general discussion and development

2015-12-26 2:37 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> On 25.12.2015 at 03:27, M.H. Tsai wrote:
>
> It's not so simple - since the user may activate a thin-pool without activating
> any thin-volume, and keep the thin-pool active while thin-volumes are activated
> and deactivated - so it's a different case whether the user activates the
> thin-pool explicitly or the thin-pool is activated as a thin volume dependency.
>
> Also, the thin-pool LV has its own cluster lock, which is quite complicated to
> explain - but for now, the thin-pool size is unimportant, while its existence is
> mandatory :)

I

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2015-12-27 13:09 UTC
  To: LVM general discussion and development

Sorry, I sent the wrong mail before. Please ignore it.

2015-12-26 2:37 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> On 25.12.2015 at 03:27, M.H. Tsai wrote:
>
> It's not so simple - since the user may activate a thin-pool without activating
> any thin-volume, and keep the thin-pool active while thin-volumes are activated
> and deactivated - so it's a different case whether the user activates the
> thin-pool explicitly or the thin-pool is activated as a thin volume dependency.
>
> Also, the thin-pool LV has its own cluster lock, which is quite complicated to
> explain - but for now, the thin-pool size is unimportant, while its existence is
> mandatory :)

I have three questions:

1. If we need to preserve the -tpool layer, why does commit 00a45ca4
activate a new thinpool (transaction_id == 0) without an overlay?

2. Is it necessary to suspend any thin volume while extending a
thinpool? If not, commit fa648234 might need a fix.

3. Similarly to question (2), is it necessary to suspend the thin-pool while
expanding a thin-volume? If not, we should adopt the approach of
a900d150e for thin-volume expansion. The following is my proposed patch:

diff --git a/lib/activate/dev_manager.c b/lib/activate/dev_manager.c
index 588031d..ab75e9e 100644
--- a/lib/activate/dev_manager.c
+++ b/lib/activate/dev_manager.c
@@ -1980,6 +1980,18 @@ static int _add_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 		if ((node = dm_tree_find_node_by_uuid(dtree, uuid)))
 			dm_tree_node_skip_childrens(node, 1);
 #endif
+		/* Add target when building SUSPEND tree for origin-only thin-volume */
+		if (!dm->activation && dm->suspend) {
+			struct lv_activate_opts laopts = {
+				.origin_only = 1,
+			};
+			log_debug_activation("Adding target of %s to SUSPEND tree", lv->name);
+			if (!(uuid = build_dm_uuid(dm->mem, lv->lvid.s, NULL)))
+				return_0;
+			if ((node = dm_tree_find_node_by_uuid(dtree, uuid)) &&
+			    !_add_target_to_dtree(dm, node, first_seg(lv), &laopts))
+				return_0;
+		}
 	}
 
 	if (origin_only && dm->activation && !dm->skip_external_lv &&
diff --git a/lib/metadata/lv_manip.c b/lib/metadata/lv_manip.c
index 76e7895..f55d9a7 100644
--- a/lib/metadata/lv_manip.c
+++ b/lib/metadata/lv_manip.c
@@ -5336,6 +5336,7 @@ int lv_resize(struct cmd_context *cmd, struct logical_volume *lv,
 	struct volume_group *vg = lv->vg;
 	struct logical_volume *lock_lv = NULL;
 	int inactive = 0;
+	int ret = 0;
 
 	if (lv_is_cache_type(lv)) {
 		log_error("Unable to resize logical volumes of cache type.");
@@ -5381,7 +5382,9 @@ int lv_resize(struct cmd_context *cmd, struct logical_volume *lv,
 	}
 
 	/* store vg on disk(s) */
-	if (!lv_update_and_reload(lock_lv))
+	if (!(ret = lv_is_thin_volume(lock_lv) ?
+	      lv_update_and_reload_origin(lock_lv) :
+	      lv_update_and_reload(lock_lv)))
 		goto_bad;
 
 	if (lv_is_cow_covering_origin(lv))
diff --git a/libdm/libdm-deptree.c b/libdm/libdm-deptree.c
index 73f2e4c..c7ee955 100644
--- a/libdm/libdm-deptree.c
+++ b/libdm/libdm-deptree.c
@@ -1744,6 +1744,7 @@ int dm_tree_suspend_children(struct dm_tree_node *dnode,
 	const struct dm_info *dinfo;
 	const char *name;
 	const char *uuid;
+	struct load_segment *seg = NULL;
 
 	/* Suspend nodes at this level of the tree */
 	while ((child = dm_tree_next_child(&handle, dnode, 0))) {
@@ -1787,6 +1788,13 @@ int dm_tree_suspend_children(struct dm_tree_node *dnode,
 			continue;
 		}
 
+		/* Do not suspend the thinpool while reloading the table of a thin volume */
+		seg = dm_list_item(dm_list_last(&child->props.segs), struct load_segment);
+		if (dinfo->inactive_table && r && seg && seg->type == SEG_THIN) {
+			log_debug_activation("Skipping suspend children of %s.", _node_name(child));
+			child->props.skip_suspend++;
+		}
+
 		if (!_suspend_node(name, info.major, info.minor,
 				   child->dtree->skip_lockfs,
 				   child->dtree->no_flush, &newinfo)) {

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2015-12-29 21:06 UTC
  To: LVM general discussion and development

On 27.12.2015 at 14:09, M.H. Tsai wrote:
> Sorry, I sent the wrong mail before. Please ignore it.
>
> 2015-12-26 2:37 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>> On 25.12.2015 at 03:27, M.H. Tsai wrote:
>>
>> It's not so simple - since the user may activate a thin-pool without activating
>> any thin-volume, and keep the thin-pool active while thin-volumes are activated
>> and deactivated - so it's a different case whether the user activates the
>> thin-pool explicitly or the thin-pool is activated as a thin volume dependency.
>>
>> Also, the thin-pool LV has its own cluster lock, which is quite complicated to
>> explain - but for now, the thin-pool size is unimportant, while its existence is
>> mandatory :)
>
> I have three questions:
>
> 1. If we need to preserve the -tpool layer, why does commit 00a45ca4
> activate a new thinpool (transaction_id == 0) without an overlay?

This comes from a request to support 'external' thin-pool users - so
lvm2 only manages the thin-pool resize but does not create thin LVs -
that's docker's business...

But the rule is: users' usable LVs have public names and UUIDs;
hidden/private LVs have suffixes in their UUIDs.
(In the future all hidden LVs should have a UUID suffix for easy
identification by blkid.)
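
A sketch of what this looks like (the UUID bodies here are made up - only
the '-tpool' suffix pattern matters):

# dmsetup info -c -o name,uuid vg1-tp1 vg1-tp1-tpool
Name           UUID
vg1-tp1        LVM-<vg_uuid><lv_uuid>
vg1-tp1-tpool  LVM-<vg_uuid><lv_uuid>-tpool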


> 2. Is it necessary to suspend any thin volume while extending a
> thinpool? If not, commit fa648234 might need a fix.

Well, technically it would likely be better to do a suspend from
all top-level LVs - but ATM the thin-pool does use an 'internal'
suspend - and there are a couple of associated issues - like the usage
of flush during such a suspend (see the comment below).


> 3. Similarly to question (2), is it necessary to suspend the thin-pool while
> expanding a thin-volume? If not, we should adopt the approach of
> a900d150e for thin-volume expansion. The following is my proposed patch:

Nope - even the current solution is rather a 'short-time' hack
until a better fitting way is found (the hack needs some libdm API
interface rather than some autonomous decision).
But since the current thin-pool target is missing a couple of features,
we need to improve the kernel target first.

In general we need a 'suspend with flush' which will not block when the pool
runs out of space, as suspend without flush is in general not very
useful.

I'm unsure how interested you are in the development of the lvm2 code.

Are you targeting some specific feature?

Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2015-12-31  9:06 UTC
  To: LVM general discussion and development

2015-12-30 5:06 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>> I have three questions:
>>
>> 1. If we need to preserve the -tpool layer, why does commit 00a45ca4
>> activate a new thinpool (transaction_id == 0) without an overlay?
>
> This comes from a request to support 'external' thin-pool users - so
> lvm2 only manages the thin-pool resize but does not create thin LVs -
> that's docker's business...
>
> But the rule is: users' usable LVs have public names and UUIDs;
> hidden/private LVs have suffixes in their UUIDs.
> (In the future all hidden LVs should have a UUID suffix for easy
> identification by blkid.)

I found the discussion thread and the related commits (from 2014-11-04).
I'll read them later.
https://github.com/docker/docker/issues/15629

>> 2. Is it necessary to suspend any thin volume while extending a
>> thinpool? If not, commit fa648234 might need a fix.
>
> Well, technically it would likely be better to do a suspend from
> all top-level LVs - but ATM the thin-pool does use an 'internal'
> suspend - and there are a couple of associated issues - like the usage
> of flush during such a suspend (see the comment below).
>
>
>> 3. Similarly to question (2), is it necessary to suspend the thin-pool while
>> expanding a thin-volume? If not, we should adopt the approach of
>> a900d150e for thin-volume expansion. The following is my proposed patch:
>
> Nope - even the current solution is rather a 'short-time' hack
> until a better fitting way is found (the hack needs some libdm API
> interface rather than some autonomous decision).
> But since the current thin-pool target is missing a couple of features,
> we need to improve the kernel target first.
>
> In general we need a 'suspend with flush' which will not block when the pool
> runs out of space, as suspend without flush is in general not very useful.
>
> I'm unsure how interested you are in the development of the lvm2 code.
> Are you targeting some specific feature?

I'm the maintainer of LVM for my company's NAS. We just take daily
snapshots for backup.

For question (2), the kernel's pool_presuspend() function suspends all
the active thin volumes. Does that mean we don't need to suspend all
the thin volumes explicitly? Also, the current LVM only suspends the
"first" thin volume, which looks confusing.

The main concern for question (3) is IO performance -- taking
snapshots of, or expanding, a thin volume should not affect the IO of
other volumes or of the entire pool.

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2015-12-31 21:25 UTC
  To: LVM general discussion and development

On 31.12.2015 at 10:06, M.H. Tsai wrote:
> 2015-12-30 5:06 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>>> I have three questions:
>>>
>
>>> 2. Is it necessary to suspend any thin volume while extending a
>>> thinpool? If not, commit fa648234 might need a fix.
>>
>> Well, technically it would likely be better to do a suspend from
>> all top-level LVs - but ATM the thin-pool does use an 'internal'
>> suspend - and there are a couple of associated issues - like the usage
>> of flush during such a suspend (see the comment below).
>>
>>
>>> 3. Similarly to question (2), is it necessary to suspend the thin-pool while
>>> expanding a thin-volume? If not, we should adopt the approach of
>>> a900d150e for thin-volume expansion. The following is my proposed patch:
>>
>> Nope - even the current solution is rather a 'short-time' hack
>> until a better fitting way is found (the hack needs some libdm API
>> interface rather than some autonomous decision).
>> But since the current thin-pool target is missing a couple of features,
>> we need to improve the kernel target first.
>>
>> In general we need a 'suspend with flush' which will not block when the pool
>> runs out of space, as suspend without flush is in general not very useful.
>>
>> I'm unsure how interested you are in the development of the lvm2 code.
>> Are you targeting some specific feature?
>
> I'm the maintainer of LVM for my company's NAS. We just take daily
> snapshots for backup.
>
> For question (2), the kernel's pool_presuspend() function suspends all
> the active thin volumes. Does that mean we don't need to suspend all
> the thin volumes explicitly? Also, the current LVM only suspends the
> "first" thin volume, which looks confusing.
>
> The main concern for question (3) is IO performance -- taking
> snapshots of, or expanding, a thin volume should not affect the IO of
> other volumes or of the entire pool.


You should be aware of thin-pool limits.

I.e., ATM it's a bad plan to use more than, say, 16 LVs in parallel on
a single thin-pool LV - it cannot be used in a massively parallel system
due to its current locking mechanism.

There is even a sequencing problem with creating snapshots in the kernel target
which probably needs to be fixed first.
(The rule here should be to never create/allocate something when
there is a suspended device - and this rule is broken by current thin
snapshot creation - so the thin snap create message should go first,
to ensure there is space in the thin-pool ahead of the origin suspend. This
will be addressed in some future version....)
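
A sketch of the current (problematic) ordering, reusing the device names
from the earlier example:

dmsetup suspend /dev/mapper/vg1-lvol0                          # origin suspended first...
dmsetup message /dev/mapper/vg1-tp1-tpool 0 "create_snap 1 0"  # ...then the allocation - breaking the rule
dmsetup resume /dev/mapper/vg1-lvol0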

However, when taking a snapshot, only the origin thin LV is now suspended,
which should not influence the rest of the thin volumes (except for thin-pool
commit points).

A minor warning - a snapshot is not a backup - although it might look like it is :).

Regards

Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2016-01-01 18:10 UTC
  To: LVM general discussion and development

2016-01-01 5:25 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> You should be aware of thin-pool limits.
>
> I.e., ATM it's a bad plan to use more than, say, 16 LVs in parallel on
> a single thin-pool LV - it cannot be used in a massively parallel system
> due to its current locking mechanism.

Is that a limit of LVM or of the dm-thin kernel target? And is there any
reference about the "16 LVs" and the locking issue? (Why 16 LVs?)

> There is even a sequencing problem with creating snapshots in the kernel target
> which probably needs to be fixed first.
> (The rule here should be to never create/allocate something when
> there is a suspended device - and this rule is broken by current thin
> snapshot creation - so the thin snap create message should go first,
> to ensure there is space in the thin-pool ahead of the origin suspend. This
> will be addressed in some future version....)
>
> However, when taking a snapshot, only the origin thin LV is now suspended,
> which should not influence the rest of the thin volumes (except for thin-pool
> commit points).

Does that mean that in a future version of dm-thin, the command sequence
for snapshot creation will be:

dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
dmsetup suspend /dev/mapper/thin
dmsetup resume /dev/mapper/thin

> A minor warning - a snapshot is not a backup - although it might look like it is
> :).

Yes, we know that, thanks :)


Ming-Hung Tsai

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2016-01-02 23:05 UTC
  To: LVM general discussion and development, Joe Thornber

On 1.1.2016 at 19:10, M.H. Tsai wrote:
> 2016-01-01 5:25 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>> You should be aware of thin-pool limits.
>>
>> I.e., ATM it's a bad plan to use more than, say, 16 LVs in parallel on
>> a single thin-pool LV - it cannot be used in a massively parallel system
>> due to its current locking mechanism.
>
> Is that a limit of LVM or of the dm-thin kernel target? And is there any
> reference about the "16 LVs" and the locking issue? (Why 16 LVs?)

Hi

That's rather a question for Joe (the driver author): what are the 'reasonable'
limits for the current target? (If I remember correctly, it's somewhere around
16-32 LVs.)

Of course it all depends on the use case - and any reproducible benchmarks
are welcome in this area (e.g. there is no hard limit on the device
count - but the more parallel writes it needs to handle while doing
provisioning and trimming, the more lock contention will happen).

>> There is even a sequencing problem with creating snapshots in the kernel target
>> which probably needs to be fixed first.
>> (The rule here should be to never create/allocate something when
>> there is a suspended device - and this rule is broken by current thin
>> snapshot creation - so the thin snap create message should go first,
>> to ensure there is space in the thin-pool ahead of the origin suspend. This
>> will be addressed in some future version....)
>>
>> However, when taking a snapshot, only the origin thin LV is now suspended,
>> which should not influence the rest of the thin volumes (except for thin-pool
>> commit points).
>
> Does that mean that in a future version of dm-thin, the command sequence
> for snapshot creation will be:
>
> dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
> dmsetup suspend /dev/mapper/thin
> dmsetup resume /dev/mapper/thin
>

Possibly a different message - since everything must remain
fully backward compatible (i.e. create_snap_on_suspend,
or maybe some other mechanism will be there).
But yes, something in this direction...

Regards

Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: M.H. Tsai @ 2016-01-04  5:08 UTC
  To: LVM general discussion and development

2016-01-03 7:05 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
> On 1.1.2016 at 19:10, M.H. Tsai wrote:
>> 2016-01-01 5:25 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>>> There is even a sequencing problem with creating snapshots in the kernel target
>>> which probably needs to be fixed first.
>>> (The rule here should be to never create/allocate something when
>>> there is a suspended device

Excuse me - does the statement
'to never create/allocate something when there is a suspended device'
describe the case where the thin-pool is full and the volume is
'suspended with no flush'? Because then there are no free blocks for
allocation.

Otherwise, it would be strange if we could not do these operations when
the pool is not full.

>>> and this rule is broken by current thin
>>> snapshot creation - so the thin snap create message should go first,
>>> to ensure there is space in the thin-pool ahead of the origin suspend. This
>>> will be addressed in some future version....)
>>>
>>> However, when taking a snapshot, only the origin thin LV is now suspended,
>>> which should not influence the rest of the thin volumes (except for thin-pool
>>> commit points).
>>
>> Does that mean that in a future version of dm-thin, the command sequence
>> for snapshot creation will be:
>>
>> dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
>> dmsetup suspend /dev/mapper/thin
>> dmsetup resume /dev/mapper/thin
>>
> Possibly a different message - since everything must remain
> fully backward compatible (i.e. create_snap_on_suspend,
> or maybe some other mechanism will be there).
> But yes, something in this direction...

I don't fully understand. Is the new message designed for the case where
the thin-pool is nearly full?
Because the pool's free data blocks might not be sufficient for 'suspend
with flush' (i.e., 'suspend with flush' might fail if the pool is
nearly full), we should move the create_snap message before
suspending. However, the created snapshots would then be inconsistent.
If the pool is full, then there's no difference between taking
snapshots before or after 'suspend without flush'.
Is that right?


Thanks,
Ming-Hung Tsai

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2016-01-04 13:27 UTC
  To: LVM general discussion and development

On 4.1.2016 at 06:08, M.H. Tsai wrote:
> 2016-01-03 7:05 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>> On 1.1.2016 at 19:10, M.H. Tsai wrote:
>>> 2016-01-01 5:25 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>>>> There is even a sequencing problem with creating snapshots in the kernel target
>>>> which probably needs to be fixed first.
>>>> (The rule here should be to never create/allocate something when
>>>> there is a suspended device
>
> Excuse me - does the statement
> 'to never create/allocate something when there is a suspended device'
> describe the case where the thin-pool is full and the volume is
> 'suspended with no flush'? Because then there are no free blocks for
> allocation.

The reason for this is that you could suspend a device holding e.g. swap/root -
so now, if during any allocation the kernel needed a memory chunk and
required some 'swap/root' space on the suspended disk, the kernel
would block endlessly.

So a table reload (with the updated dm table line) should always happen before
the suspend (aka the PRELOAD phase in lvm2 code).

The following device resume should then just switch tables without any
memory allocations - those should all have been resolved in the load phase -
where you always have 2 slots - active & inactive.

(And yes - there are some (known) problems with this rule in current lvm2 and 
some dm targets...)
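
The intended ordering, sketched with dmsetup (the table line here is
illustrative):

dmsetup load vg1-tp1_tdata --table "0 2301952 linear 8:16 2048"  # PRELOAD: fill the inactive slot while live
dmsetup suspend vg1-tp1_tdata   # quiesce I/O - nothing should need allocating past this point
dmsetup resume vg1-tp1_tdata    # just switch to the preloaded table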

> Otherwise, it would be strange if we could not do these operations when
> the pool is not full.

Extension of a device is 'special' - in fact, we could enable 'suspend WITHOUT
flush' for any 'lvextend' operation - but that needs full re-validation of all
targets - so for now it's only enabled for the thin-pool lvextend.

'Suspend with flush' is typically needed when you change the device type in
some way - however, in the pure lvextend case (only new space is added, no
existing device space changes) there may not be any in-flight BIO routed into
the 'newly extended' space - thus a flush is not needed. (Unsure if this
explanation makes sense.)

>
>>>> and this rule is broken by current thin
>>>> snapshot creation - so the thin snap create message should go first,
>>>> to ensure there is space in the thin-pool ahead of the origin suspend. This
>>>> will be addressed in some future version....)
>>>>
>>>> However, when taking a snapshot, only the origin thin LV is now suspended,
>>>> which should not influence the rest of the thin volumes (except for thin-pool
>>>> commit points).
>>>
>>> Does that mean that in a future version of dm-thin, the command sequence
>>> for snapshot creation will be:
>>>
>>> dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
>>> dmsetup suspend /dev/mapper/thin
>>> dmsetup resume /dev/mapper/thin
>>>
>> Possibly a different message - since everything must remain
>> fully backward compatible (i.e. create_snap_on_suspend,
>> or maybe some other mechanism will be there).
>> But yes, something in this direction...
>
> I don't fully understand. Is the new message designed for the case where
> the thin-pool is nearly full?
> Because the pool's free data blocks might not be sufficient for 'suspend
> with flush' (i.e., 'suspend with flush' might fail if the pool is
> nearly full), we should move the create_snap message before
> suspending. However, the created snapshots would then be inconsistent.
> If the pool is full, then there's no difference between taking
> snapshots before or after 'suspend without flush'.
> Is that right?

As said - the solution is nontrivial and needs enhancements
to the suspend API - when you suspend a 'thinLV origin' you need
to use suspend with flush - however, ATM such a suspend may 'block'
the whole of lvm2 while lvm2 holds the VG lock.

As a prevention, the lvm2 user can configure a threshold for autoresize (e.g. 70%),
and when the pool is above the threshold the user is not allowed to create any
new thinLV. This normally works quite ok - but it's obviously not a 'bullet-proof'
solution (as you could construct a case where the time-of-check
and the time-of-use may cause an out-of-space pool).
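
E.g. in lvm.conf (a sketch - the values are illustrative):

activation {
    thin_pool_autoextend_threshold = 70
    thin_pool_autoextend_percent = 20
}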

So far the rule is simple - at all costs, do not run a thin-pool when it's full;
an overfilled pool is NOT comparable to a 'single' write error.
When an admin is solving an overfilled pool, something went wrong earlier
(the admin failed to extend his VG)....

Thin-pool is about 'promising' space the user can deliver 'later' - not about
hitting the overfull corner case as a 'regular' use-case where the user can
expect some well-handled error behavior (but yes, we try to make a better user
experience here).

Regards

Zdenek

* Re: [linux-lvm] Possible bug in expanding thinpool: lvextend doesn't expand the top-level dm-linear device
From: Zdenek Kabelac @ 2016-02-12 12:40 UTC
  To: linux-lvm

On 27.12.2015 at 14:09, M.H. Tsai wrote:
> Sorry, I sent the wrong mail before. Please ignore it.
>
> 2015-12-26 2:37 GMT+08:00 Zdenek Kabelac <zkabelac@redhat.com>:
>> On 25.12.2015 at 03:27, M.H. Tsai wrote:
>>
>> It's not so simple - since the user may activate a thin-pool without activating
>> any thin-volume, and keep the thin-pool active while thin-volumes are activated
>> and deactivated - so it's a different case whether the user activates the
>> thin-pool explicitly or the thin-pool is activated as a thin volume dependency.
>>
>> Also, the thin-pool LV has its own cluster lock, which is quite complicated to
>> explain - but for now, the thin-pool size is unimportant, while its existence is
>> mandatory :)
>
> I have three questions:
>
> 1. If we need to preserve the -tpool layer, why does commit 00a45ca4
> activate a new thinpool (transaction_id == 0) without an overlay?
>
> 2. Is it necessary to suspend any thin volume while extending a
> thinpool? If not, commit fa648234 might need a fix.
>
> 3. Similarly to question (2), is it necessary to suspend the thin-pool while
> expanding a thin-volume? If not, we should adopt the approach of
> a900d150e for thin-volume expansion. The following is my proposed patch:
>

Hi

Thanks a lot for the very detailed analysis, which was correct.

Hopefully I've put all the patches upstream in the proper way.
Please check the upstream commits starting in this thread:

https://www.redhat.com/archives/lvm-devel/2016-February/msg00004.html

Regards

Zdenek
