linux-lvm.redhat.com archive mirror
* [linux-lvm] commit c527a0cbfc3 may have a bug
       [not found] <098d6e8d-2d2c-5067-1435-eefd7e2d09bc@suse.com>
@ 2020-02-14 15:18 ` heming.zhao
  2020-02-14 19:11 ` David Teigland
  1 sibling, 0 replies; 12+ messages in thread
From: heming.zhao @ 2020-02-14 15:18 UTC (permalink / raw)
  To: linux-lvm; +Cc: teigland


Hello list & David,

The stable-2.02 branch commit c527a0cbfc391645d30407d2 may introduce a bug.
It adds a new function, label_scan_pvscan_all(), which uses
cmd->lvmetad_filter to build the device list for scanning.

code:
```
label_scan_pvscan_all
  if (!(iter = dev_iter_create(cmd->lvmetad_filter, 0)))
  ... ...
  while ((dev = dev_iter_get(iter)))
  ... ...
```

It looks wrong to use cmd->lvmetad_filter in label_scan_pvscan_all().
The behaviour changed after the patch was applied (the legacy code used
cmd->full_filter).

When a system with duplicated devices starts up, with patch c527a0cb applied,
the duplicated devs pass global_filter (which is usually empty). That makes
lvmetad fail to build up the LVs, and the system fails to boot. This case is
not my imagination; one of our customers hit it recently.

So I suggest changing cmd->lvmetad_filter to cmd->full_filter in
label_scan_pvscan_all().

The steps to reproduce:
```
create a loop dev.
use this loop to create some mapper devs. (share the same loop dev)
pvcreate on these mapper devs

# this cmd will output warning msg.
pvscan --cache --config ' devices { filter = [ "r|/dev/loop0|" ] } '
# this cmd will not output warning msg.
pvscan --cache --config ' devices { filter = [ "a|/dev/loop0|" ] 
global_filter = [ "r|/dev/loop0|" ] } '
```
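
In concrete terms, the setup above can look like this (a sketch only; the backing file, size, and the dup0/dup1 names are made up):
```
truncate -s 256M /tmp/pvtest.img
losetup /dev/loop0 /tmp/pvtest.img
SECTORS=$(blockdev --getsz /dev/loop0)
# two linear dm mappings sharing the same loop device
dmsetup create dup0 --table "0 $SECTORS linear /dev/loop0 0"
dmsetup create dup1 --table "0 $SECTORS linear /dev/loop0 0"
# one pvcreate is enough: the PV label is then visible through
# /dev/loop0, /dev/mapper/dup0 and /dev/mapper/dup1 (duplicates)
pvcreate /dev/mapper/dup0
```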

Thanks.


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
       [not found] <098d6e8d-2d2c-5067-1435-eefd7e2d09bc@suse.com>
  2020-02-14 15:18 ` [linux-lvm] commit c527a0cbfc3 may have a bug heming.zhao
@ 2020-02-14 19:11 ` David Teigland
  2020-02-14 19:34   ` Gionatan Danti
  1 sibling, 1 reply; 12+ messages in thread
From: David Teigland @ 2020-02-14 19:11 UTC (permalink / raw)
  To: heming.zhao; +Cc: linux-lvm

On Fri, Feb 14, 2020 at 11:13:04PM +0800, heming.zhao@suse.com wrote:
> Hello list & David,
> 
> The stable-2.02 branch commit c527a0cbfc391645d30407d2 may introduce a bug.
> It adds a new function, label_scan_pvscan_all(), which uses
> cmd->lvmetad_filter to build the device list for scanning.
> 
> code:
> ```
> label_scan_pvscan_all
>  if (!(iter = dev_iter_create(cmd->lvmetad_filter, 0)))
>  ... ...
>  while ((dev = dev_iter_get(iter)))
>  ... ...
> ```
> 
> It looks wrong to use cmd->lvmetad_filter in label_scan_pvscan_all().
> The behaviour changed after the patch was applied (the legacy code used
> cmd->full_filter).

Hi, it looks like a bug led to an incorrect filter configuration actually
working for a period of time.  When the bug was later fixed, the incorrect
filter became apparent.  In summary, the correct way to exclude devs from
lvmetad (and to handle duplicate PVs) is to set global_filter; filter is
not meant to work for that.

Here's the best comment to refer to:

 *   - cmd->lvmetad_filter - the lvmetad filter chain used when scanning devs for lvmetad update:
 *     sysfs filter -> internal filter -> global regex filter -> type filter ->
 *     usable device filter(FILTER_MODE_PRE_LVMETAD) ->
 *     mpath component filter -> partitioned filter ->
 *     md component filter -> fw raid filter
 *
 *   - cmd->filter - the filter chain used for lvmetad responses:
 *     persistent filter -> regex_filter -> usable device filter(FILTER_MODE_POST_LVMETAD)
 *
 *   - cmd->full_filter - the filter chain used for all the remaining situations:
 *     cmd->lvmetad_filter -> cmd->filter

pvscan --cache, which populates lvmetad, should be using
cmd->lvmetad_filter (which includes global_filter config, but not the
filter config.)  So, label_scan_pvscan_all() looks like it should be
correct.
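
To make that concrete (a sketch only; /dev/loop0 is just an example device):
```
pvs            --config 'devices { filter = [ "r|/dev/loop0|" ] }'          # loop0 hidden from this command
pvscan --cache --config 'devices { filter = [ "r|/dev/loop0|" ] }'          # loop0 still scanned into lvmetad
pvscan --cache --config 'devices { global_filter = [ "r|/dev/loop0|" ] }'   # loop0 excluded from the scan too
```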

Before c527a0cbfc391645d30407d2, pvscan --cache called label_scan() which
uses cmd->full_filter (a combination of global_filter config and filter
config.)  Afterward, pvscan --cache calls label_scan_pvscan_all() which
uses cmd->lvmetad_filter.  So, that commit should be fixing the behavior
of pvscan.

> When a system with duplicated devices starts up, with patch c527a0cb applied,
> the duplicated devs pass global_filter (which is usually empty). That makes
> lvmetad fail to build up the LVs, and the system fails to boot. This case is
> not my imagination; one of our customers hit it recently.

Setting global_filter is the correct way to handle duplicate devices,
setting the filter config shouldn't affect pvscan --cache.

> So I suggest changing cmd->lvmetad_filter to cmd->full_filter in
> label_scan_pvscan_all().
> 
> The steps to reproduce:
> ```
> create a loop dev.
> use this loop to create some mapper devs. (share the same loop dev)
> pvcreate on these mapper devs
> 
> # this cmd will output warning msg.
> pvscan --cache --config ' devices { filter = [ "r|/dev/loop0|" ] } '
> # this cmd will not output warning msg.
> pvscan --cache --config ' devices { filter = [ "a|/dev/loop0|" ]
> global_filter = [ "r|/dev/loop0|" ] } '
> ```

The best option would be:
pvscan --cache --config ' devices { global_filter = [ "r|/dev/loop0|" ] } '

I have /dev/loop0 and a dm wrapper of it called /dev/mapper/loop0idm.

The best config works as expected:
# pvscan --cache --config "devices {global_filter=[\"r|/dev/loop0|\"]}" -vvvv 2>&1| grep -e 'Scan metadata from' -e WARNING
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop1
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop2
#cache/lvmetad.c:2292    Scan metadata from dev /dev/mapper/loop0idm

This config should work, but setting filter is unnecessary:
# pvscan --cache --config "devices {filter=[\"a|/dev/loop0|\"] global_filter=[\"r|/dev/loop0|\"]}" -vvvv 2>&1| grep -e 'Scan metadata from' -e WARNING
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop1
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop2
#cache/lvmetad.c:2292    Scan metadata from dev /dev/mapper/loop0idm

This config is not expected to work:
# pvscan --cache --config "devices {filter=[\"r|/dev/loop0|\"]}" -vvvv 2>&1| grep -e 'Scan metadata from' -e WARNING
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop0
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop1
#cache/lvmetad.c:2292    Scan metadata from dev /dev/loop2
#cache/lvmetad.c:2292    Scan metadata from dev /dev/mapper/loop0idm
#cache/lvmcache.c:1615    WARNING: found device with duplicate /dev/mapper/loop0idm
#cache/lvmcache.c:1617    WARNING: Disabling lvmetad cache which does not support duplicate PVs.
#cache/lvmetad.c:2486    WARNING: Scan found duplicate PVs.
#pvscan.c:515     WARNING: Not using lvmetad because cache update failed.


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-14 19:11 ` David Teigland
@ 2020-02-14 19:34   ` Gionatan Danti
  2020-02-14 20:40     ` David Teigland
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2020-02-14 19:34 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: heming.zhao

On 2020-02-14 20:11, David Teigland wrote:
> Hi, it looks like a bug led to an incorrect filter configuration actually
> working for a period of time.  When the bug was later fixed, the incorrect
> filter became apparent.  In summary, the correct way to exclude devs from
> lvmetad (and to handle duplicate PVs) is to set global_filter; filter is
> not meant to work for that.

Hi David, since filters are one of the most frequently asked-about topics, can
I ask why we have so many different filters, leading to such complex
interactions and behaviors?

Don't get me wrong: I am sure you (the lvm team) have very good reasons
to do that, and I am surely missing something. But what, precisely? How
should we (end users) consider filters? Should we only use
global_filter?

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-14 19:34   ` Gionatan Danti
@ 2020-02-14 20:40     ` David Teigland
  2020-02-15  5:22       ` heming.zhao
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Teigland @ 2020-02-14 20:40 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-lvm, heming.zhao

On Fri, Feb 14, 2020 at 08:34:19PM +0100, Gionatan Danti wrote:
> Hi David, being filters one of the most asked questions, can I ask why we
> have so many different filters, leading to such complex interactions and
> behaviors?
> 
> Don't get me wrong: I am sure you (the lvm team) have very good reasons to
> do that, and I am surely missing something? But what, precisely? How should
> we (end users) consider filters? Should we only use global_filter?

You're right, filters are difficult to understand and use correctly.  The
complexity and confusion in the code is no better.  With the removal of
lvmetad in 2.03 versions (e.g. RHEL8) there's no difference between filter
and global_filter, so that's some small improvement.  But, I think filters
should be replaced or overhauled with something easier to use and more
useful at a technical level.

I've created a bz about that and welcome thoughts about what a replacement
should or should not be like.  With input the work is more likely to be
prioritized.

https://bugzilla.redhat.com/show_bug.cgi?id=1803266


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-14 20:40     ` David Teigland
@ 2020-02-15  5:22       ` heming.zhao
  2020-02-15 12:40       ` Zdenek Kabelac
  2020-02-15 19:07       ` Gionatan Danti
  2 siblings, 0 replies; 12+ messages in thread
From: heming.zhao @ 2020-02-15  5:22 UTC (permalink / raw)
  To: David Teigland, Gionatan Danti; +Cc: linux-lvm

Hello David,

I accept your points; the commit c527a0cbfc3 is correct.
I am still not sure whether the correct fix will have unintended consequences.
I think most people only configure devices/filter on their machines.
After this commit, a machine with duplicated devs has to configure two identical copies of the filter rules:
one copy for devices/filter, another for devices/global_filter. It is weird.
The legacy code lived for a period of time, and many machines rely on it.

I quickly checked the code: before c527a0cbfc3, lvmetad_filter was
mainly used in _pvscan_cache(), and only under rare conditions. All other cases used devices/filter.

So I suggest:
1. Should lvm2 continue to keep the wrong filter (full_filter) usage?
    That would keep machines running as usual.
    Alternatively, add a new config item (e.g. pvscan_compat_filter = 0|1) to let the user choose the
filter behaviour (see the sketch after this list).

2. (A lot of work) Backport the mainline single-filter code into the stable-2.02 branch.
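
Purely as an illustration of the idea in point 1 (pvscan_compat_filter does not exist anywhere; this is only what such a knob might look like):
```
# hypothetical lvm.conf fragment -- NOT an existing setting
devices {
    # 1 = legacy behaviour: pvscan --cache also honours devices/filter
    # 0 = current behaviour: pvscan --cache honours devices/global_filter only
    pvscan_compat_filter = 1
}
```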

Finally,
there is a little code tip for the mainline branch:
consider removing the cfg_array(devices_global_filter_CFG, ...) entry in lib/config/config_settings.h;
it generates useless config info.

Thanks.


On 2/15/20 4:40 AM, David Teigland wrote:
> On Fri, Feb 14, 2020 at 08:34:19PM +0100, Gionatan Danti wrote:
>> Hi David, being filters one of the most asked questions, can I ask why we
>> have so many different filters, leading to such complex interactions and
>> behaviors?
>>
>> Don't get me wrong: I am sure you (the lvm team) have very good reasons to
>> do that, and I am surely missing something? But what, precisely? How should
>> we (end users) consider filters? Should we only use global_filter?
> 
> You're right, filters are difficult to understand and use correctly.  The
> complexity and confusion in the code is no better.  With the removal of
> lvmetad in 2.03 versions (e.g. RHEL8) there's no difference between filter
> and global_filter, so that's some small improvement.  But, I think filters
> should be replaced or overhauled with something easier to use and more
> useful at a technical level.
> 
> I've created a bz about that and welcome thoughts about what a replacement
> should or should not be like.  With input the work is more likely to be
> prioritized.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1803266
> 


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-14 20:40     ` David Teigland
  2020-02-15  5:22       ` heming.zhao
@ 2020-02-15 12:40       ` Zdenek Kabelac
  2020-02-15 19:15         ` Gionatan Danti
  2020-02-15 19:07       ` Gionatan Danti
  2 siblings, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2020-02-15 12:40 UTC (permalink / raw)
  To: LVM general discussion and development, David Teigland, Gionatan Danti
  Cc: heming.zhao

On 14. 02. 20 at 21:40, David Teigland wrote:
> On Fri, Feb 14, 2020 at 08:34:19PM +0100, Gionatan Danti wrote:
>> Hi David, being filters one of the most asked questions, can I ask why we
>> have so many different filters, leading to such complex interactions and
>> behaviors?
>>
>> Don't get me wrong: I am sure you (the lvm team) have very good reasons to
>> do that, and I am surely missing something? But what, precisely? How should
>> we (end users) consider filters? Should we only use global_filter?
> 
> You're right, filters are difficult to understand and use correctly.  The
> complexity and confusion in the code is no better.  With the removal of
> lvmetad in 2.03 versions (e.g. RHEL8) there's no difference between filter
> and global_filter, so that's some small improvement.  But, I think filters
> should be replaced or overhauled with something easier to use and more
> useful at a technical level.
> 
> I've created a bz about that and welcome thoughts about what a replacement
> should or should not be like.  With input the work is more likely to be
> prioritized.
> 

One of the 'reasons' for having 2 sets of filters was the presence of a universal 
'scanning' tool (aka udev) - which is assessing & reading devices in a system - 
and its combination with various 'VM' environments where actual devices are 
passed to guest systems on your hosting machine.

So there are many different combinations where different commands may need to 
see different subsets of devices - i.e. your guest machine should not have 
an impact on the correctness of your 'hosting' machine no matter what the guest 
writes (i.e. duplicating signatures...).
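
For instance (a sketch only - the device paths are made up), the host can draw that border in lvm.conf so that whatever a guest writes to its passed-through disks is never scanned host-side:
```
# /etc/lvm/lvm.conf on the host; /dev/sdc and /dev/sdd are the example disks
# handed to guests:
devices {
    global_filter = [ "r|/dev/sd[cd]|", "a|.*|" ]
}
```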

While in many cases, for many single home users with a single set of devices, this 
can maybe be seen as an 'overkill' solution - in the more generic world, where 
there is unfortunately not yet any widely used/accepted solution to the 
core problem of 'who is the owner of a device', having several sets of filters 
was the only solution we were able to create.

It's worth noting that lvm2 is solving way more issues than other similar device 
technologies (i.e. mdraid, btrfs....) where it's very simple to cause big 
confusion and data corruption (even unnoticed) once duplicates appear in 
your system...

Zdenek


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-14 20:40     ` David Teigland
  2020-02-15  5:22       ` heming.zhao
  2020-02-15 12:40       ` Zdenek Kabelac
@ 2020-02-15 19:07       ` Gionatan Danti
  2 siblings, 0 replies; 12+ messages in thread
From: Gionatan Danti @ 2020-02-15 19:07 UTC (permalink / raw)
  To: David Teigland; +Cc: linux-lvm, heming.zhao

On 2020-02-14 21:40, David Teigland wrote:
> You're right, filters are difficult to understand and use correctly.  The
> complexity and confusion in the code is no better.  With the removal of
> lvmetad in 2.03 versions (e.g. RHEL8) there's no difference between filter
> and global_filter, so that's some small improvement.  But, I think filters
> should be replaced or overhauled with something easier to use and more
> useful at a technical level.
> 
> I've created a bz about that and welcome thoughts about what a replacement
> should or should not be like.  With input the work is more likely to be
> prioritized.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1803266

Hi David, I think that part of the problem is the unclear/vague 
description of filters (eg: "plain" filter vs global_filter). In other 
words, maybe the real problem is a documentation one.

For example: am I right in saying that global_filter was introduced as a 
"fail-safe" mechanism to protect udev & the like from a 
command-line-overridden "plain" filter directive?

If so, I am not sure the comment in lvm.conf fully conveys this message 
(and I cannot find much in the man pages, either). If not, and I am wrong 
about filter vs global_filter, then, well, this somewhat proves the 
point above :)

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-15 12:40       ` Zdenek Kabelac
@ 2020-02-15 19:15         ` Gionatan Danti
  2020-02-15 20:19           ` Zdenek Kabelac
  2020-02-15 20:49           ` Chris Murphy
  0 siblings, 2 replies; 12+ messages in thread
From: Gionatan Danti @ 2020-02-15 19:15 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: David Teigland, heming.zhao, LVM general discussion and development

On 2020-02-15 13:40, Zdenek Kabelac wrote:
> On 14. 02. 20 at 21:40, David Teigland wrote:
>> On Fri, Feb 14, 2020 at 08:34:19PM +0100, Gionatan Danti wrote:
>>> Hi David, being filters one of the most asked questions, can I ask why we
>>> have so many different filters, leading to such complex interactions and
>>> behaviors?
>>> 
>>> Don't get me wrong: I am sure you (the lvm team) have very good reasons to
>>> do that, and I am surely missing something? But what, precisely? How should
>>> we (end users) consider filters? Should we only use global_filter?
>> 
>> You're right, filters are difficult to understand and use correctly.  The
>> complexity and confusion in the code is no better.  With the removal of
>> lvmetad in 2.03 versions (e.g. RHEL8) there's no difference between filter
>> and global_filter, so that's some small improvement.  But, I think filters
>> should be replaced or overhauled with something easier to use and more
>> useful at a technical level.
>> 
>> I've created a bz about that and welcome thoughts about what a replacement
>> should or should not be like.  With input the work is more likely to be
>> prioritized.
>> 
> 
> One of the 'reason' for having 2 sets of filter was the presence of
> universal 'scanning' tool (aka udev) - which is assessing & reading
> devices in a system and its combination with various 'VM' environments
> where actual device are passed to guest systems on your hosting
> machine.
> 
> So there are many different combinations where different commands may
> need to see different subset of devices - so i.e. your guest machine
> should not have an impact on correctness of your 'hosting' machine no
> matter what the guest will write (i.e. duplicating signatures...)

Sure. But why is having a single, valid filter set not sufficient? In 
other words, why/when can I not simply use global_filter and ignore the 
"plain" filter?

> While in many cases for many single home users with single set of
> devices this can be seen maybe as an 'overkill' solution - in the more
> generic world where there is unfortunately not yet any widely
> used/accepted solution solving the core problem: 'who is the owner of
> a device'  having several sets of filter was the only solution we were
> able to create.

True. I myself saw some setups where hosts had direct visibility of 
guest-created logical volumes. The obvious solution was to correctly set 
global_filter. However, I have the impression that a good share of the 
complexity/issues/unexpected behaviors is due to LVM being able to be 
nested (PV inside LV inside VG inside PV inside ...).

> It's worth to note lvm2 is solving way more issues then other similar
> device technology (i.e. mdraid, btrfs....) where it's very simple to
> cause big confusion and data corruptions (even unnoticed) once
> duplicates appears in your system...
> 
> Zdenek

I never duplicate devices with mdraid, but BTRFS is so fragile that 
taking a simple LVM snapshot of a BTRFS component device can lead to 
data corruption.

I really think the gold standard here is ZFS.
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-15 19:15         ` Gionatan Danti
@ 2020-02-15 20:19           ` Zdenek Kabelac
  2020-02-16 15:17             ` Gionatan Danti
  2020-02-15 20:49           ` Chris Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2020-02-15 20:19 UTC (permalink / raw)
  To: Gionatan Danti
  Cc: heming.zhao, David Teigland, LVM general discussion and development

On 15. 02. 20 at 20:15, Gionatan Danti wrote:
> On 2020-02-15 13:40, Zdenek Kabelac wrote:
>> On 14. 02. 20 at 21:40, David Teigland wrote:
>>> On Fri, Feb 14, 2020 at 08:34:19PM +0100, Gionatan Danti wrote:
>>>> Hi David, being filters one of the most asked questions, can I ask why we
>>>> have so many different filters, leading to such complex interactions and
>>>> behaviors?
>>>>
>>>> Don't get me wrong: I am sure you (the lvm team) have very good reasons to
>>>> do that, and I am surely missing something? But what, precisely? How should
>>>> we (end users) consider filters? Should we only use global_filter?
>>>
>>> You're right, filters are difficult to understand and use correctly. The
>>> complexity and confusion in the code is no better.  With the removal of
>>> lvmetad in 2.03 versions (e.g. RHEL8) there's no difference between filter
>>> and global_filter, so that's some small improvement.  But, I think filters
>>> should be replaced or overhauled with something easier to use and more
>>> useful at a technical level.
>>>
>>> I've created a bz about that and welcome thoughts about what a replacement
>>> should or should not be like.  With input the work is more likely to be
>>> prioritized.
>>>
>>
>> One of the 'reason' for having 2 sets of filter was the presence of
>> universal 'scanning' tool (aka udev) - which is assessing & reading
>> devices in a system and its combination with various 'VM' environments
>> where actual device are passed to guest systems on your hosting
>> machine.
>>
>> So there are many different combinations where different commands may
>> need to see different subset of devices - so i.e. your guest machine
>> should not have an impact on correctness of your 'hosting' machine no
>> matter what the guest will write (i.e. duplicating signatures...)
> 
> Sure. But why is having a single, valid filter set not sufficient? In other 
> words, why/when can I not simply use global_filter and ignore the "plain" filter?

The problem with a simple filter - the one that was 'tried' to be resolved for lvmetad - was this:

udev should 'see' all devices in your system - so lvmetad should know about 
all devices in the system (even with duplicates and all sorts of 
inconsistencies and garbage) - the idea was 'nice', but the actual 
implementation itself was raising more troubles than it was solving.

But ATM - we still have a sort of 'pvscan' from udev,
and the lvm commands run by the admin - which can run with a different '--config'.

So the 'current' (ATM) difference is:

global_filter -  never scan such devices on a machine

filter  -  never scan such devices within a single command.

and the idea is - you can have 'different' sets of commands operating on 
different subsets of devices on your machine - which might be useful in the 
world of 'containers' & VMs & clusters...

So while 'global_filter' should mostly never change - changing 'filter' is 
kind of ok during a system's lifetime.
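
A small sketch of that division of labour (device names are only examples):
```
# global_filter, set once in lvm.conf by the admin, draws the hard border;
# an individual command can still narrow its own view with 'filter':
pvs --config 'devices { filter = [ "a|/dev/nvme|", "r|.*|" ] }'
```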

When there is no lvmetad anymore - having 2 different 'filter' settings is 
now 'less' fancy and both cases could be somehow solved with just a single 
filter (as there is simply no cache and there is always some scan) -
but the correctness with VMs and other bigger systems could be better handled 
with 2 filter levels - where basically the 'admin' sets 'hard' borders with
global_filter - and tools can play with 'filter' within an already preselected
subset of devices...

As has been said - it's not too useful if there are just a couple of disks 
:)...

>> It's worth to note lvm2 is solving way more issues then other similar
>> device technology (i.e. mdraid, btrfs....) where it's very simple to
>> cause big confusion and data corruptions (even unnoticed) once
>> duplicates appears in your system...
>>
>> Zdenek
> 
> I never duplicate devices with mdraid, but BTRFS is so fragile that taking a 
> simple LVM snapshot of a BTRFS component device can lead to data corruption.
> 
> I really think the gold standard here is ZFS.

IMHO ZFS is 'somewhat' slow to play with...
and I've no idea how ZFS can resolve all correctness issues in kernel...

Zdenek


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-15 19:15         ` Gionatan Danti
  2020-02-15 20:19           ` Zdenek Kabelac
@ 2020-02-15 20:49           ` Chris Murphy
  2020-02-16 15:28             ` Gionatan Danti
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2020-02-15 20:49 UTC (permalink / raw)
  To: LVM general discussion and development

On Sat, Feb 15, 2020 at 12:22 PM Gionatan Danti <g.danti@assyoma.it> wrote:
>
> On 2020-02-15 13:40, Zdenek Kabelac wrote:
> > It's worth to note lvm2 is solving way more issues then other similar
> > device technology (i.e. mdraid, btrfs....) where it's very simple to
> > cause big confusion and data corruptions (even unnoticed) once
> > duplicates appears in your system...
> >
> > Zdenek
>
> I never duplicate devices with mdraid, but BTRFS is so fragile that
> taking a simple LVM snapshot of a BTRFS component device can lead to
> data corruption.
>
> I really think the gold standard here is ZFS.

Are you referring to this known problem?
https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices

By default the snapshot LV isn't active, so the problem doesn't
happen. I've taken many LVM thinp snapshots of Btrfs file systems,
including while they're actively being written to, and never run into
this problem (or any other).

An LVM snapshot comes with FIFREEZE, and supported filesystems,
including Btrfs, should have a consistent snapshot created as a
result. I don't think ZFS supports FIFREEZE/FITHAW, and if that's
correct, you're effectively getting powerfail/crash-type behavior
with an LVM snapshot of a ZFS file system, entirely trusting its
own ability to maintain file system consistency.
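
For reference, the user-space analogue of that freeze/thaw pair looks roughly like this (names are assumptions; lvcreate does the equivalent implicitly by suspending the origin device):
```
fsfreeze -f /mnt/btrfs                          # FIFREEZE: flush and quiesce the filesystem
lvcreate -s -n consistent_snap vg0/btrfs_thin   # take the snapshot while frozen
fsfreeze -u /mnt/btrfs                          # FITHAW: resume writes
```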

My dualist opinion on mixing these layers: while it should work, and
if there's corruption then there's a bug somewhere, adding layers
increases complexity and thus risk. That's possibly a good idea in a
testing/qualification context, where you want something that is sensitive
to, and consistently flags, any discrepancy. That's not fragility.

-- 
Chris Murphy


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-15 20:19           ` Zdenek Kabelac
@ 2020-02-16 15:17             ` Gionatan Danti
  0 siblings, 0 replies; 12+ messages in thread
From: Gionatan Danti @ 2020-02-16 15:17 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: David Teigland, heming.zhao, LVM general discussion and development

On 2020-02-15 21:19, Zdenek Kabelac wrote:
> IMHO ZFS is 'somewhat' slow to play with...
> and I've no idea how ZFS can resolve all correctness issues in 
> kernel...
> 
> Zdenek

Oh, it surely does *not* solve all correctness issues. Rather, having 
much simpler constraints (and use cases), it simply avoids many issues.

That said, what LVM achieves despite all the abstraction layers and very 
different goals/use cases really is impressive.

So, thanks to the LVM team for the hard work!

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] commit c527a0cbfc3 may have a bug
  2020-02-15 20:49           ` Chris Murphy
@ 2020-02-16 15:28             ` Gionatan Danti
  0 siblings, 0 replies; 12+ messages in thread
From: Gionatan Danti @ 2020-02-16 15:28 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Chris Murphy

On 2020-02-15 21:49, Chris Murphy wrote:
> Are you referring to this known problem?
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices

Yes.

> By default the snapshot LV isn't active, so the problem doesn't
> happen. I've taken many LVM thinp snapshots of Btrfs file systems,
> including while they're actively being written to, and never run into
> this problem (or any other).

Thin LVM snapshots are not active by default, yes. But you *need* to 
activate them to access their data.
Moreover, classical (non-thin) LVM snapshots are automatically activated 
when taken.
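
For example (VG/LV names are assumptions), the difference looks like this:
```
lvcreate -s -n thin_snap vg0/thin_btrfs           # thin snapshot: created inactive (activation skip flag set)
lvchange -ay -K vg0/thin_snap                     # its data is only reachable after explicit activation
lvcreate -s -L 1G -n thick_snap vg0/plain_btrfs   # classic snapshot: activated immediately when taken
```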

> An LVM snapshot comes with FIFREEZE, and supported filesystems,
> including Btrfs, should have a consistent snapshot created as a
> result. I don't think ZFS supports FIFREEZE/FITHAW and if that's
> correct, you're effectively getting a powerfail/crash type behavior
> with an LVM snapshot of a ZFS file system, entirely trusting on its
> own ability to maintain file system consistency.

True, but the transactional nature of ZFS writes means that a clean 
recovery option should always be available. Anyway, any modern journaled 
filesystem will not corrupt itself on power loss/recovery (async write 
back data will be lost, obviously).

> My dualist opinion on mixing these layers: while it should work, and
> if there's corruption then there's a bug somewhere, adding layers
> increases complexity and thus risk. That's possibly a good idea in a
> testing/qualification context, where you want something sensitive to
> and consistently flags any discrepancy. That's not fragility.

I am not sure about that: one of BTRFS's main goals was to not duplicate 
code, relying on standard Linux block device behavior as much as 
possible. For this reason, I tend to think that snapshotting (and using) 
the block device under a BTRFS filesystem should be a supported use case.

But hey - the LVM team is really doing an awesome work!
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it [1]
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

