linux-lvm.redhat.com archive mirror
* [linux-lvm] issue about return value in _lvchange_activate_single
From: heming.zhao @ 2020-11-17  3:52 UTC (permalink / raw)
  To: LVM general discussion and development, David Teigland, Zdenek Kabelac

Hello List, David & Zdenek,

I ran into an annoying problem when running the lvchange command.

lvm's tools/errors.h defines these return values:
```
#define ECMD_PROCESSED      1
#define ENO_SUCH_CMD        2
#define EINVALID_CMD_LINE   3
#define EINIT_FAILED        4
#define ECMD_FAILED         5
```

Many lvm functions treat "return 0" as the error case.

If _lvchange_activate() returns ECMD_FAILED, the caller _lvchange_activate_single() treats it as success:
```
if (!_lvchange_activate(cmd, lv)) /* ECMD_FAILED is 5 (nonzero), so this branch is not taken */
        return_ECMD_FAILED;

return ECMD_PROCESSED;
```
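
The mix-up can be reproduced outside lvm2 with a minimal sketch. The two ECMD_* values are copied from the errors.h excerpt above; activate_stub(), caller_zero_convention(), caller_ecmd_convention() and check_conventions() are hypothetical names used only for illustration and are not lvm2 functions.

```c
#include <assert.h>

/* Values copied from the tools/errors.h excerpt quoted above. */
#define ECMD_PROCESSED 1
#define ECMD_FAILED    5

/* Hypothetical stand-in for a callee that reports failure with ECMD_FAILED. */
int activate_stub(int should_fail)
{
	return should_fail ? ECMD_FAILED : ECMD_PROCESSED;
}

/* Caller written for the "0 means error" convention: it misses ECMD_FAILED. */
int caller_zero_convention(int should_fail)
{
	if (!activate_stub(should_fail))
		return ECMD_FAILED;
	return ECMD_PROCESSED;
}

/* Caller that compares against ECMD_PROCESSED and so catches the failure. */
int caller_ecmd_convention(int should_fail)
{
	if (activate_stub(should_fail) != ECMD_PROCESSED)
		return ECMD_FAILED;
	return ECMD_PROCESSED;
}

/* Encodes the outcomes described in this mail; call it from main() to try it. */
void check_conventions(void)
{
	/* The failure is swallowed: the caller reports ECMD_PROCESSED. */
	assert(caller_zero_convention(1) == ECMD_PROCESSED);
	/* The explicit comparison propagates the failure. */
	assert(caller_ecmd_convention(1) == ECMD_FAILED);
}
```

Both callers behave the same on success; they differ only when the callee fails, which is why the mismatch is easy to overlook.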

So under certain conditions, lvchange reports success even though the command failed.

The code should be changed to:
```
diff --git a/tools/lvchange.c b/tools/lvchange.c
index f9a0b54e3..ae626a05b 100644
--- a/tools/lvchange.c
+++ b/tools/lvchange.c
@@ -1437,6 +1437,7 @@ static int _lvchange_activate_single(struct cmd_context *cmd,
 {
        struct logical_volume *origin;
        char snaps_msg[128];
+       int rv;
 
        /* FIXME: untangle the proper logic for cow / sparse / virtual origin */
 
@@ -1465,8 +1466,10 @@ static int _lvchange_activate_single(struct cmd_context *cmd,
                }
        }
 
-       if (!_lvchange_activate(cmd, lv))
+       rv = _lvchange_activate(cmd, lv);
+       if (!rv || rv>1) {
                return_ECMD_FAILED;
+       }
 
        return ECMD_PROCESSED;
 }
```

The same problem may exist in other places in the lvm2 source code.

How to trigger:

The environment has two nodes, node1 and node2, which share one iSCSI LUN.
Both nodes use the system ID to control access to the shared disk.

1. node1 has already set its system ID from uname.

```
[tb-clustermd1 ~]# pvs --foreign
  PV         VG  Fmt  Attr PSize   PFree  
  /dev/sda   vg1 lvm2 a--  292.00m 260.00m
[tb-clustermd1 ~]# vgs --foreign -o+systemid
  VG  #PV #LV #SN Attr   VSize   VFree   System ID    
  vg1   1   1   0 wz--n- 292.00m 260.00m tb-clustermd1
[tb-clustermd1 ~]# lvs --foreign -o+Host
  LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Host         
  lv1  vg1 -wi-a----- 32.00m                                                     tb-clustermd1
[tb-clustermd1 ~]# dmsetup ls
vg1-lv1 (254:0)
[tb-clustermd1 ~]#
```

2. node2 changes the VG's system ID to its own.

```
[tb-clustermd2 ~]# vgchange -y --config "local/extra_system_ids='tb-clustermd1'" --systemid tb-clustermd2 vg1
  Volume group "vg1" successfully changed
[tb-clustermd2 ~]# lvchange -ay vg1/lv1
[tb-clustermd2 ~]# dmsetup ls
vg1-lv1 (254:0)
[tb-clustermd2 ~]#
```

3. At this point both nodes have the dm device.
```
[tb-clustermd1 ~]# dmsetup ls
vg1-lv1 (254:0)
[tb-clustermd2 ~]# dmsetup ls
vg1-lv1 (254:0)
``` 

4. node1 runs the lvchange commands. Note that the exit status is 0 both times.
```
[tb-clustermd1 ~]# lvchange -ay vg1/lv1 ; echo $?
  WARNING: Found LVs active in VG vg1 with foreign system ID tb-clustermd2.  Possible data corruption.
  Cannot activate LVs in a foreign VG.
0
[tb-clustermd1 ~]# dmsetup ls
vg1-lv1 (254:0)
[tb-clustermd1 ~]# lvchange -an vg1/lv1 ; echo $?
  WARNING: Found LVs active in VG vg1 with foreign system ID tb-clustermd2.  Possible data corruption.
0
[tb-clustermd1 ~]# dmsetup ls
No devices found
[tb-clustermd1 ~]#
```

Thanks.


* Re: [linux-lvm] issue about return value in _lvchange_activate_single
From: David Teigland @ 2020-11-17 15:45 UTC (permalink / raw)
  To: heming.zhao; +Cc: Zdenek Kabelac, LVM general discussion and development

On Tue, Nov 17, 2020 at 11:52:28AM +0800, heming.zhao@suse.com wrote:
> In lvm functions, it treats "return 0" as error case.
> 
> if _lvchange_activate() return ECMD_FAILED, the caller _lvchange_activate_single() think as normal:
> ```
> if (!_lvchange_activate(cmd, lv)) <== ECMD_FAILED is 5, won't enter if case.
>         return_ECMD_FAILED;

Thanks for finding that.  In some places 0 is the error value and in other
places ECMD_FAILED is the error value; they frequently get mixed up.
I believe this is the bug you are seeing:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=aba9652e584b6f6a422233dea951eb59326a3de2

> 2. node2 change the systemid to itself
> 
> ```
> [tb-clustermd2 ~]# vgchange -y --config "local/extra_system_ids='tb-clustermd1'" --systemid tb-clustermd2 vg1
>   Volume group "vg1" successfully changed
> [tb-clustermd2 ~]# lvchange -ay vg1/lv1
> [tb-clustermd2 ~]# dmsetup ls
> vg1-lv1 (254:0)

This is what the LVM-activate resource agent does, except it wouldn't be
done while the LV is active on another running host.  Just wanted to
clarify that, I don't think it's the point of your illustration here.

> 3. this time both sides have dm device.
> ```
> [tb-clustermd1 ~]# dmsetup ls
> vg1-lv1 (254:0)
> [tb-clustermd2 ~]# dmsetup ls
> vg1-lv1 (254:0)
> ``` 

For the sake of anyone looking at this later, this shouldn't happen in a
properly running cluster.  (If you wanted the LV active on two hosts at
once, you'd use lvmlockd and no system ID on the VG.)

> 4. node1 executes lvchange cmds. please note the return value is 0
> ```
> [tb-clustermd1 ~]# lvchange -ay vg1/lv1 ; echo $?
>   WARNING: Found LVs active in VG vg1 with foreign system ID tb-clustermd2.  Possible data corruption.
>   Cannot activate LVs in a foreign VG.
> 0

That's the one fixed by the commit above.

Dave


* Re: [linux-lvm] issue about return value in _lvchange_activate_single
From: heming.zhao @ 2020-11-18 14:45 UTC (permalink / raw)
  To: David Teigland; +Cc: Zdenek Kabelac, LVM general discussion and development

On 11/17/20 11:45 PM, David Teigland wrote:
> On Tue, Nov 17, 2020 at 11:52:28AM +0800, heming.zhao@suse.com wrote:
>> In lvm functions, it treats "return 0" as error case.
>>
>> if _lvchange_activate() return ECMD_FAILED, the caller _lvchange_activate_single() think as normal:
>> ```
>> if (!_lvchange_activate(cmd, lv)) <== ECMD_FAILED is 5, won't enter if case.
>>          return_ECMD_FAILED;
> 
> Thanks for finding that.  In some places 0 is the error value and in other
> places ECMD_FAILED is the error value; they frequently get mixed up.
> I believe this is the bug you are seeing:
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=aba9652e584b6f6a422233dea951eb59326a3de2
> 

I recommend using a named define (e.g. E_COMM_ERR) instead of the magic number 0 as the error return value.
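
A minimal sketch of that suggestion, under stated assumptions: E_COMM_ERR is only the example name given above, E_COMM_OK is an assumed counterpart, and activate_helper() and activate_single() are hypothetical functions. None of these names are taken from lvm2.

```c
/* Hypothetical names for the "0 means error" convention (not from lvm2). */
#define E_COMM_ERR 0
#define E_COMM_OK  1

/* Values copied from the tools/errors.h excerpt earlier in the thread. */
#define ECMD_PROCESSED 1
#define ECMD_FAILED    5

/* Hypothetical helper that follows the E_COMM_* convention. */
int activate_helper(int should_fail)
{
	return should_fail ? E_COMM_ERR : E_COMM_OK;
}

/* Hypothetical command-level caller that translates to the ECMD_* convention. */
int activate_single(int should_fail)
{
	/* The named comparison shows which convention the callee is expected to use. */
	if (activate_helper(should_fail) == E_COMM_ERR)
		return ECMD_FAILED;
	return ECMD_PROCESSED;
}
```

With a named constant at each call site, a callee that returns ECMD_FAILED where E_COMM_ERR is expected stands out in review more readily than it does behind a bare `!` test.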

>> 2. node2 change the systemid to itself
>>
>> ```
>> [tb-clustermd2 ~]# vgchange -y --config "local/extra_system_ids='tb-clustermd1'" --systemid tb-clustermd2 vg1
>>    Volume group "vg1" successfully changed
>> [tb-clustermd2 ~]# lvchange -ay vg1/lv1
>> [tb-clustermd2 ~]# dmsetup ls
>> vg1-lv1 (254:0)
> 
> This is what the LVM-activate resource agent does, except it wouldn't be
> done while the LV is active on another running host.  Just wanted to
> clarify that, I don't think it's the point of your illustration here.
> 
>> 3. this time both sides have dm device.
>> ```
>> [tb-clustermd1 ~]# dmsetup ls
>> vg1-lv1 (254:0)
>> [tb-clustermd2 ~]# dmsetup ls
>> vg1-lv1 (254:0)
>> ```
> 
> For the sake of anyone looking at this later, this shouldn't happen in a
> properly running cluster.  (If you wanted the LV active on two hosts at
> once, you'd use lvmlockd and no system ID on the VG.)
> 
>> 4. node1 executes lvchange cmds. please note the return value is 0
>> ```
>> [tb-clustermd1 ~]# lvchange -ay vg1/lv1 ; echo $?
>>    WARNING: Found LVs active in VG vg1 with foreign system ID tb-clustermd2.  Possible data corruption.
>>    Cannot activate LVs in a foreign VG.
>> 0
> 
> That's the one fixed by the commit above.
> 
> Dave
> 
