Hello LVM maintainers,

Currently activation/auto_activation_volume_list is not enabled, so the default behavior applies:
pvscan activates all devices at boot.

This behavior triggers a clumsy process in an HA (corosync+pacemaker stack) environment.

## scenario:

Two nodes (A & B) share a disk and use systemid to manage the vg/lv on this shared disk.
(activation/auto_activation_volume_list is left at its default: the cfg item is commented out)

(the steps below come from the resource-agent LVM-activate script)
1. Node A owns & has activated the shared vg/lv; node B is in standby.
2. A reboots; B detects this & waits for A to rejoin the cluster.
3. Because the systemid has not been changed, lvm2-pvscan@.service activates the vg/lv on A during boot.
4. A finishes rebooting; B starts to switch the systemid & activate the shared vg/lv.
5. On B, pacemaker detects that the lvm resource is running on both nodes.
6. On B, pacemaker restarts the lvm resource and enables it on a single node.

## root cause:

We can see that steps 3, 4, 5 are useless if step 3 does not happen.
So the root cause is step <3>: node A auto-activates the shared vg/lv.

## discussion (how to fix):

Could activation/auto_activation_volume_list support a new symbol/function like "!"?
e.g.
auto_activation_volume_list = [ "!vg1", "!vg2/lvol1" ]
The '!' means lvm never activates vg1 & vg2/lvol1 automatically.

My question:
Is it acceptable for LVM2 to add this new function?

Thanks.
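To make the proposed "!" syntax concrete, here is a minimal Python sketch of how such a list might be evaluated. The semantics are an assumption taken from the proposal above, not existing LVM behavior, and the function name may_autoactivate is hypothetical. The sketch ignores the tag entries ("@tag", "@*") that the real setting also accepts.

```python
from typing import Iterable


def may_autoactivate(vg: str, lv: str, volume_list: Iterable[str]) -> bool:
    """Decide whether vg/lv may be autoactivated under the proposed syntax.

    Assumed (hypothetical) semantics:
      * a "!vg" or "!vg/lv" entry always forbids autoactivation;
      * if the list holds only "!" entries, everything else is allowed;
      * otherwise a plain "vg" or "vg/lv" entry must match.
    """
    entries = list(volume_list)
    names = {vg, f"{vg}/{lv}"}
    excluded = {e[1:] for e in entries if e.startswith("!")}
    included = {e for e in entries if not e.startswith("!")}

    if names & excluded:
        return False          # explicitly excluded by a "!" entry
    if not included:
        return True           # only exclusions were given
    return bool(names & included)


if __name__ == "__main__":
    proposed = ["!vg1", "!vg2/lvol1"]
    for vg, lv in [("vg1", "lvol0"), ("vg2", "lvol1"), ("vg2", "lvol2")]:
        print(f"{vg}/{lv}: {may_autoactivate(vg, lv, proposed)}")
```

With the list from the proposal, everything in vg1 and the single LV vg2/lvol1 would be skipped at boot, while the other LVs in vg2 would still be autoactivated.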
On Tue, Nov 17, 2020 at 04:00:10PM +0800, heming.zhao@suse.com wrote:
> Hello LVM maintainers,
>
> Currently activation/auto_activation_volume_list is not enabled, so the default behavior applies:
> pvscan activates all devices at boot.
>
> This behavior triggers a clumsy process in an HA (corosync+pacemaker stack) environment.
>
> ## scenario:
>
> Two nodes (A & B) share a disk and use systemid to manage the vg/lv on this shared disk.
> (activation/auto_activation_volume_list is left at its default: the cfg item is commented out)
>
> (the steps below come from the resource-agent LVM-activate script)
> 1. Node A owns & has activated the shared vg/lv; node B is in standby.
> 2. A reboots; B detects this & waits for A to rejoin the cluster.
> 3. Because the systemid has not been changed, lvm2-pvscan@.service activates the vg/lv on A during boot.
> 4. A finishes rebooting; B starts to switch the systemid & activate the shared vg/lv.
> 5. On B, pacemaker detects that the lvm resource is running on both nodes.
> 6. On B, pacemaker restarts the lvm resource and enables it on a single node.
>
> ## root cause:
>
> We can see that steps 3, 4, 5 are useless if step 3 does not happen.
> So the root cause is step <3>: node A auto-activates the shared vg/lv.

I believe there's an assumption that the system or a user will not
activate LVs that are managed by the cluster, i.e. only LVM-activate will
activate LVs managed by the cluster. Perhaps we could make some attempt
to enforce that, or at least make sure the instructions for LVM-activate
make it clear what to do.

> ## discussion (how to fix):
>
> Could activation/auto_activation_volume_list support a new symbol/function like "!"?
> e.g.
> auto_activation_volume_list = [ "!vg1", "!vg2/lvol1" ]
> The '!' means lvm never activates vg1 & vg2/lvol1 automatically.
>
> My question:
> Is it acceptable for LVM2 to add this new function?

auto_activation_volume_list is difficult to use IMO, and I don't think
many people use it. Your suggestion sounds reasonable, but I've wondered
if autoactivation should be a property set on the VG or LV itself (i.e.
in the metadata)? The "activationskip" flag is a possible way to handle
the unwanted autoactivation, and also seems to justify the idea of making
autoactivation a similar flag.

Dave
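The difference between the two flags mentioned here can be sketched as a small Python model. The activation_skip field mirrors the existing "activationskip" flag, which blocks any activation unless the activating command explicitly ignores it. The autoactivate field is hypothetical: it stands for the per-LV metadata property being suggested, which would affect only automatic activation. All class, field and function names are illustrative assumptions, not LVM code.

```python
from dataclasses import dataclass


@dataclass
class LV:
    name: str
    activation_skip: bool = False  # models the existing "activationskip" flag
    autoactivate: bool = True      # hypothetical per-LV autoactivation property


def should_activate(lv: LV, auto: bool, ignore_skip: bool = False) -> bool:
    """Return True if this activation request would activate the LV.

    auto        -- True for boot-time autoactivation, False for an explicit request
    ignore_skip -- True if the caller asks to override the activation skip flag
    """
    if lv.activation_skip and not ignore_skip:
        return False   # skip flag blocks automatic and explicit activation alike
    if auto and not lv.autoactivate:
        return False   # the hypothetical flag blocks only autoactivation
    return True


if __name__ == "__main__":
    shared = LV("vg1/lvol1", autoactivate=False)
    print(should_activate(shared, auto=True))    # boot-time activation is skipped
    print(should_activate(shared, auto=False))   # explicit activation still works
```

Under this model an LV with only the hypothetical flag cleared stays inactive at boot but can still be activated by a plain explicit request, whereas an LV with the skip flag set needs the override for every activation.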
On 11/18/2020 12:17 AM, David Teigland wrote:
> auto_activation_volume_list is difficult to use IMO, and I don't think
> many people use it. Your suggestion sounds reasonable, but I've wondered
> if autoactivation should be a property set on the VG or LV itself (i.e.
> in the metadata)? The "activationskip" flag is a possible way to handle
> the unwanted autoactivation, and also seems to justify the idea of making
> autoactivation a similar flag.

I prefer to use a metadata flag for each VG or LV to skip auto-activation.
Otherwise, it is not easy for the pacemaker cluster to manage a local
VG (e.g. local or systemid type) in a cluster via active-passive mode.

Thanks
Gang

>
> Dave
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>
On 11/18/20 12:17 AM, David Teigland wrote:
> I believe there's an assumption that the system or a user will not
> activate LVs that are managed by the cluster, i.e. only LVM-activate will
> activate LVs managed by the cluster. Perhaps we could make some attempt
> to enforce that, or at least make sure the instructions for LVM-activate
> make it clear what to do.
>

I agree with this assumption.

> auto_activation_volume_list is difficult to use IMO, and I don't think
> many people use it. Your suggestion sounds reasonable, but I've wondered
> if autoactivation should be a property set on the VG or LV itself (i.e.
> in the metadata)? The "activationskip" flag is a possible way to handle
> the unwanted autoactivation, and also seems to justify the idea of making
> autoactivation a similar flag.
>

The idea of a new flag is very good, and we should have a complete solution.
What I am thinking about now is:
when/how to clear this flag;
how to manage it without the HA stack.

Normally the flag would be removed by the RA stop command. If the customer
does not follow the rule of stopping the resource in crmsh, and only uses
"systemctl stop" & "rm -rf pacemaker & corosync" to stop it, the flag will
never be removed. (Or pacemaker is in an abnormal state and the customer
must use a forced method.)

We should add some new parameters to existing commands to show & manage this new flag.

Thanks
On Wed, Nov 18, 2020 at 09:28:21AM +0800, Gang He wrote:
> I prefer to use a metadata flag for each VG or LV to skip auto-activation.
> Otherwise, it is not easy for the pacemaker cluster to manage a local
> VG (e.g. local or systemid type) in a cluster via active-passive mode.

I created a bug for this:
https://bugzilla.redhat.com/show_bug.cgi?id=1899214

Dave
Hi David,

On 2020/11/19 2:23, David Teigland wrote:
> On Wed, Nov 18, 2020 at 09:28:21AM +0800, Gang He wrote:
>> I prefer to use a metadata flag for each VG or LV to skip auto-activation.
>> Otherwise, it is not easy for the pacemaker cluster to manage a local
>> VG (e.g. local or systemid type) in a cluster via active-passive mode.
>
> I created a bug for this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1899214

Thanks for your follow-up.
More comments here:
Should we keep the default behavior as before? E.g. a VG/LV should still be
auto-activated by default. Otherwise, some users will be surprised after an
lvm upgrade.
Second, how do we keep compatibility with existing VGs/LVs? We can upgrade
the lvm2 version, but the VG/LV may be old.
I wonder if there are some reserved bits in the lvm metadata layout that we
could use? If yes, I think this proposal is perfect.

Thanks
Gang

>
> Dave
>
On 11/19/20 2:23 AM, David Teigland wrote:
> On Wed, Nov 18, 2020 at 09:28:21AM +0800, Gang He wrote:
>> I prefer to use a metadata flag for each VG or LV to skip auto-activation.
>> Otherwise, it is not easy for the pacemaker cluster to manage a local
>> VG(e.g. local or systemid type) in a cluster via active-passive mode.
>
> I created a bug for this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1899214
>
Hello Dave
I read the bug ticket and verified that "lvchange -k" works.
If I understand correctly, "lvchange -k" is enough for the current issue.
And your previous mails proposed a new function/flag to unify the management of VGs & LVs, am I right?
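
For reference, the "lvchange -k" approach can be sketched as the two command lines involved, built here as Python argument lists and not executed. The long options --setactivationskip, --activate and --ignoreactivationskip are existing lvchange options; the helper names and the vg1/lvol1 volume are illustrative assumptions. Whether the LVM-activate agent passes the override option is not stated in this thread, so treat that part as an assumption too.

```python
from typing import List


def set_skip_cmd(vg: str, lv: str, skip: bool = True) -> List[str]:
    """Command that sets or clears the activation skip flag (lvchange -k y|n)."""
    return ["lvchange", "--setactivationskip", "y" if skip else "n", f"{vg}/{lv}"]


def activate_cmd(vg: str, lv: str, ignore_skip: bool = True) -> List[str]:
    """Command that activates the LV, optionally overriding the skip flag (-K)."""
    cmd = ["lvchange", "--activate", "y"]
    if ignore_skip:
        # Assumption: the cluster agent would have to pass this itself.
        cmd.append("--ignoreactivationskip")
    cmd.append(f"{vg}/{lv}")
    return cmd


if __name__ == "__main__":
    # Run once by the admin on the shared LV, then used by the cluster on failover.
    print(" ".join(set_skip_cmd("vg1", "lvol1")))
    print(" ".join(activate_cmd("vg1", "lvol1")))
```

With the flag set, boot-time activation on node A would leave the LV inactive, and only an activation that explicitly overrides the flag would bring it up.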