* [linux-lvm] new multipath device mistakenly replaced another PV in existing volume group
@ 2017-04-19 21:53 Neutron Sharc
  2017-04-20 16:04 ` David Teigland
  0 siblings, 1 reply; 5+ messages in thread
From: Neutron Sharc @ 2017-04-19 21:53 UTC (permalink / raw)
  To: linux-lvm

I'm seeing a strange problem (iSCSI LUNs + multipath + LVM); I'll walk
through an example.

I have an iSCSI target machine that exposes many iSCSI LUNs. The iSCSI
initiator logs in to 4 iSCSI LUNs (vol1_[0-3]), creates a multipath
device for each LUN (/dev/mapper/vol1_[0-3]), and combines the 4
multipath devices into a volume group and LV (vol1/vol1_lv).

Then I log in to another 4 iSCSI LUNs (vol3_[0-3]) and create a
multipath device for each new LUN (/dev/mapper/vol3_[0-3]).  Now a
strange thing happens: some of the new multipath devices
(/dev/mapper/vol3_0, /dev/mapper/vol3_2) have replaced existing PVs in
vol1.  As a result, these new multipath devices have an open count > 0,
so I cannot run pvcreate on them:

$ sudo dmsetup ls --tree
vol1-vol1_lv (252:4)
 ├─vol3_0 (252:9)   <==  fresh multipath device, should NOT be in vol1
 │  └─ (65:128)
 ├─vol3_2 (252:10)  <==  fresh multipath device, should NOT be in vol1
 │  └─ (65:144)
 ├─vol1_1 (252:1)
 │  └─ (65:48)
 └─vol1_0 (252:0)
    └─ (65:32)
vol1_3 (252:3)  <== was in vol1-vol1_lv, but was knocked out
 └─ (65:16)
vol1_2 (252:2)  <== was in vol1-vol1_lv, but was knocked out
 └─ (65:64)
vol3_3 (252:11)
 └─ (65:160)
vol3_1 (252:12)
 └─ (65:176)


Please note that all these vol3_[0-3] are fresh, without any LVM
metadata on them, as shown by pvscan:

$ sudo pvscan --cache /dev/mapper/vol3_0
  Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
  Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
  Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096


$ sudo pvcreate /dev/mapper/vol3_0   <== this multipath device was
mistakenly included in vol1

  Found duplicate PV 9F6vU9NVBfEq1w3e04T5UreO6fDVPJNy: using
/dev/mapper/vol1_3 not /dev/mapper/vol3_0
  Using duplicate PV /dev/mapper/vol1_3 without holders, replacing
/dev/mapper/vol3_0
  Can't open /dev/mapper/vol3_0 exclusively.  Mounted filesystem?


========== Configs I used:
BTW, I have lvmetad enabled; my lvm.conf has:
filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
global_filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
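(For context: that filter pair is meant to accept only /dev/mapper paths and reject everything else, so LVM only ever scans the multipath devices. A rough shell approximation of the regexes' intent, purely for illustration; this is not how LVM evaluates filters internally:)

```shell
# First matching pattern wins: /dev/mapper/* is accepted, all else rejected.
matches_filter() {
  case "$1" in
    /dev/mapper/*) echo "accept" ;;   # a|/dev/mapper/.*|
    *)             echo "reject" ;;   # r|.*|
  esac
}

matches_filter /dev/mapper/vol3_0    # accept
matches_filter /dev/sdb              # reject (underlying path, not scanned)
```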

My /etc/multipath.conf is:
defaults {
  user_friendly_names     yes
  path_grouping_policy    failover
  polling_interval        10
  path_selector           "round-robin 0"
  find_multipaths         yes
  features       "1 queue_if_no_path"
}
blacklist {
  devnode  "^sda[1-9]"
}
multipaths {
  multipath {
    wwid 360000000758757de9fb289cbde12abab
    alias vol1_0
  }
  # more devices
}

The iSCSI initiator is on CentOS 6.5, with these package versions:
lvm2-2.02.143-12.el6.x86_64
device-mapper-multipath-0.4.9-100.el6.x86_64

The iSCSI target is tgtd on another Ubuntu machine.



Comments are appreciated.


-Shawn

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [linux-lvm] new multipath device mistakenly replaced another PV in existing volume group
  2017-04-19 21:53 [linux-lvm] new multipath device mistakenly replaced another PV in existing volume group Neutron Sharc
@ 2017-04-20 16:04 ` David Teigland
  2017-04-20 16:24   ` David Teigland
  0 siblings, 1 reply; 5+ messages in thread
From: David Teigland @ 2017-04-20 16:04 UTC (permalink / raw)
  To: Neutron Sharc; +Cc: linux-lvm

On Wed, Apr 19, 2017 at 02:53:19PM -0700, Neutron Sharc wrote:
> Please note that all these vol3_[0-3] are fresh, without any LVM
> metadata on them, as shown by pvscan::
> sudo pvscan --cache /dev/mapper/vol3_0
>   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096

That error indicates there is LVM metadata on them.

>   Found duplicate PV 9F6vU9NVBfEq1w3e04T5UreO6fDVPJNy: using
> /dev/mapper/vol1_3 not /dev/mapper/vol3_0

That error indicates it's the same LVM metadata on them.

Perhaps you're exporting the same data source on the back end via separate
devices, or have copied the data sources, or are using snapshots of them.

Dave


* Re: [linux-lvm] new multipath device mistakenly replaced another PV in existing volume group
  2017-04-20 16:04 ` David Teigland
@ 2017-04-20 16:24   ` David Teigland
  2017-04-22  2:13     ` Neutron Sharc
  0 siblings, 1 reply; 5+ messages in thread
From: David Teigland @ 2017-04-20 16:24 UTC (permalink / raw)
  To: Neutron Sharc; +Cc: linux-lvm

On Thu, Apr 20, 2017 at 11:04:32AM -0500, David Teigland wrote:
> On Wed, Apr 19, 2017 at 02:53:19PM -0700, Neutron Sharc wrote:
> > Please note that all these vol3_[0-3] are fresh, without any LVM
> > metadata on them, as shown by pvscan::
> > sudo pvscan --cache /dev/mapper/vol3_0
> >   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
> >   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
> >   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
> 
> That error indicates there is LVM metadata on them.
> 
> >   Found duplicate PV 9F6vU9NVBfEq1w3e04T5UreO6fDVPJNy: using
> > /dev/mapper/vol1_3 not /dev/mapper/vol3_0
> 
> That error indicates it's the same LVM metadata on them.
> 
> Perhaps you're exporting the same data source on the back end via separate
> devices, or have copied the data sources, or are using snapshots of them.

Also, if you upgrade to a new version of lvm, there is better checking and
reporting for these conditions.


* Re: [linux-lvm] new multipath device mistakenly replaced another PV in existing volume group
  2017-04-20 16:24   ` David Teigland
@ 2017-04-22  2:13     ` Neutron Sharc
  0 siblings, 0 replies; 5+ messages in thread
From: Neutron Sharc @ 2017-04-22  2:13 UTC (permalink / raw)
  To: David Teigland; +Cc: linux-lvm

Thank you David for replying.

The problem turns out to be caused by a stale read buffer at the backend.

When LVM reads a fresh iSCSI LUN (LUN2) at the backend, the read
misses.  The read buffer isn't updated, so it still contains previous
data, which may be the metadata of another LUN (LUN1).  LVM receives
this data, concludes that LUN2 has the same metadata as LUN1, and gets
confused.

I made a fix to zero out the read buffer on a read miss at the backend.
The problem is now gone.
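To make the failure mode concrete, here is a toy shell reproduction of the buggy path and the fix. A temp file stands in for the backend's shared read buffer, and the string is a made-up placeholder for LUN1's PV metadata; none of this touches real iSCSI or LVM:

```shell
# A single read buffer is reused across LUNs at the backend.
buf="$(mktemp)"

# Read of LUN1: the backend fills the buffer with LUN1's PV metadata.
printf 'LVM2-metadata-of-LUN1' > "$buf"

# Read of fresh LUN2 misses. Buggy path: the buffer is returned as-is,
# so LUN2 appears to carry LUN1's metadata -> LVM sees a duplicate PV.
stale="$(cat "$buf")"
echo "buggy miss returns: $stale"

# Fixed path: zero the buffer on a read miss before returning it.
dd if=/dev/zero of="$buf" bs=21 count=1 conv=notrunc 2>/dev/null
zeroed="$(od -An -tx1 "$buf" | tr -d ' \n')"
echo "after zero-fill:   $zeroed"

rm -f "$buf"
```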

BTW, I'm running CentOS 6.5 with LVM 2.02.143.  The latest LVM is
2.02.169 (or later?).  How do I upgrade to the latest lvm2 on CentOS
6.5?

On Thu, Apr 20, 2017 at 9:24 AM, David Teigland <teigland@redhat.com> wrote:
> On Thu, Apr 20, 2017 at 11:04:32AM -0500, David Teigland wrote:
>> On Wed, Apr 19, 2017 at 02:53:19PM -0700, Neutron Sharc wrote:
>> > Please note that all these vol3_[0-3] are fresh, without any LVM
>> > metadata on them, as shown by pvscan::
>> > sudo pvscan --cache /dev/mapper/vol3_0
>> >   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>> >   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>> >   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>>
>> That error indicates there is LVM metadata on them.
>>
>> >   Found duplicate PV 9F6vU9NVBfEq1w3e04T5UreO6fDVPJNy: using
>> > /dev/mapper/vol1_3 not /dev/mapper/vol3_0
>>
>> That error indicates it's the same LVM metadata on them.
>>
>> Perhaps you're exporting the same data source on the back end via separate
>> devices, or have copied the data sources, or are using snapshots of them.
>
> Also, if you upgrade to a new version of lvm, there is better checking and
> reporting for these conditions.
>


* Re: [linux-lvm] new multipath device mistakenly replaced another PV in existing volume group
       [not found] <971194068.8176464.1492969500597.ref@mail.yahoo.com>
@ 2017-04-23 17:45 ` matthew patton
  0 siblings, 0 replies; 5+ messages in thread
From: matthew patton @ 2017-04-23 17:45 UTC (permalink / raw)
  To: David Teigland, LVM general discussion and development

> I made a fix to zero out read buffer on a read miss at backend. 

How many more land mines like this exist? One should *ALWAYS* zero buffers before use.

