* [linux-lvm] Mirror fail/recover test
From: jose nuno neto @ 2010-02-24 16:14 UTC
To: linux-lvm

Hi

I'm trying to test the failure of a SAN-mirrored LV, and the recovery and check for data loss.

I'm running RedHat 5.4
2.6.18-164.2.1.el5
lvm2-2.02.46-8.el5_4.1

I can create a 2-leg mirror+log LV fine, can lvconvert it down to one leg, and can delete it fine.
But when I simulate a disk failure with either

  dd if=/dev/zero of=pvmirror_device
  echo offline > /sys/block/pvmirror_device/device/status

lvs -a -o +devices
still shows the LV as mirrored (it should switch to non-mirrored, right?), and shows "unknown device" for the pvmirror_device I destroyed.

When I issue lvconvert -m 0 it complains that the PV with that UUID is not found; if I try pvcreate --uuid pvmirror_device I get an error complaining about a still-mounted device.
If I try lvconvert --repair it hangs forever.

What should be the correct procedure to recover from this?

Thanks
Jose

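(For reference, one common manual recovery path once the failed leg's device really returns errors is sketched below. The names vg01, lv01 and /dev/sdX are placeholders, a current metadata backup under /etc/lvm/backup is assumed, and the exact flags and ordering differ between LVM2 releases.)

  # drop the missing PV from the VG so the broken mirror can be reduced or repaired
  vgreduce --removemissing vg01
  lvconvert -m 0 vg01/lv01          # fall back to a linear LV if the mirror cannot be kept

  # or, to recreate the lost PV in place with its old UUID and restore the VG metadata
  pvcreate --uuid <old-pv-uuid> --restorefile /etc/lvm/backup/vg01 /dev/sdX
  vgcfgrestore vg01
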
* Re: [linux-lvm] Mirror fail/recover test
From: malahal @ 2010-02-24 18:55 UTC
To: linux-lvm

jose nuno neto [jose.neto@liber4e.com] wrote:
> I'm trying to test the failure of a SAN-mirrored LV, and the recovery and
> check for data loss.
>
> I'm running RedHat 5.4
> 2.6.18-164.2.1.el5
> lvm2-2.02.46-8.el5_4.1
>
> I can create a 2-leg mirror+log LV fine, can lvconvert it down to one leg,
> and can delete it fine.
> But when I simulate a disk failure with either
>   dd if=/dev/zero of=pvmirror_device
>   echo offline > /sys/block/pvmirror_device/device/status

What is the output of "dmsetup status" at this point?
There must be some messages in the /var/log/messages file if you enable them.

> lvs -a -o +devices
> still shows the LV as mirrored (it should switch to non-mirrored, right?),

Yes, provided you successfully started the dmeventd monitoring thread
and it handled the failure event.

Thanks, Malahal.

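(A small sketch of how that monitoring can be checked, assuming placeholder names vg01/lv01 and that the --monitor option is available in this LVM2 release; the device-mapper map name is normally <vg>-<lv>, with dashes inside the names doubled.)

  vgchange --monitor y vg01          # ask dmeventd to monitor the VG's mirror/snapshot LVs
  dmsetup status vg01-lv01           # mirror target status, including per-leg health flags
  ps -ef | grep dmeventd             # confirm the monitoring daemon itself is running
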
* Re: [linux-lvm] Mirror fail/recover test
From: jose nuno neto @ 2010-02-25 10:36 UTC
To: LVM general discussion and development

Much thanks for your interest; I'm putting more info below.

> > But when I simulate a disk failure with either
> >   dd if=/dev/zero of=pvmirror_device
> >   echo offline > /sys/block/pvmirror_device/device/status
>
> What is the output of "dmsetup status" at this point?
> There must be some messages in the /var/log/messages file if you enable them.

This is my setup for the device I'm unplugging:

multipath -l -v2 | grep -A 7 3600a0b800048f9b200000c2b4b5980b7
mpath12 (3600a0b800048f9b200000c2b4b5980b7) dm-8 SUN,CSM200_R
[size=52G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:1:1 sdo 8:224  [active][undef]
 \_ 9:0:1:1 sdq 65:0   [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 7:0:0:1 sdd 8:48   [active][undef]
 \_ 9:0:0:1 sdf 8:80   [active][undef]

Before unplugging:

dmsetup status mpath12
0 109051904 multipath 2 0 0 0 2 1 E 0 2 0 8:224 A 0 65:0 A 0 E 0 2 0 8:48 A 0 8:80 A 0

echo offline > /sys/block/sdd/device/state
echo offline > /sys/block/sdo/device/state
echo offline > /sys/block/sdq/device/state
echo offline > /sys/block/sdf/device/state

dmsetup status mpath12
0 109051904 multipath 2 0 0 0 2 1 E 0 2 0 8:224 F 1 65:0 F 1 E 0 2 0 8:48 F 1 8:80 F 1

Feb 25 11:10:32 malta9 multipathd: sdd: rdac checker reports path is down
Feb 25 11:10:32 malta9 multipathd: sdd: rdac checker reports path is down
Feb 25 11:10:32 malta9 multipathd: sdo: rdac checker reports path is down
Feb 25 11:10:32 malta9 multipathd: sdo: rdac checker reports path is down
Feb 25 11:10:32 malta9 multipathd: sdq: rdac checker reports path is down
Feb 25 11:10:32 malta9 multipathd: sdq: rdac checker reports path is down
Feb 25 11:10:32 malta9 multipathd: dm-8: devmap already registered
Feb 25 11:10:32 malta9 multipathd: dm-8: devmap already registered
Feb 25 11:10:52 malta9 multipathd: sdf: rdac checker reports path is down
Feb 25 11:10:52 malta9 multipathd: sdf: rdac checker reports path is down
Feb 25 11:10:52 malta9 multipathd: dm-8: devmap already registered
Feb 25 11:10:52 malta9 multipathd: dm-8: devmap already registered

dmeventd is running:
root  4601  0.0  0.1  96272 69668 ?  S<Lsl  Feb24  0:00 [dmeventd]

Also I have the lvm.conf option ignore_suspended_devices = 1, which should prevent this, right?

> > lvs -a -o +devices
> > still shows the LV as mirrored (it should switch to non-mirrored, right?),
>
> Yes, provided you successfully started the dmeventd monitoring thread
> and it handled the failure event.
>
> Thanks, Malahal.

lvm.conf:

devices {
    dir = "/dev"
    scan = [ "/dev" ]
    preferred_names = [ ]
    filter = [ "r/disk/", "r/sd.*/", "a/.*/" ]
    cache_dir = "/etc/lvm/cache"
    cache_file_prefix = ""
    write_cache_state = 1
    sysfs_scan = 1
    md_component_detection = 1
    md_chunk_alignment = 1
    data_alignment = 0
    ignore_suspended_devices = 1
}
log {
    verbose = 1
    syslog = 1
    file = "/var/log/lvm2.log"
    overwrite = 0
    level = 0
    indent = 1
    command_names = 1
    prefix = "  "
}
backup {
    backup = 1
    backup_dir = "/etc/lvm/backup"
    archive = 1
    archive_dir = "/etc/lvm/archive"
    retain_min = 10
    retain_days = 30
}
shell {
    history_size = 100
}
global {
    library_dir = "/usr/lib64"
    umask = 077
    test = 0
    units = "h"
    activation = 1
    proc = "/proc"
    locking_type = 1
    fallback_to_clustered_locking = 1
    fallback_to_local_locking = 1
    locking_dir = "/var/lock/lvm"
}
activation {
    missing_stripe_filler = "error"
    reserved_stack = 256
    reserved_memory = 8192
    process_priority = -18
    volume_list = [ "rootvg", "@cluster.test" ]
    mirror_region_size = 512
    readahead = "auto"
    mirror_log_fault_policy = "allocate"
    mirror_device_fault_policy = "remove"
}
dmeventd {
    mirror_library = "libdevmapper-event-lvm2mirror.so"
    snapshot_library = "libdevmapper-event-lvm2snapshot.so"
}

* Re: [linux-lvm] Mirror fail/recover test
From: malahal @ 2010-02-25 16:11 UTC
To: linux-lvm

jose nuno neto [jose.neto@liber4e.com] wrote:
> multipath -l -v2 | grep -A 7 3600a0b800048f9b200000c2b4b5980b7
> mpath12 (3600a0b800048f9b200000c2b4b5980b7) dm-8 SUN,CSM200_R
> [size=52G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
>
> Before unplugging:
> dmsetup status mpath12
> 0 109051904 multipath 2 0 0 0 2 1 E 0 2 0 8:224 A 0 65:0 A 0 E 0 2 0 8:48 A 0 8:80 A 0
>
> echo offline > /sys/block/sdd/device/state
> echo offline > /sys/block/sdo/device/state
> echo offline > /sys/block/sdq/device/state
> echo offline > /sys/block/sdf/device/state
>
> dmsetup status mpath12
> 0 109051904 multipath 2 0 0 0 2 1 E 0 2 0 8:224 F 1 65:0 F 1 E 0 2 0 8:48 F 1 8:80 F 1

I was actually asking for "dmsetup status <mirror-device>" rather than the
multipath device. I didn't know that you were using multipath devices!

Anyway, it looks like mpath12 queues I/O on path failures rather than failing
it back to the upper layers. In other words, if you were to run "dd" or any
other application against mpath12, it would hang too.

mpath12 seems to keep the requests and wait forever for the paths to become
available again in your case. If you really want it to fail, configure your
multipath accordingly.

Thanks, Malahal.
PS: "features=1 queue_if_no_path" in your 'multipath -ll' output is the
source of the error here.

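(A small sketch of how the queueing behaviour can be toggled on an existing map for testing, assuming the map name mpath12 from the output above; the persistent fix belongs in multipath.conf, as the next message describes.)

  dmsetup message mpath12 0 "fail_if_no_path"    # stop queueing: failed I/O is returned to LVM
  dmsetup message mpath12 0 "queue_if_no_path"   # restore queueing afterwards
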
* Re: [linux-lvm] Mirror fail/recover test SOLVED
From: jose nuno neto @ 2010-03-02 10:31 UTC
To: LVM general discussion and development

Hi

Much thanks for your insights. You're right: multipath was keeping the I/O
queued, so LVM didn't fail the mirror.

I had to create a device section in multipath.conf so that features=0 would
be used by multipath.

Best Regards
Jose

> mpath12 seems to keep the requests and wait forever for the paths to become
> available again in your case. If you really want it to fail, configure your
> multipath accordingly.
>
> Thanks, Malahal.
> PS: "features=1 queue_if_no_path" in your 'multipath -ll' output is the
> source of the error here.

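(A minimal sketch of the kind of device section meant here, using the SUN/CSM200_R strings from the multipath output earlier in the thread; the remaining options should be copied from the distribution's built-in defaults for this array, and exact keywords vary between multipath-tools versions.)

  devices {
      device {
          vendor            "SUN"
          product           "CSM200_R"
          hardware_handler  "1 rdac"
          path_checker      rdac
          no_path_retry     fail      # i.e. features "0": return errors instead of queueing
      }
  }
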
* [linux-lvm] Lvm hangs on San fail
From: jose nuno neto @ 2010-04-14 15:03 UTC
To: linux-lvm

Hi to all,

I'm on RHEL 5.4 with
lvm2-2.02.46-8.el5_4.1
2.6.18-164.2.1.el5

I have a multipathed SAN connection on which I'm building LVs.
It's a cluster system, and I want the LVs to switch on failure.

If I simulate a failure through the OS via /sys/bus/scsi/devices/$DEVICE/delete,
I get an LV failure and the service switches to the other node.

But if I do it "for real" with a port down on the SAN switch, multipath reports
the paths down, but the LVM commands hang forever and nothing gets switched.

From the logs I see multipath failing the paths, and LVM failing to remove the
faulty devices.

Any ideas how I should "fix" it?

Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Log device, 253:53, has failed.
Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Device failure in vg_ora_scapa-lv_ora_scapa_redo
Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Another thread is handling an event. Waiting...

Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active paths: 0
Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active paths: 0
Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active paths: 0
Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active paths: 0

Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in vg_syb_roger-lv_syb_roger_admin
Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices in vg_syb_roger-lv_syb_roger_admin

Much Thanks
Jose

* Re: [linux-lvm] Lvm hangs on San fail
From: Eugene Vilensky @ 2010-04-14 17:38 UTC
To: LVM general discussion and development

What is your multipath.conf setting for "queue_if_no_path"?

On 4/14/10, jose nuno neto <jose.neto@liber4e.com> wrote:
> But if I do it "for real" with a port down on the SAN switch, multipath reports
> the paths down, but the LVM commands hang forever and nothing gets switched.

Regards,
Eugene Vilensky
evilensky@gmail.com

* Re: [linux-lvm] Lvm hangs on San fail
From: brem belguebli @ 2010-04-14 23:02 UTC
To: LVM general discussion and development

Post your multipath.conf file; you may be queueing forever?

On Wed, 2010-04-14 at 15:03 +0000, jose nuno neto wrote:
> But if I do it "for real" with a port down on the SAN switch, multipath reports
> the paths down, but the LVM commands hang forever and nothing gets switched.

* Re: [linux-lvm] Lvm hangs on San fail
From: jose nuno neto @ 2010-04-15 8:29 UTC
To: LVM general discussion and development

Good morning,

This is what I have in multipath.conf:

blacklist {
    wwid SSun_VOL0_266DCF4A
    wwid SSun_VOL0_5875CF4A
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
}
defaults {
    user_friendly_names    yes
}
devices {
    device {
        vendor                  "HITACHI"
        product                 "OPEN-V"
        path_grouping_policy    group_by_node_name
        failback                immediate
        no_path_retry           fail
    }
    device {
        vendor                  "IET"
        product                 "VIRTUAL-DISK"
        path_checker            tur
        path_grouping_policy    failover
        failback                immediate
        no_path_retry           fail
    }
}

As an example, this is one LUN. It shows [features=0], so I'd say it should fail right away:

mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V -SU
[size=26G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=4][active]
 \_ 5:0:1:0     sdu  65:64  [active][ready]
 \_ 5:0:1:16384 sdac 65:192 [active][ready]
 \_ 5:0:1:32768 sdas 66:192 [active][ready]
 \_ 5:0:1:49152 sdba 67:64  [active][ready]
\_ round-robin 0 [prio=4][enabled]
 \_ 3:0:1:0     sdaw 67:0   [active][ready]
 \_ 3:0:1:16384 sdbe 67:128 [active][ready]
 \_ 3:0:1:32768 sdbi 67:192 [active][ready]
 \_ 3:0:1:49152 sdbm 68:0   [active][ready]

I think the paths do fail, since I see these messages from LVM:

Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in vg_syb_roger-lv_syb_roger_admin
Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices in vg_syb_roger-lv_syb_roger_admin

But for some reason LVM can't remove them. Is there any option I should have in lvm.conf?

Best Regards
Jose

> Post your multipath.conf file; you may be queueing forever?

* Re: [linux-lvm] Lvm hangs on San fail
From: Bryan Whitehead @ 2010-04-15 9:32 UTC
To: LVM general discussion and development

Can you post the output of pvdisplay? Also the output of multipath when the
port is down?

If your multipath output is still showing all paths [active][ready] when you
shut a port down, you might need to change the path_checker option. I don't
have a Hitachi array, but readsector0 (the default) did not work for me;
directio does.

This could be LVM seeing I/O time out while the multipath layer isn't downing
a dead path.

On Thu, Apr 15, 2010 at 1:29 AM, jose nuno neto <jose.neto@liber4e.com> wrote:
> As an example, this is one LUN. It shows [features=0], so I'd say it should fail right away:
> mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V -SU
> [size=26G][features=0][hwhandler=0][rw]

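(A sketch of the change Bryan is suggesting, applied to the HITACHI device section already posted above; whether directio, tur, or the default checker is right for this array is an assumption to verify against the vendor's recommendations.)

  device {
      vendor                  "HITACHI"
      product                 "OPEN-V"
      path_grouping_policy    group_by_node_name
      failback                immediate
      no_path_retry           fail
      path_checker            directio    # probe paths with direct I/O instead of readsector0
  }
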
* Re: [linux-lvm] Lvm hangs on San fail
From: jose nuno neto @ 2010-04-15 11:59 UTC
To: LVM general discussion and development

Hello,

I spent more time on this, and it seems that since LVM can't write to any PV
backing the volumes it has lost, it cannot record the device failure and
update the metadata on the other PVs, so it hangs forever.

Is this right?

> I think the paths do fail, since I see these messages from LVM:
>
> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in vg_syb_roger-lv_syb_roger_admin
> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices in vg_syb_roger-lv_syb_roger_admin

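(If the hypothesis above is right, one thing worth checking is how many metadata areas the VG has and which PVs carry them, since LVM needs to commit the updated metadata to a surviving PV. A sketch, assuming the pv_mda_count/vg_mda_count report fields exist in this LVM2 release.)

  pvs -o +pv_mda_count     # PVs with 0 metadata areas cannot receive the updated metadata
  vgs -o +vg_mda_count     # how many metadata copies the VG keeps in total
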
* Re: [linux-lvm] Lvm hangs on San fail
From: Eugene Vilensky @ 2010-04-15 12:41 UTC
To: LVM general discussion and development

Can you show us a pvdisplay or verbose vgdisplay?

On 4/15/10, jose nuno neto <jose.neto@liber4e.com> wrote:
> I spent more time on this, and it seems that since LVM can't write to any PV
> backing the volumes it has lost, it cannot record the device failure and
> update the metadata on the other PVs, so it hangs forever.

Regards,
Eugene Vilensky
evilensky@gmail.com

* Re: [linux-lvm] Lvm hangs on San fail
From: jose nuno neto @ 2010-04-16 8:55 UTC
To: LVM general discussion and development

Hi

> Can you show us a pvdisplay or verbose vgdisplay?

Here goes the vgdisplay -v of one of the VGs with mirrors:

###########################################################

  --- Volume group ---
  VG Name               vg_ora_jura
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  705
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                4
  Open LV               4
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               52.79 GB
  PE Size               4.00 MB
  Total PE              13515
  Alloc PE / Size       12292 / 48.02 GB
  Free  PE / Size       1223 / 4.78 GB
  VG UUID               nttQ3x-4ecP-Q6ms-jt2u-UIs4-texj-Q9Nxdt

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_arch
  VG Name                vg_ora_jura
  LV UUID                8oUfYn-2TrP-yS6K-pcS2-cgI4-tcv1-33dSdX
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:28

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_export
  VG Name                vg_ora_jura
  LV UUID                NLfQT6-36TS-DRHq-PJRf-9UDv-L8mz-HjPea2
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:32

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_data
  VG Name                vg_ora_jura
  LV UUID                VtSBIL-XvCw-23xK-NVAH-DvYn-P2sE-OkZJro
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                12.00 GB
  Current LE             3072
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:40

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_redo
  VG Name                vg_ora_jura
  LV UUID                KRHKBG-71Qv-YBsA-oJDt-igzP-EYaI-gPwcBX
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.00 GB
  Current LE             512
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:48

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_arch_mimage_0
  VG Name                vg_ora_jura
  LV UUID                lQCOAt-aoK3-HBp1-xrQW-eh7L-6t94-CyAg5c
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:26

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_arch_mimage_1
  VG Name                vg_ora_jura
  LV UUID                snrnPc-8FxY-ekAk-ooNe-sBws-tuI0-cTFfj3
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:27

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_arch_mlog
  VG Name                vg_ora_jura
  LV UUID                ouqaCQ-Deex-iArv-xLe9-jg8b-5cLf-3SChQ1
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                4.00 MB
  Current LE             1
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:25

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_data_mlog
  VG Name                vg_ora_jura
  LV UUID                TmE2S0-r8ST-v624-RxUn-Qppw-2l8p-jM9EC9
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                4.00 MB
  Current LE             1
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:37

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_data_mimage_0
  VG Name                vg_ora_jura
  LV UUID                8hR0bP-g9mR-OSXS-KdUM-ouZ6-KVdS-sfz51c
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                12.00 GB
  Current LE             3072
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:38

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_data_mimage_1
  VG Name                vg_ora_jura
  LV UUID                fzdzrD-7p6d-XFkA-UHyr-CPad-F2nV-6QIU9p
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                12.00 GB
  Current LE             3072
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:39

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_export_mlog
  VG Name                vg_ora_jura
  LV UUID                29yLY8-N3Lv-46pN-1jze-50A2-wlhu-quuoMa
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                4.00 MB
  Current LE             1
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:29

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_export_mimage_0
  VG Name                vg_ora_jura
  LV UUID                1uMTsf-wPaQ-ItTy-rpma-m2La-TGZl-C4KIU4
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:30

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_export_mimage_1
  VG Name                vg_ora_jura
  LV UUID                cm8Kn7-knL3-mUPL-XFvU-geMm-Wxff-32x2va
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                5.00 GB
  Current LE             1280
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:31

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_redo_mlog
  VG Name                vg_ora_jura
  LV UUID                811tNy-eaC5-zfZQ-1QVf-cbYP-1MIM-v6waJF
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                4.00 MB
  Current LE             1
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:45

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_redo_mimage_0
  VG Name                vg_ora_jura
  LV UUID                aUZAer-f5rl-1f2X-9jgY-f8CJ-jdwe-F5Pmao
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.00 GB
  Current LE             512
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:46

  --- Logical volume ---
  LV Name                /dev/vg_ora_jura/lv_ora_jura_redo_mimage_1
  VG Name                vg_ora_jura
  LV UUID                gAEJym-sSbq-rC4P-AjpI-OibV-k3yI-lDx1I6
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.00 GB
  Current LE             512
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:47

  --- Physical volumes ---
  PV Name               /dev/mapper/mpath-dc1-b
  PV UUID               hgjXU1-2qjo-RsmS-1XJI-d0kZ-oc4A-ZKCza8
  PV Status             allocatable
  Total PE / Free PE    6749 / 605

  PV Name               /dev/mapper/mpath-dc2-b
  PV UUID               hcANwN-aeJT-PIAq-bPsf-9d3e-ylkS-GDjAGR
  PV Status             allocatable
  Total PE / Free PE    6749 / 605

  PV Name               /dev/mapper/mpath-dc2-mlog1p1
  PV UUID               4l9Qvo-SaAV-Ojlk-D1YB-Tkud-Yjg0-e5RkgJ
  PV Status             allocatable
  Total PE / Free PE    17 / 13

* Re: [linux-lvm] Lvm hangs on San fail
From: Bryan Whitehead @ 2010-04-16 20:15 UTC
To: LVM general discussion and development

Can you post the output of multipath when the port is down?

On Fri, Apr 16, 2010 at 1:55 AM, jose nuno neto <jose.neto@liber4e.com> wrote:
> Here goes the vgdisplay -v of one of the VGs with mirrors:

> LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:26 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_1 > VG Name vg_ora_jura > LV UUID snrnPc-8FxY-ekAk-ooNe-sBws-tuI0-cTFfj3 > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:27 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mlog > VG Name vg_ora_jura > LV UUID ouqaCQ-Deex-iArv-xLe9-jg8b-5cLf-3SChQ1 > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:25 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data_mlog > VG Name vg_ora_jura > LV UUID TmE2S0-r8ST-v624-RxUn-Qppw-2l8p-jM9EC9 > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:37 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_0 > VG Name vg_ora_jura > LV UUID 8hR0bP-g9mR-OSXS-KdUM-ouZ6-KVdS-sfz51c > LV Write Access read/write > LV Status available > # open 1 > LV Size 12.00 GB > Current LE 3072 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:38 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_1 > VG Name vg_ora_jura > LV UUID fzdzrD-7p6d-XFkA-UHyr-CPad-F2nV-6QIU9p > LV Write Access read/write > LV Status available > # open 1 > LV Size 12.00 GB > Current LE 3072 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:39 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export_mlog > VG Name vg_ora_jura > LV UUID 29yLY8-N3Lv-46pN-1jze-50A2-wlhu-quuoMa > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:29 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_0 > VG Name vg_ora_jura > LV UUID
1uMTsf-wPaQ-ItTy-rpma-m2La-TGZl-C4KIU4 > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:30 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_1 > VG Name vg_ora_jura > LV UUID cm8Kn7-knL3-mUPL-XFvU-geMm-Wxff-32x2va > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:31 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mlog > VG Name vg_ora_jura > LV UUID 811tNy-eaC5-zfZQ-1QVf-cbYP-1MIM-v6waJF > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:45 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_0 > VG Name vg_ora_jura > LV UUID aUZAer-f5rl-1f2X-9jgY-f8CJ-jdwe-F5Pmao > LV Write Access read/write > LV Status available > # open 1 > LV Size 2.00 GB > Current LE 512 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:46 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_1 > VG Name vg_ora_jura > LV UUID gAEJym-sSbq-rC4P-AjpI-OibV-k3yI-lDx1I6 > LV Write Access read/write > LV Status available > # open 1 > LV Size 2.00 GB > Current LE 512 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:47 > > --- Physical volumes --- > PV Name /dev/mapper/mpath-dc1-b > PV UUID hgjXU1-2qjo-RsmS-1XJI-d0kZ-oc4A-ZKCza8 > PV Status allocatable > Total PE / Free PE 6749 / 605 > > PV Name /dev/mapper/mpath-dc2-b > PV UUID hcANwN-aeJT-PIAq-bPsf-9d3e-ylkS-GDjAGR > PV Status allocatable > Total PE / Free PE 6749 / 605 > > PV Name /dev/mapper/mpath-dc2-mlog1p1 > PV UUID 4l9Qvo-SaAV-Ojlk-D1YB-Tkud-Yjg0-e5RkgJ > PV Status allocatable > Total PE / Free PE 17 / 13 > > > >> On 4/15/10, jose nuno neto <jose.neto@liber4e.com> wrote: >>> hellos >>> >>> I spent more time on this and it seems since LVM cant write to any pv on >>> the volumes it has lost, it cannot write the failure of the devices and >>> update the metadata on other PVs. So it hangs forever >>> >>> Is this right?
>>> >>>> GoodMornings >>>> >>>> This is what I have on multipath.conf >>>> >>>> blacklist { >>>> wwid SSun_VOL0_266DCF4A >>>> wwid SSun_VOL0_5875CF4A >>>> devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" >>>> devnode "^hd[a-z]" >>>> } >>>> defaults { >>>> user_friendly_names yes >>>> } >>>> devices { >>>> device { >>>> vendor "HITACHI" >>>> product "OPEN-V" >>>> path_grouping_policy group_by_node_name >>>> failback immediate >>>> no_path_retry fail >>>> } >>>> device { >>>> vendor "IET" >>>> product "VIRTUAL-DISK" >>>> path_checker tur >>>> path_grouping_policy failover >>>> failback immediate >>>> no_path_retry fail >>>> } >>>> } >>>> >>>> As an example this is one LUN. It shoes [features=0] so I'd say it >>>> should >>>> fail right way >>>> >>>> mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V >>>> -SU >>>> [size=26G][features=0][hwhandler=0][rw] >>>> \_ round-robin 0 [prio=4][active] >>>> \_ 5:0:1:0 sdu 65:64 [active][ready] >>>> \_ 5:0:1:16384 sdac 65:192 [active][ready] >>>> \_ 5:0:1:32768 sdas 66:192 [active][ready] >>>> \_ 5:0:1:49152 sdba 67:64 [active][ready] >>>> \_ round-robin 0 [prio=4][enabled] >>>> \_ 3:0:1:0 sdaw 67:0 [active][ready] >>>> \_ 3:0:1:16384 sdbe 67:128 [active][ready] >>>> \_ 3:0:1:32768 sdbi 67:192 [active][ready] >>>> \_ 3:0:1:49152 sdbm 68:0 [active][ready] >>>> >>>> It think they fail since I see this messages from LVM: >>>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in >>>> vg_syb_roger-lv_syb_roger_admin >>>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices >>>> in >>>> vg_syb_roger-lv_syb_roger_admin >>>> >>>> But from some reason LVM cant remove them, any option I should have on >>>> lvm.conf? >>>> >>>> BestRegards >>>> Jose >>>>> post your multipath.conf file, you may be queuing forever ? >>>>> >>>>> >>>>> >>>>> On Wed, 2010-04-14 at 15:03 +0000, jose nuno neto wrote: >>>>>> Hi2all >>>>>> >>>>>> I'm on RHEL 5.4 with >>>>>> lvm2-2.02.46-8.el5_4.1 >>>>>> 2.6.18-164.2.1.el5 >>>>>> >>>>>> I have a multipathed SAN connection with what Im builing LVs >>>>>> Its a Cluster system, and I want LVs to switch on failure >>>>>> >>>>>> If I simulate a fail through the OS via >>>>>> /sys/bus/scsi/devices/$DEVICE/delete >>>>>> I get a LV fail and the service switch to other node >>>>>> >>>>>> But if I do it "real" portdown on the SAN Switch, multipath reports >>>>>> path >>>>>> down, but LVM commands hang forever and nothing gets switched >>>>>> >>>>>> from the logs i see multipath failing paths, and lvm Failed to remove >>>>>> faulty >>>>>> "devices" >>>>>> >>>>>> Any ideas how I should "fix" it? >>>>>> >>>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Log device, 253:53, has >>>>>> failed. >>>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Device failure in >>>>>> vg_ora_scapa-lv_ora_scapa_redo >>>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Another thread is handling an >>>>>> event. Waiting...
>>>>>> >>>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active >>>>>> paths: 0 >>>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active >>>>>> paths: 0 >>>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active >>>>>> paths: 0 >>>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active >>>>>> paths: 0 >>>>>> >>>>>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in >>>>>> vg_syb_roger-lv_syb_roger_admin >>>>>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty >>>>>> devices >>>>>> in >>>>>> vg_syb_roger-lv_syb_roger_admin >>>>>> >>>>>> Much Thanks >>>>>> Jose >>>>>> >>>>>> _______________________________________________ >>>>>> linux-lvm mailing list >>>>>> linux-lvm@redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/linux-lvm >>>>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >>>>> >>>>> >>>>> _______________________________________________ >>>>> linux-lvm mailing list >>>>> linux-lvm@redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-lvm >>>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >>>>> >>>> >>>> >>> >>> _______________________________________________ >>> linux-lvm mailing list >>> linux-lvm@redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-lvm >>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >>> >> >> -- >> Sent from my mobile device >> >> Regards, >> Eugene Vilensky >> evilensky@gmail.com >> >> _______________________________________________ >> linux-lvm mailing list >> linux-lvm@redhat.com >> https://www.redhat.com/mailman/listinfo/linux-lvm >> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >> > > _______________________________________________ > linux-lvm mailing list > linux-lvm@redhat.com > https://www.redhat.com/mailman/listinfo/linux-lvm > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > ^ permalink raw reply [flat|nested] 16+ messages in thread
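To make the recovery being discussed concrete, the manual fallback can be sketched roughly as below. The VG/LV and multipath names are the ones appearing in this thread, but the sequence is only an illustration and has not been verified against this exact failure; the key point is that the dead map must fail I/O immediately before LVM can update its metadata on the surviving PVs.

    # confirm the failed map fails I/O instead of queuing it
    dmsetup table mpath-dc2-a      # the features field should not show queue_if_no_path
    dmsetup status mpath-dc2-a

    # once I/O to the lost PV fails fast, drop the missing PV and the broken mirror leg
    vgreduce --removemissing vg_ora_jura
    lvconvert -m 0 vg_ora_jura/lv_ora_jura_data
    # or let LVM attempt the repair itself:
    # lvconvert --repair vg_ora_jura/lv_ora_jura_data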
* Re: [linux-lvm] Lvm hangs on San fail 2010-04-16 8:55 ` jose nuno neto 2010-04-16 20:15 ` Bryan Whitehead @ 2010-04-17 9:00 ` brem belguebli 2010-04-19 9:21 ` jose nuno neto 1 sibling, 1 reply; 16+ messages in thread From: brem belguebli @ 2010-04-17 9:00 UTC (permalink / raw) To: LVM general discussion and development Hi Jose, You have a total of 8 paths per LUN: 4 are marked active through HBA host5 and the remaining 4 are marked enabled on HBA host3 (you're on 2 different fabrics, right?). This may be due to the fact that you use the group_by_node_name policy; I don't know whether that mode actually load-balances across the 2 HBAs. When you pull the cable (that is the test that is failing?) you say it hangs forever. As you're using group_by_node_name, which corresponds to the fc_transport target node name, you should look at the state of the target ports bound to the HBA you disconnected (state Blocked?) under /sys/class/fc_remote_ports/rport-H:B-R (where H is your HBA number). The hang may be due to dev_loss_tmo or fast_io_fail_tmo being too high; both timers are located under /sys/class/fc_remote_ports/rport.... I have almost the same setup with almost the same storage (OPEN-V) from a pair of HP XP (OEM'ized Hitachi arrays), and things are set up to use at most 4 paths per LUN (2 per fabric); some storage experts tend to say that is already too much. As multipath policy I use multibus to distribute across the 2 fabrics. Hope all this will help; you say this happens when you pull the fiber cable from the server. On Fri, 2010-04-16 at 08:55 +0000, jose nuno neto wrote: > Hi > > > > Can you show us a pvdisplay or verbose vgdisplay ? > > > > Here goes the vgdisplay -v of one of the vgs with mirrors > > ########################################################### > > --- Volume group --- > VG Name vg_ora_jura > System ID > Format lvm2 > Metadata Areas 3 > Metadata Sequence No 705 > VG Access read/write > VG Status resizable > MAX LV 0 > Cur LV 4 > Open LV 4 > Max PV 0 > Cur PV 3 > Act PV 3 > VG Size 52.79 GB > PE Size 4.00 MB > Total PE 13515 > Alloc PE / Size 12292 / 48.02 GB > Free PE / Size 1223 / 4.78 GB > VG UUID nttQ3x-4ecP-Q6ms-jt2u-UIs4-texj-Q9Nxdt > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_arch > VG Name vg_ora_jura > LV UUID 8oUfYn-2TrP-yS6K-pcS2-cgI4-tcv1-33dSdX > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:28 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export > VG Name vg_ora_jura > LV UUID NLfQT6-36TS-DRHq-PJRf-9UDv-L8mz-HjPea2 > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:32 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data > VG Name vg_ora_jura > LV UUID VtSBIL-XvCw-23xK-NVAH-DvYn-P2sE-OkZJro > LV Write Access read/write > LV Status available > # open 1 > LV Size 12.00 GB > Current LE 3072 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:40 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo > VG Name vg_ora_jura > LV UUID KRHKBG-71Qv-YBsA-oJDt-igzP-EYaI-gPwcBX > LV Write Access read/write > LV Status available > # open 1 > LV Size 2.00 GB > Current LE 512 > Segments 1 > Allocation inherit > Read
ahead sectors auto > - currently set to 256 > Block device 253:48 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_0 > VG Name vg_ora_jura > LV UUID lQCOAt-aoK3-HBp1-xrQW-eh7L-6t94-CyAg5c > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:26 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_1 > VG Name vg_ora_jura > LV UUID snrnPc-8FxY-ekAk-ooNe-sBws-tuI0-cTFfj3 > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:27 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mlog > VG Name vg_ora_jura > LV UUID ouqaCQ-Deex-iArv-xLe9-jg8b-5cLf-3SChQ1 > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:25 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data_mlog > VG Name vg_ora_jura > LV UUID TmE2S0-r8ST-v624-RxUn-Qppw-2l8p-jM9EC9 > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:37 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_0 > VG Name vg_ora_jura > LV UUID 8hR0bP-g9mR-OSXS-KdUM-ouZ6-KVdS-sfz51c > LV Write Access read/write > LV Status available > # open 1 > LV Size 12.00 GB > Current LE 3072 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:38 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_1 > VG Name vg_ora_jura > LV UUID fzdzrD-7p6d-XFkA-UHyr-CPad-F2nV-6QIU9p > LV Write Access read/write > LV Status available > # open 1 > LV Size 12.00 GB > Current LE 3072 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:39 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export_mlog > VG Name vg_ora_jura > LV UUID 29yLY8-N3Lv-46pN-1jze-50A2-wlhu-quuoMa > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:29 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_0 > VG Name vg_ora_jura > LV UUID 1uMTsf-wPaQ-ItTy-rpma-m2La-TGZl-C4KIU4 > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:30 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_1 > VG Name vg_ora_jura > LV UUID cm8Kn7-knL3-mUPL-XFvU-geMm-Wxff-32x2va > LV Write Access read/write > LV Status available > # open 1 > LV Size 5.00 GB > Current LE 1280 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:31 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mlog > VG Name vg_ora_jura > LV UUID 811tNy-eaC5-zfZQ-1QVf-cbYP-1MIM-v6waJF > LV Write Access read/write > LV Status available > # open 1 > LV Size 4.00 MB > Current LE 1 > Segments 1 > Allocation inherit > Read ahead sectors 
auto > - currently set to 256 > Block device 253:45 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_0 > VG Name vg_ora_jura > LV UUID aUZAer-f5rl-1f2X-9jgY-f8CJ-jdwe-F5Pmao > LV Write Access read/write > LV Status available > # open 1 > LV Size 2.00 GB > Current LE 512 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:46 > > --- Logical volume --- > LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_1 > VG Name vg_ora_jura > LV UUID gAEJym-sSbq-rC4P-AjpI-OibV-k3yI-lDx1I6 > LV Write Access read/write > LV Status available > # open 1 > LV Size 2.00 GB > Current LE 512 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:47 > > --- Physical volumes --- > PV Name /dev/mapper/mpath-dc1-b > PV UUID hgjXU1-2qjo-RsmS-1XJI-d0kZ-oc4A-ZKCza8 > PV Status allocatable > Total PE / Free PE 6749 / 605 > > PV Name /dev/mapper/mpath-dc2-b > PV UUID hcANwN-aeJT-PIAq-bPsf-9d3e-ylkS-GDjAGR > PV Status allocatable > Total PE / Free PE 6749 / 605 > > PV Name /dev/mapper/mpath-dc2-mlog1p1 > PV UUID 4l9Qvo-SaAV-Ojlk-D1YB-Tkud-Yjg0-e5RkgJ > PV Status allocatable > Total PE / Free PE 17 / 13 > > > > > On 4/15/10, jose nuno neto <jose.neto@liber4e.com> wrote: > >> hellos > >> > >> I spent more time on this and it seems since LVM cant write to any pv on > >> the volumes it has lost, it cannot write the failure of the devices and > >> update the metadata on other PVs. So it hangs forever > >> > >> Is this right? > >> > >>> GoodMornings > >>> > >>> This is what I have on multipath.conf > >>> > >>> blacklist { > >>> wwid SSun_VOL0_266DCF4A > >>> wwid SSun_VOL0_5875CF4A > >>> devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" > >>> devnode "^hd[a-z]" > >>> } > >>> defaults { > >>> user_friendly_names yes > >>> } > >>> devices { > >>> device { > >>> vendor "HITACHI" > >>> product "OPEN-V" > >>> path_grouping_policy group_by_node_name > >>> failback immediate > >>> no_path_retry fail > >>> } > >>> device { > >>> vendor "IET" > >>> product "VIRTUAL-DISK" > >>> path_checker tur > >>> path_grouping_policy failover > >>> failback immediate > >>> no_path_retry fail > >>> } > >>> } > >>> > >>> As an example this is one LUN. It shoes [features=0] so I'd say it > >>> should > >>> fail right way > >>> > >>> mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V > >>> -SU > >>> [size=26G][features=0][hwhandler=0][rw] > >>> \_ round-robin 0 [prio=4][active] > >>> \_ 5:0:1:0 sdu 65:64 [active][ready] > >>> \_ 5:0:1:16384 sdac 65:192 [active][ready] > >>> \_ 5:0:1:32768 sdas 66:192 [active][ready] > >>> \_ 5:0:1:49152 sdba 67:64 [active][ready] > >>> \_ round-robin 0 [prio=4][enabled] > >>> \_ 3:0:1:0 sdaw 67:0 [active][ready] > >>> \_ 3:0:1:16384 sdbe 67:128 [active][ready] > >>> \_ 3:0:1:32768 sdbi 67:192 [active][ready] > >>> \_ 3:0:1:49152 sdbm 68:0 [active][ready] > >>> > >>> It think they fail since I see this messages from LVM: > >>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in > >>> vg_syb_roger-lv_syb_roger_admin > >>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices > >>> in > >>> vg_syb_roger-lv_syb_roger_admin > >>> > >>> But from some reason LVM cant remove them, any option I should have on > >>> lvm.conf? > >>> > >>> BestRegards > >>> Jose > >>>> post your multipath.conf file, you may be queuing forever ? 
> >>>> > >>>> > >>>> > >>>> On Wed, 2010-04-14 at 15:03 +0000, jose nuno neto wrote: > >>>>> Hi2all > >>>>> > >>>>> I'm on RHEL 5.4 with > >>>>> lvm2-2.02.46-8.el5_4.1 > >>>>> 2.6.18-164.2.1.el5 > >>>>> > >>>>> I have a multipathed SAN connection with what Im builing LVs > >>>>> Its a Cluster system, and I want LVs to switch on failure > >>>>> > >>>>> If I simulate a fail through the OS via > >>>>> /sys/bus/scsi/devices/$DEVICE/delete > >>>>> I get a LV fail and the service switch to other node > >>>>> > >>>>> But if I do it "real" portdown on the SAN Switch, multipath reports > >>>>> path > >>>>> down, but LVM commands hang forever and nothing gets switched > >>>>> > >>>>> from the logs i see multipath failing paths, and lvm Failed to remove > >>>>> faulty > >>>>> "devices" > >>>>> > >>>>> Any ideas how I should "fix" it? > >>>>> > >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Log device, 253:53, has > >>>>> failed. > >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Device failure in > >>>>> vg_ora_scapa-lv_ora_scapa_redo > >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Another thread is handling an > >>>>> event. Waiting... > >>>>> > >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active > >>>>> paths: 0 > >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active > >>>>> paths: 0 > >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active > >>>>> paths: 0 > >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active > >>>>> paths: 0 > >>>>> > >>>>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in > >>>>> vg_syb_roger-lv_syb_roger_admin > >>>>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty > >>>>> devices > >>>>> in > >>>>> vg_syb_roger-lv_syb_roger_admin > >>>>> > >>>>> Much Thanks > >>>>> Jose > >>>>> > >>>>> _______________________________________________ > >>>>> linux-lvm mailing list > >>>>> linux-lvm@redhat.com > >>>>> https://www.redhat.com/mailman/listinfo/linux-lvm > >>>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > >>>> > >>>> > >>>> _______________________________________________ > >>>> linux-lvm mailing list > >>>> linux-lvm@redhat.com > >>>> https://www.redhat.com/mailman/listinfo/linux-lvm > >>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > >>>> > >>> > >>> > >> > >> _______________________________________________ > >> linux-lvm mailing list > >> linux-lvm@redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-lvm > >> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > >> > > > > -- > > Sent from my mobile device > > > > Regards, > > Eugene Vilensky > > evilensky@gmail.com > > > > _______________________________________________ > > linux-lvm mailing list > > linux-lvm@redhat.com > > https://www.redhat.com/mailman/listinfo/linux-lvm > > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > > > > _______________________________________________ > linux-lvm mailing list > linux-lvm@redhat.com > https://www.redhat.com/mailman/listinfo/linux-lvm > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ ^ permalink raw reply [flat|nested] 16+ messages in thread
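Brem's pointer about the remote-port state and timers can be checked directly in sysfs. A minimal sketch, assuming the usual fc_remote_ports layout (the rport-3:0-1 name below is only an example; the real rport-H:B-R identifiers depend on the host):

    # show the state and timeouts of every FC remote port
    for rport in /sys/class/fc_remote_ports/rport-*; do
        echo "== $rport"
        cat "$rport/port_state"        # Online, Blocked, Not Present, ...
        cat "$rport/dev_loss_tmo"      # seconds before the remote port is removed
        cat "$rport/fast_io_fail_tmo"  # seconds before outstanding I/O is failed back
    done

    # lowering the timers (example values) lets I/O fail back to multipath sooner
    # after a cable pull or switch-port failure
    echo 5  > /sys/class/fc_remote_ports/rport-3:0-1/fast_io_fail_tmo
    echo 10 > /sys/class/fc_remote_ports/rport-3:0-1/dev_loss_tmo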
* Re: [linux-lvm] Lvm hangs on San fail 2010-04-17 9:00 ` brem belguebli @ 2010-04-19 9:21 ` jose nuno neto 0 siblings, 0 replies; 16+ messages in thread From: jose nuno neto @ 2010-04-19 9:21 UTC (permalink / raw) To: LVM general discussion and development GoodMornings In the meantime we upgraded RHEL to 5.5, and the multipath output now looks more accurate, showing only 1 path per HBA. We have a 2-datacenter setup with 4 fabrics between them, 2 fabrics for each datacenter. mpath-dc2-a (360060e8004f240000000f24000000502) dm-12 HITACHI,OPEN-V -SU [size=26G][features=0][hwhandler=0][rw] \_ round-robin 0 [prio=1][active] \_ 3:0:1:0 sdg 8:96 [active][ready] \_ round-robin 0 [prio=1][enabled] \_ 5:0:1:0 sdo 8:224 [active][ready] I'll repeat the tests and look at the rport state you mention. I'm using group_by_node_name because before, with 8 links, it was a mess; it spreads some load between the paths, but not across all of them. Anyway, that explains the "strange" paths; I'll see how it goes now. Thanks Jose > Hi Jose, > > You have a total of 8 paths per LUN: 4 are marked active through HBA host5 > and the remaining 4 are marked enabled on HBA host3 (you're on 2 different > fabrics, right?). This may be due to the fact that you use the > group_by_node_name policy; I don't know whether that mode actually > load-balances across the 2 HBAs. > > When you pull the cable (that is the test that is failing?) you say it > hangs forever. > As you're using group_by_node_name, which corresponds to the fc_transport > target node name, you should look at the state of the target ports bound > to the HBA you disconnected (state Blocked?) under > /sys/class/fc_remote_ports/rport-H:B-R (where H is your HBA number). > The hang may be due to dev_loss_tmo or fast_io_fail_tmo being too high; > both timers are located under /sys/class/fc_remote_ports/rport.... > > I have almost the same setup with almost the same storage (OPEN-V) from > a pair of HP XP (OEM'ized Hitachi arrays), and things are set up to use > at most 4 paths per LUN (2 per fabric); some storage experts tend to say > it is already too much. As multipath policy I use multibus to distribute > across the 2 fabrics. > > Hope all this will help; you say this happens when you pull the fiber > cable from the server. > > On Fri, 2010-04-16 at 08:55 +0000, jose nuno neto wrote: >> Hi >> >> >> > Can you show us a pvdisplay or verbose vgdisplay ?
>> > >> >> Here goes the vgdisplay -v of one of the vgs with mirrors >> >> ########################################################### >> >> --- Volume group --- >> VG Name vg_ora_jura >> System ID >> Format lvm2 >> Metadata Areas 3 >> Metadata Sequence No 705 >> VG Access read/write >> VG Status resizable >> MAX LV 0 >> Cur LV 4 >> Open LV 4 >> Max PV 0 >> Cur PV 3 >> Act PV 3 >> VG Size 52.79 GB >> PE Size 4.00 MB >> Total PE 13515 >> Alloc PE / Size 12292 / 48.02 GB >> Free PE / Size 1223 / 4.78 GB >> VG UUID nttQ3x-4ecP-Q6ms-jt2u-UIs4-texj-Q9Nxdt >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_arch >> VG Name vg_ora_jura >> LV UUID 8oUfYn-2TrP-yS6K-pcS2-cgI4-tcv1-33dSdX >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 5.00 GB >> Current LE 1280 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:28 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_export >> VG Name vg_ora_jura >> LV UUID NLfQT6-36TS-DRHq-PJRf-9UDv-L8mz-HjPea2 >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 5.00 GB >> Current LE 1280 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:32 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_data >> VG Name vg_ora_jura >> LV UUID VtSBIL-XvCw-23xK-NVAH-DvYn-P2sE-OkZJro >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 12.00 GB >> Current LE 3072 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:40 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_redo >> VG Name vg_ora_jura >> LV UUID KRHKBG-71Qv-YBsA-oJDt-igzP-EYaI-gPwcBX >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 2.00 GB >> Current LE 512 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:48 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_0 >> VG Name vg_ora_jura >> LV UUID lQCOAt-aoK3-HBp1-xrQW-eh7L-6t94-CyAg5c >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 5.00 GB >> Current LE 1280 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:26 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_1 >> VG Name vg_ora_jura >> LV UUID snrnPc-8FxY-ekAk-ooNe-sBws-tuI0-cTFfj3 >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 5.00 GB >> Current LE 1280 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:27 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mlog >> VG Name vg_ora_jura >> LV UUID ouqaCQ-Deex-iArv-xLe9-jg8b-5cLf-3SChQ1 >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 4.00 MB >> Current LE 1 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:25 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_data_mlog >> VG Name vg_ora_jura >> LV UUID TmE2S0-r8ST-v624-RxUn-Qppw-2l8p-jM9EC9 >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 4.00 MB >> Current LE 1 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:37 >> >> --- Logical volume --- >> LV Name 
/dev/vg_ora_jura/lv_ora_jura_data_mimage_0 >> VG Name vg_ora_jura >> LV UUID 8hR0bP-g9mR-OSXS-KdUM-ouZ6-KVdS-sfz51c >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 12.00 GB >> Current LE 3072 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:38 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_1 >> VG Name vg_ora_jura >> LV UUID fzdzrD-7p6d-XFkA-UHyr-CPad-F2nV-6QIU9p >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 12.00 GB >> Current LE 3072 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:39 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_export_mlog >> VG Name vg_ora_jura >> LV UUID 29yLY8-N3Lv-46pN-1jze-50A2-wlhu-quuoMa >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 4.00 MB >> Current LE 1 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:29 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_0 >> VG Name vg_ora_jura >> LV UUID 1uMTsf-wPaQ-ItTy-rpma-m2La-TGZl-C4KIU4 >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 5.00 GB >> Current LE 1280 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:30 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_1 >> VG Name vg_ora_jura >> LV UUID cm8Kn7-knL3-mUPL-XFvU-geMm-Wxff-32x2va >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 5.00 GB >> Current LE 1280 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:31 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mlog >> VG Name vg_ora_jura >> LV UUID 811tNy-eaC5-zfZQ-1QVf-cbYP-1MIM-v6waJF >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 4.00 MB >> Current LE 1 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:45 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_0 >> VG Name vg_ora_jura >> LV UUID aUZAer-f5rl-1f2X-9jgY-f8CJ-jdwe-F5Pmao >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 2.00 GB >> Current LE 512 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:46 >> >> --- Logical volume --- >> LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_1 >> VG Name vg_ora_jura >> LV UUID gAEJym-sSbq-rC4P-AjpI-OibV-k3yI-lDx1I6 >> LV Write Access read/write >> LV Status available >> # open 1 >> LV Size 2.00 GB >> Current LE 512 >> Segments 1 >> Allocation inherit >> Read ahead sectors auto >> - currently set to 256 >> Block device 253:47 >> >> --- Physical volumes --- >> PV Name /dev/mapper/mpath-dc1-b >> PV UUID hgjXU1-2qjo-RsmS-1XJI-d0kZ-oc4A-ZKCza8 >> PV Status allocatable >> Total PE / Free PE 6749 / 605 >> >> PV Name /dev/mapper/mpath-dc2-b >> PV UUID hcANwN-aeJT-PIAq-bPsf-9d3e-ylkS-GDjAGR >> PV Status allocatable >> Total PE / Free PE 6749 / 605 >> >> PV Name /dev/mapper/mpath-dc2-mlog1p1 >> PV UUID 4l9Qvo-SaAV-Ojlk-D1YB-Tkud-Yjg0-e5RkgJ >> PV Status allocatable >> Total PE / Free PE 17 / 13 >> >> >> >> > On 4/15/10, jose nuno neto <jose.neto@liber4e.com> wrote: >> >> hellos >> >> >> >> I spent more time on this and it seems since LVM cant write to any 
pv >> on >> >> the volumes it has lost, it cannot write the failure of the devices >> and >> >> update the metadata on other PVs. So it hangs forever >> >> >> >> Is this right? >> >> >> >>> GoodMornings >> >>> >> >>> This is what I have on multipath.conf >> >>> >> >>> blacklist { >> >>> wwid SSun_VOL0_266DCF4A >> >>> wwid SSun_VOL0_5875CF4A >> >>> devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" >> >>> devnode "^hd[a-z]" >> >>> } >> >>> defaults { >> >>> user_friendly_names yes >> >>> } >> >>> devices { >> >>> device { >> >>> vendor "HITACHI" >> >>> product "OPEN-V" >> >>> path_grouping_policy group_by_node_name >> >>> failback immediate >> >>> no_path_retry fail >> >>> } >> >>> device { >> >>> vendor "IET" >> >>> product "VIRTUAL-DISK" >> >>> path_checker tur >> >>> path_grouping_policy failover >> >>> failback immediate >> >>> no_path_retry fail >> >>> } >> >>> } >> >>> >> >>> As an example this is one LUN. It shoes [features=0] so I'd say it >> >>> should >> >>> fail right way >> >>> >> >>> mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V >> >>> -SU >> >>> [size=26G][features=0][hwhandler=0][rw] >> >>> \_ round-robin 0 [prio=4][active] >> >>> \_ 5:0:1:0 sdu 65:64 [active][ready] >> >>> \_ 5:0:1:16384 sdac 65:192 [active][ready] >> >>> \_ 5:0:1:32768 sdas 66:192 [active][ready] >> >>> \_ 5:0:1:49152 sdba 67:64 [active][ready] >> >>> \_ round-robin 0 [prio=4][enabled] >> >>> \_ 3:0:1:0 sdaw 67:0 [active][ready] >> >>> \_ 3:0:1:16384 sdbe 67:128 [active][ready] >> >>> \_ 3:0:1:32768 sdbi 67:192 [active][ready] >> >>> \_ 3:0:1:49152 sdbm 68:0 [active][ready] >> >>> >> >>> It think they fail since I see this messages from LVM: >> >>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in >> >>> vg_syb_roger-lv_syb_roger_admin >> >>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty >> devices >> >>> in >> >>> vg_syb_roger-lv_syb_roger_admin >> >>> >> >>> But from some reason LVM cant remove them, any option I should have >> on >> >>> lvm.conf? >> >>> >> >>> BestRegards >> >>> Jose >> >>>> post your multipath.conf file, you may be queuing forever ? >> >>>> >> >>>> >> >>>> >> >>>> On Wed, 2010-04-14 at 15:03 +0000, jose nuno neto wrote: >> >>>>> Hi2all >> >>>>> >> >>>>> I'm on RHEL 5.4 with >> >>>>> lvm2-2.02.46-8.el5_4.1 >> >>>>> 2.6.18-164.2.1.el5 >> >>>>> >> >>>>> I have a multipathed SAN connection with what Im builing LVs >> >>>>> Its a Cluster system, and I want LVs to switch on failure >> >>>>> >> >>>>> If I simulate a fail through the OS via >> >>>>> /sys/bus/scsi/devices/$DEVICE/delete >> >>>>> I get a LV fail and the service switch to other node >> >>>>> >> >>>>> But if I do it "real" portdown on the SAN Switch, multipath >> reports >> >>>>> path >> >>>>> down, but LVM commands hang forever and nothing gets switched >> >>>>> >> >>>>> from the logs i see multipath failing paths, and lvm Failed to >> remove >> >>>>> faulty >> >>>>> "devices" >> >>>>> >> >>>>> Any ideas how I should "fix" it? >> >>>>> >> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Log device, 253:53, has >> >>>>> failed. >> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Device failure in >> >>>>> vg_ora_scapa-lv_ora_scapa_redo >> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Another thread is handling >> an >> >>>>> event. Waiting... 
>> >>>>> >> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining >> active >> >>>>> paths: 0 >> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining >> active >> >>>>> paths: 0 >> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining >> active >> >>>>> paths: 0 >> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining >> active >> >>>>> paths: 0 >> >>>>> >> >>>>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in >> >>>>> vg_syb_roger-lv_syb_roger_admin >> >>>>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty >> >>>>> devices >> >>>>> in >> >>>>> vg_syb_roger-lv_syb_roger_admin >> >>>>> >> >>>>> Much Thanks >> >>>>> Jose >> >>>>> >> >>>>> _______________________________________________ >> >>>>> linux-lvm mailing list >> >>>>> linux-lvm@redhat.com >> >>>>> https://www.redhat.com/mailman/listinfo/linux-lvm >> >>>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >> >>>> >> >>>> >> >>>> _______________________________________________ >> >>>> linux-lvm mailing list >> >>>> linux-lvm@redhat.com >> >>>> https://www.redhat.com/mailman/listinfo/linux-lvm >> >>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >> >>>> >> >>> >> >>> >> >> >> >> _______________________________________________ >> >> linux-lvm mailing list >> >> linux-lvm@redhat.com >> >> https://www.redhat.com/mailman/listinfo/linux-lvm >> >> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >> >> >> > >> > -- >> > Sent from my mobile device >> > >> > Regards, >> > Eugene Vilensky >> > evilensky@gmail.com >> > >> > _______________________________________________ >> > linux-lvm mailing list >> > linux-lvm@redhat.com >> > https://www.redhat.com/mailman/listinfo/linux-lvm >> > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ >> > >> >> _______________________________________________ >> linux-lvm mailing list >> linux-lvm@redhat.com >> https://www.redhat.com/mailman/listinfo/linux-lvm >> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > > > _______________________________________________ > linux-lvm mailing list > linux-lvm@redhat.com > https://www.redhat.com/mailman/listinfo/linux-lvm > read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/ > ^ permalink raw reply [flat|nested] 16+ messages in thread
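For completeness, the multibus layout brem suggests would amount to a device stanza in multipath.conf roughly like the one below; this is only a sketch of the shape of the change, not a configuration tested on this setup:

    device {
            vendor                  "HITACHI"
            product                 "OPEN-V"
            path_grouping_policy    multibus
            failback                immediate
            no_path_retry           fail
    }

With multibus all paths end up in a single priority group, so I/O is spread across both fabrics instead of being grouped by target node name.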
end of thread. Thread overview: 16+ messages -- 2010-02-24 16:14 [linux-lvm] Mirror fail/recover test jose nuno neto 2010-02-24 18:55 ` malahal 2010-02-25 10:36 ` jose nuno neto 2010-02-25 16:11 ` malahal 2010-03-02 10:31 ` [linux-lvm] Mirror fail/recover test SOLVED jose nuno neto 2010-04-14 15:03 ` [linux-lvm] Lvm hangs on San fail jose nuno neto 2010-04-14 17:38 ` Eugene Vilensky 2010-04-14 23:02 ` brem belguebli 2010-04-15 8:29 ` jose nuno neto 2010-04-15 9:32 ` Bryan Whitehead 2010-04-15 11:59 ` jose nuno neto 2010-04-15 12:41 ` Eugene Vilensky 2010-04-16 8:55 ` jose nuno neto 2010-04-16 20:15 ` Bryan Whitehead 2010-04-17 9:00 ` brem belguebli 2010-04-19 9:21 ` jose nuno neto