* RE: [Fwd] further testing w/ multipath ... and bugs
@ 2005-06-24 14:14 James.Smart
  0 siblings, 0 replies; 3+ messages in thread
From: James.Smart @ 2005-06-24 14:14 UTC (permalink / raw)
  To: andmike, christophe.varoqui, andrew.vasquez; +Cc: linux-scsi

> I did not see an answer to this issue. I am also hitting the problem
> (i.e., devices being removed) during some dm-mp port bounce testing.
> Is this the correct behavior going forward for the fc transport?

Current design is that when the LLDD tells the transport that the port is
gone, it (and its subtree) gets deleted.  Right or not? I don't know.
I would assume having the real state, where it does not exist if not
present, is the right overall approach.  Also makes it consistent
with other storage devices that go away when not connected
(usb sticks, etc).
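
To make the design above concrete, here is a minimal userspace sketch in
Python. It is not transport code: the Node class, remove_subtree() and the
object names are hypothetical and only illustrate "the port and its subtree
get deleted", children first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One object in a toy device tree (rport, target, or sdev)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    present: bool = True

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def remove_subtree(node: Node) -> List[str]:
    """Delete node and everything below it, children first.

    Returns the names in the order they were removed.
    """
    removed: List[str] = []
    for child in node.children:
        removed.extend(remove_subtree(child))
    node.children.clear()
    node.present = False
    removed.append(node.name)
    return removed


def build_rport() -> Node:
    # Illustrative names only; they mimic the rport-H:C-N and H:C:T:L style.
    rport = Node("rport-0:0-0")
    target = rport.add(Node("target0:0:0"))
    target.add(Node("0:0:0:2"))
    return rport
```

Calling remove_subtree(build_rport()) removes the sdev, then the target,
then the rport itself, so nothing below a missing port stays visible.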

> Also I see a difference in behavior between the lpfc and qla2xxx drivers
> where lpfc is not removing the target even though the "rport-5:0-5: blocked
> FC remote port time out: removing target" message is printed. I guess I
> can look into the difference myself, but I thought one of you two, Andrew
> or James S, would know.

I'll have to look into things. Likely this is a reference count issue,
and may be a reference count on an sdev. I doubt lpfc does anything specific
to this scenario, but I'll check. The drivers could also differ on the
return codes returned when the rport is in this transition state, which
may affect the sdev/block device and references.
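
As an illustration of how a leftover reference could produce that symptom
(message printed, target still present), here is a toy reference-count
model in Python. RefCounted and its get()/put() methods are made up for
this sketch and are not the kernel's kobject/kref API; whether lpfc
actually hits this is unverified.

```python
class RefCounted:
    """Toy reference-counted object; release happens only at count zero."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refs = 1          # the creator's reference
        self.released = False

    def get(self) -> "RefCounted":
        """Take an extra reference (e.g. an opener of the device)."""
        if self.released:
            raise RuntimeError(f"{self.name}: get() after release")
        self.refs += 1
        return self

    def put(self) -> None:
        """Drop one reference; the object is released when none remain."""
        if self.refs <= 0:
            raise RuntimeError(f"{self.name}: put() without a reference")
        self.refs -= 1
        if self.refs == 0:
            self.released = True
```

If some other holder called get() and never calls put(), the put() done at
removal time only lowers the count and the object stays around, which would
look like "removing target" being logged without the target going away.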

Keep in mind there are issues in .12 if the target is torn down.
There were some recent patches to correct this.
http://marc.theaimsgroup.com/?l=linux-scsi&m=111845669410785&w=2

-- james s 


* Re: [Fwd] further testing w/ multipath ... and bugs
  2005-06-13 12:30 Christophe Varoqui
@ 2005-06-23 21:05 ` Mike Anderson
  0 siblings, 0 replies; 3+ messages in thread
From: Mike Anderson @ 2005-06-23 21:05 UTC (permalink / raw)
  To: Christophe Varoqui, Andrew Vasquez, James.Smart; +Cc: linux-scsi

I did not see an answer to this issue. I am also hitting the problem
(i.e., devices being removed) during some dm-mp port bounce testing.
Is this the correct behavior going forward for the fc transport?

Also I see a difference in behavior between the lpfc and qla2xxx drivers
where lpfc is not removing the target even though the "rport-5:0-5: blocked
FC remote port time out: removing target" message is printed. I guess I
can look into the difference myself, but I thought one of you two, Andrew
or James S, would know.

Christophe Varoqui [christophe.varoqui@free.fr] wrote:
> I should have posted this here in the first place.
> Seems related to the recent fc_remote_ports and qlogic work.
> 
> Regards,
> cvaroqui
> 
> ----- Forwarded message from Christophe Varoqui <christophe.varoqui@free.fr> -----
> 
> List-Id: device-mapper development <dm-devel.redhat.com>
> 
> Here is an additional one :
> 
> When, at the end of the previous scenario, with a dd in D-state, I run "dmsetup remove_all", it does agree to remove the maps. Running multipath again gives:
> 
>  [<c027506c>] end_that_request_last+0xcc/0x100                                  
>  [<c02b19ed>] scsi_end_request+0x9d/0xe0                                    
>  [<c02b1d45>] scsi_io_completion+0x155/0x500                                  
>  [<c0327643>] ip_rcv+0x3a3/0x560                                            
>  [<c012c1de>] del_timer+0x5e/0x70                                               
>  [<c02cdce4>] sd_rw_intr+0x164/0x320                                         
>  [<c0149531>] mempool_free+0x81/0xa0                                          
>  [<c02c60cd>] qla2x00_process_response_queue+0x14d/0x1d0                    
>  [<c02ac946>] scsi_finish_command+0x96/0xe0                                   
>  [<c033f2e3>] tcp_write_timer+0x73/0xe0                                      
>  [<c02ac836>] scsi_softirq+0xa6/0xe0                                        
>  [<c01285b2>] __do_softirq+0x82/0x100                                      
>  [<c0128665>] do_softirq+0x35/0x40                                         
>  [<c010675b>] do_IRQ+0x3b/0x70                                                
>  [<c0104c1e>] common_interrupt+0x1a/0x20                           
>  [<c0102030>] default_idle+0x0/0x30                                
>  [<c0102053>] default_idle+0x23/0x30                               
>  [<c0102104>] cpu_idle+0x64/0x80                                   
>  [<c0462965>] start_kernel+0x185/0x1d0                             
>  [<c0462370>] unknown_bootoption+0x0/0x1e0                         
> Code: 90 80 3e 00 7e f9 fa eb e8 89 d8 8b 74 24 0c 8b 5c 24 08 83 c4 10 c3 c7 04
>  24 c4 6a 38 c0 8b 44 24 10 89 44 24 04 e8 6d 30 db ff <0f> 0b 95 00 c2 62 38 c0
>  eb bc 8d 76 00 53 83 ec 08 89 c3 fa 81                     
>  <0>Kernel panic - not syncing: Fatal exception in interrupt       
> 
> Regards,
> cvaroqui
> 
> On Mon, Jun 13, 2005 at 10:11:54AM +0200, Christophe Varoqui wrote:
> > Hello,
> > 
> > I'm testing Mike Christie's START_STOP hwhandler and discovered a bunch of new, interesting phenomena:
> > 
> > A little context first :
> > o kernel 2.6.12-rc6 + qlogic discovery patch
> > o qla2342 (dual 2Gb)
> > o EVA5000, Solaris-tagged connections
> > 
> > Here is a map created by multipath, fresh from boot :
> > 
> > eva1_lun2 (3600508b400014ba7000120000cf00000)
> > [size=50 GB][features="1 queue_if_no_path"][hwhandler="1 hp_sw"]
> > \_ round-robin 0 [active][best]
> >   \_ 0:0:0:2 sdb  8:16    [ready ][active]
> >   \_ 1:0:0:2 sdf  8:80    [ready ][active]
> > \_ round-robin 0 [enabled]
> >   \_ 0:0:1:2 sdd  8:48    [faulty][active]
> >   \_ 1:0:1:2 sdh  8:112   [faulty][active]
> > 
> > Start a background stream read with dd on that map.
> > 
> > Do a port disable on the FC switch port connected to HBA 0
> > Consistently at this moment I get the following in the logs :
> > 
> > qla2300 0000:05:0d.0: LOOP DOWN detected.
> > Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
> > in_atomic():1, irqs_disabled():1
> >  [<c0120a74>] __might_sleep+0xa4/0xc0
> >  [<c026a466>] device_for_each_child+0x26/0x80
> >  [<c02b3180>] target_block+0x0/0x30
> >  [<c02bbdae>] fc_remote_port_block+0x2e/0x60
> >  [<c02bdbf5>] qla2x00_mark_all_devices_lost+0x55/0x60
> >  [<c02c597e>] qla2x00_async_event+0x83e/0xd60
> >  [<c011dd2b>] find_busiest_group+0xbb/0x310
> >  [<c02cdce4>] sd_rw_intr+0x164/0x320
> >  [<c02c4e37>] qla2300_intr_handler+0x77/0x240
> >  [<c0144882>] handle_IRQ_event+0x32/0x70
> >  [<c0144997>] __do_IRQ+0xd7/0x140
> >  [<c0106756>] do_IRQ+0x36/0x70
> >  [<c0104c1e>] common_interrupt+0x1a/0x20
> >  [<c0102030>] default_idle+0x0/0x30
> >  [<c0102053>] default_idle+0x23/0x30
> >  [<c0102104>] cpu_idle+0x64/0x80
> > 
> > If I wait long enough, I then get the following :
> > 
> >  rport-0:0-0: blocked FC remote port time out: removing target
> >  rport-0:0-1: blocked FC remote port time out: removing target
> > 
> > ... which is rather new to me.
> > 
> > As a side effect, all associated sd devices are removed and uevents are sent signaling that the disks are gone. In the current implementation, this triggers checker removal on the multipathd side.
> > 
> > Then, upon port reenable, the sd devices are registered again with different minors than before. Add uevents get sent, multipath reconfigures the maps and ...
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >  printing eip:
> > f8b0d29f
> > *pde = 08e4d001
> > Oops: 0000 [#1]
> > SMP
> > Modules linked in: dm_round_robin dm_hp_sw dm_multipath md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc video button battery ac ohci_hcd tg3 floppy dm_mod qla6312
> > CPU:    2
> > EIP:    0060:[<f8b0d29f>]    Not tainted VLI
> > EFLAGS: 00010086   (2.6.12-rc6)
> > EIP is at rr_select_path+0xf/0x60 [dm_round_robin]
> > eax: f6a989cc   ebx: 00000000   ecx: f6a978c0   edx: f7f1e77c
> > esi: f7f1e77c   edi: 00000000   ebp: 00000001   esp: f65d1f00
> > ds: 007b   es: 007b   ss: 0068
> > Process kmpathd/2 (pid: 4564, threadinfo=f65d0000 task=f6708aa0)
> > Stack: f6a989c0 f7f1e740 f8ae3bc2 f7f1e740 f7f1e740 f8ae3c90 f7f1e740 f7f1e740
> >        00000000 f7f1e74c f8ae3f9c 00000286 00000000 f7f1e754 f7f1e740 f7f34100
> >        f7f1e790 00000282 c01339a2 00000000 000f42b4 f6cdfe5c f7f34128 f7f34110
> > Call Trace:
> >  [<f8ae3bc2>] __choose_path_in_pg+0x12/0x40 [dm_multipath]
> >  [<f8ae3c90>] __choose_pgpath+0xa0/0xb0 [dm_multipath]
> >  [<f8ae3f9c>] process_queued_ios+0x7c/0xf0 [dm_multipath]
> >  [<c01339a2>] worker_thread+0x1c2/0x250
> >  [<f8ae3f20>] process_queued_ios+0x0/0xf0 [dm_multipath]
> >  [<c011eaa0>] default_wake_function+0x0/0x10
> >  [<c011eaa0>] default_wake_function+0x0/0x10
> >  [<c01337e0>] worker_thread+0x0/0x250
> >  [<c0137c65>] kthread+0xa5/0xf0
> >  [<c0137bc0>] kthread+0x0/0xf0
> >  [<c0102445>] kernel_thread_helper+0x5/0x10
> > Code: 42 04 89 10 89 58 04 89 03 31 c0 5b c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 83 ec 08 89 74 24 04 89 d6 89 1c 24 8b 58 04 <8b> 03 39 d8 74 30 89 c1 8b 50 04 8b 00 85 c9 89 50 04 89 02 8b
> > 
> > Here dd is now stuck in D-state.
> > 
> > I will post more as I continue my hammering.
> > 
> > Regards,
> > cvaroqui
> > 
> ----- End forwarded message -----


-andmike
--
Michael Anderson
andmike@us.ibm.com



* [Fwd] further testing w/ multipath ... and bugs
@ 2005-06-13 12:30 Christophe Varoqui
  2005-06-23 21:05 ` Mike Anderson
  0 siblings, 1 reply; 3+ messages in thread
From: Christophe Varoqui @ 2005-06-13 12:30 UTC (permalink / raw)
  To: linux-scsi

I should have posted this here in the first place.
Seems related to the recent fc_remote_ports and qlogic work.

Regards,
cvaroqui

----- Forwarded message from Christophe Varoqui <christophe.varoqui@free.fr> -----

List-Id: device-mapper development <dm-devel.redhat.com>

Here is an additional one :

When, at the end of the previous scenario, with a dd in D-state, I run "dmsetup remove_all", it does agree to remove the maps. Running multipath again gives:

 [<c027506c>] end_that_request_last+0xcc/0x100                                  
 [<c02b19ed>] scsi_end_request+0x9d/0xe0                                    
 [<c02b1d45>] scsi_io_completion+0x155/0x500                                  
 [<c0327643>] ip_rcv+0x3a3/0x560                                            
 [<c012c1de>] del_timer+0x5e/0x70                                               
 [<c02cdce4>] sd_rw_intr+0x164/0x320                                         
 [<c0149531>] mempool_free+0x81/0xa0                                          
 [<c02c60cd>] qla2x00_process_response_queue+0x14d/0x1d0                    
 [<c02ac946>] scsi_finish_command+0x96/0xe0                                   
 [<c033f2e3>] tcp_write_timer+0x73/0xe0                                      
 [<c02ac836>] scsi_softirq+0xa6/0xe0                                        
 [<c01285b2>] __do_softirq+0x82/0x100                                      
 [<c0128665>] do_softirq+0x35/0x40                                         
 [<c010675b>] do_IRQ+0x3b/0x70                                                
 [<c0104c1e>] common_interrupt+0x1a/0x20                           
 [<c0102030>] default_idle+0x0/0x30                                
 [<c0102053>] default_idle+0x23/0x30                               
 [<c0102104>] cpu_idle+0x64/0x80                                   
 [<c0462965>] start_kernel+0x185/0x1d0                             
 [<c0462370>] unknown_bootoption+0x0/0x1e0                         
Code: 90 80 3e 00 7e f9 fa eb e8 89 d8 8b 74 24 0c 8b 5c 24 08 83 c4 10 c3 c7 04
 24 c4 6a 38 c0 8b 44 24 10 89 44 24 04 e8 6d 30 db ff <0f> 0b 95 00 c2 62 38 c0
 eb bc 8d 76 00 53 83 ec 08 89 c3 fa 81                     
 <0>Kernel panic - not syncing: Fatal exception in interrupt       

Regards,
cvaroqui

On Mon, Jun 13, 2005 at 10:11:54AM +0200, Christophe Varoqui wrote:
> Hello,
> 
> I'm testing Mike Christie's START_STOP hwhandler and discovered a bunch of new, interesting phenomena:
> 
> A little context first :
> o kernel 2.6.12-rc6 + qlogic discovery patch
> o qla2342 (dual 2Gb)
> o EVA5000, Solaris-tagged connections
> 
> Here is a map created by multipath, fresh from boot :
> 
> eva1_lun2 (3600508b400014ba7000120000cf00000)
> [size=50 GB][features="1 queue_if_no_path"][hwhandler="1 hp_sw"]
> \_ round-robin 0 [active][best]
>   \_ 0:0:0:2 sdb  8:16    [ready ][active]
>   \_ 1:0:0:2 sdf  8:80    [ready ][active]
> \_ round-robin 0 [enabled]
>   \_ 0:0:1:2 sdd  8:48    [faulty][active]
>   \_ 1:0:1:2 sdh  8:112   [faulty][active]
> 
> Start a background stream read with dd on that map.
> 
> Do a port disable on the FC switch port connected to HBA 0
> Consistently at this moment I get the following in the logs :
> 
> qla2300 0000:05:0d.0: LOOP DOWN detected.
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
> in_atomic():1, irqs_disabled():1
>  [<c0120a74>] __might_sleep+0xa4/0xc0
>  [<c026a466>] device_for_each_child+0x26/0x80
>  [<c02b3180>] target_block+0x0/0x30
>  [<c02bbdae>] fc_remote_port_block+0x2e/0x60
>  [<c02bdbf5>] qla2x00_mark_all_devices_lost+0x55/0x60
>  [<c02c597e>] qla2x00_async_event+0x83e/0xd60
>  [<c011dd2b>] find_busiest_group+0xbb/0x310
>  [<c02cdce4>] sd_rw_intr+0x164/0x320
>  [<c02c4e37>] qla2300_intr_handler+0x77/0x240
>  [<c0144882>] handle_IRQ_event+0x32/0x70
>  [<c0144997>] __do_IRQ+0xd7/0x140
>  [<c0106756>] do_IRQ+0x36/0x70
>  [<c0104c1e>] common_interrupt+0x1a/0x20
>  [<c0102030>] default_idle+0x0/0x30
>  [<c0102053>] default_idle+0x23/0x30
>  [<c0102104>] cpu_idle+0x64/0x80
> 
> If I wait long enough, I then get the following :
> 
>  rport-0:0-0: blocked FC remote port time out: removing target
>  rport-0:0-1: blocked FC remote port time out: removing target
> 
> ... which is rather new to me.
> 
> As a side effect, all associated sd devices are removed and uevents are sent signaling that the disks are gone. In the current implementation, this triggers checker removal on the multipathd side.
> 
> Then, upon port reenable, the sd devices are registered again with different minors than before. Add uevents get sent, multipath reconfigures the maps and ...
> 
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> f8b0d29f
> *pde = 08e4d001
> Oops: 0000 [#1]
> SMP
> Modules linked in: dm_round_robin dm_hp_sw dm_multipath md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc video button battery ac ohci_hcd tg3 floppy dm_mod qla6312
> CPU:    2
> EIP:    0060:[<f8b0d29f>]    Not tainted VLI
> EFLAGS: 00010086   (2.6.12-rc6)
> EIP is at rr_select_path+0xf/0x60 [dm_round_robin]
> eax: f6a989cc   ebx: 00000000   ecx: f6a978c0   edx: f7f1e77c
> esi: f7f1e77c   edi: 00000000   ebp: 00000001   esp: f65d1f00
> ds: 007b   es: 007b   ss: 0068
> Process kmpathd/2 (pid: 4564, threadinfo=f65d0000 task=f6708aa0)
> Stack: f6a989c0 f7f1e740 f8ae3bc2 f7f1e740 f7f1e740 f8ae3c90 f7f1e740 f7f1e740
>        00000000 f7f1e74c f8ae3f9c 00000286 00000000 f7f1e754 f7f1e740 f7f34100
>        f7f1e790 00000282 c01339a2 00000000 000f42b4 f6cdfe5c f7f34128 f7f34110
> Call Trace:
>  [<f8ae3bc2>] __choose_path_in_pg+0x12/0x40 [dm_multipath]
>  [<f8ae3c90>] __choose_pgpath+0xa0/0xb0 [dm_multipath]
>  [<f8ae3f9c>] process_queued_ios+0x7c/0xf0 [dm_multipath]
>  [<c01339a2>] worker_thread+0x1c2/0x250
>  [<f8ae3f20>] process_queued_ios+0x0/0xf0 [dm_multipath]
>  [<c011eaa0>] default_wake_function+0x0/0x10
>  [<c011eaa0>] default_wake_function+0x0/0x10
>  [<c01337e0>] worker_thread+0x0/0x250
>  [<c0137c65>] kthread+0xa5/0xf0
>  [<c0137bc0>] kthread+0x0/0xf0
>  [<c0102445>] kernel_thread_helper+0x5/0x10
> Code: 42 04 89 10 89 58 04 89 03 31 c0 5b c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 83 ec 08 89 74 24 04 89 d6 89 1c 24 8b 58 04 <8b> 03 39 d8 74 30 89 c1 8b 50 04 8b 00 85 c9 89 50 04 89 02 8b
> 
> Here dd is now stuck in D-state.
> 
> I will post more as I continue my hammering.
> 
> Regards,
> cvaroqui
> 
----- End forwarded message -----
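
The remove/re-add sequence in the forwarded report can be pictured with a
small Python model. This is only a userspace illustration under stated
assumptions, not dm-multipath or multipathd code: PathGroup, on_uevent()
and select_path() are hypothetical names, and the re-added device
(sdi, 8:128) is invented to stand for "a different minor than before".
The point it shows is that a selector has to cope with a group whose
paths have all been removed.

```python
from typing import Dict, Optional


class PathGroup:
    """Toy round-robin path group keyed by "major:minor" (hypothetical)."""

    def __init__(self) -> None:
        self.paths: Dict[str, str] = {}   # devt -> device name
        self._next = 0

    def on_uevent(self, action: str, devt: str, name: str) -> None:
        """Apply an add/remove event to the set of known paths."""
        if action == "add":
            self.paths[devt] = name
        elif action == "remove":
            self.paths.pop(devt, None)

    def select_path(self) -> Optional[str]:
        """Return the next device name, or None when no path is left."""
        if not self.paths:
            return None
        names = list(self.paths.values())
        choice = names[self._next % len(names)]
        self._next += 1
        return choice
```

In this model, removing sdb (8:16) and sdf (8:80) leaves select_path()
returning None until an add event arrives for the re-registered device.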

