Re: Any hope for ide-scsi (error handling)?

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 18:55 Any hope for ide-scsi (error handling)? Zwane Mwaikambo
@ 2003-03-15  2:05 ` dan carpenter
  2003-03-15 19:53   ` Zwane Mwaikambo
  0 siblings, 1 reply; 16+ messages in thread
From: dan carpenter @ 2003-03-15  2:05 UTC (permalink / raw)
  To: Zwane Mwaikambo, Linux Kernel; +Cc: wrlk

On Saturday 15 March 2003 07:55 pm, Zwane Mwaikambo wrote:
>
> bad: scheduling while atomic!
> Call Trace:
>

   887          spin_lock_irqsave(&ide_lock, flags);
   888          while (HWGROUP(drive)->handler) {
   889                  HWGROUP(drive)->handler = NULL;
   890                  schedule_timeout(1);
   891          }

Here is at least one bad call to schedule() in 
static int idescsi_reset (Scsi_Cmnd *cmd)

regards,
dan




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 19:53   ` Zwane Mwaikambo
@ 2003-03-15  2:52     ` dan carpenter
  2003-03-15 20:23       ` Zwane Mwaikambo
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: dan carpenter @ 2003-03-15  2:52 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Linux Kernel, wrlk

On Saturday 15 March 2003 08:53 pm, Zwane Mwaikambo wrote:
> On Sat, 15 Mar 2003, dan carpenter wrote:
> > On Saturday 15 March 2003 07:55 pm, Zwane Mwaikambo wrote:
> > > bad: scheduling while atomic!
> > > Call Trace:
> >
> >    887          spin_lock_irqsave(&ide_lock, flags);
> >    888          while (HWGROUP(drive)->handler) {
> >    889                  HWGROUP(drive)->handler = NULL;
> >    890                  schedule_timeout(1);
> >    891          }
> >
> > Here is at least one bad call to schedule() in
> > static int idescsi_reset (Scsi_Cmnd *cmd)
>
> Apart from the schedule with the ide_lock held, what is that code actually
> doing?
>
> 	Zwane

Hm...  Good question.  I have no idea what the while loop is for.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 21:01         ` Willem Riede
@ 2003-03-15  4:43           ` dan carpenter
  2003-03-15 21:11           ` Zwane Mwaikambo
  2003-03-16  1:23           ` Alan Cox
  2 siblings, 0 replies; 16+ messages in thread
From: dan carpenter @ 2003-03-15  4:43 UTC (permalink / raw)
  To: wrlk, Zwane Mwaikambo; +Cc: Linux Kernel

On Saturday 15 March 2003 10:01 pm, Willem Riede wrote:
> It may not be elegant to schedule(1) with the lock taken, but it
> does work.
>
> However, my latest patch doesn't seem to be applied, since in my
> version I have a set_current_state(TASK_INTERRUPTIBLE); before
> the schedule.

I don't see how it works.

spin_lock_irqsave() increments  preempt_count()
in_atomic checks is defined as:
# define in_atomic()    ((preempt_count() & ~PREEMPT_ACTIVE) != kernel_locked())
kernel_locked() is defined as:
#define kernel_locked()         (current->lock_depth >= 0)

If you call schedule while in_atomic() then it prints out the error 
"bad: scheduling while atomic!\n".  

As far as I can see set_current_state() doesn't affect
preempt_count() or current->lock_depth.  I must be 
missing something...

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Any hope for ide-scsi (error handling)?
@ 2003-03-15 18:55 Zwane Mwaikambo
  2003-03-15  2:05 ` dan carpenter
  0 siblings, 1 reply; 16+ messages in thread
From: Zwane Mwaikambo @ 2003-03-15 18:55 UTC (permalink / raw)
  To: Linux Kernel

I've been trying to upgrade to 2.5 and one of my stumbling blocks is 
cdwriting, i have both a scsi cdwriter and an ide one and haven't managed 
to write with either (SCSI bug report later). I know i shouldn't be using 
ide-scsi but it gets tiring typing since the gui interfaces to 
cdrecord are still playing catchup (sorry for not being hardcore). 
Unfortunately due to the nature of IDE it also completely takes out the 
harddisk attached to the same cable. I've setup a test rig for this so if 
anyone has suggestions i'm up for testing.

Here is a rather long paste of an attempt at writing with ide-scsi. There 
is a deadlock at the end of it all.

kernel is 2.5.64 (Alan should i be using 2.5.64-ac?)

cdrecord      S CB565DC0 4211491704  1310   1250  1311               
(NOTLB)
Call Trace:
 [<c011cfb2>] schedule+0x182/0x320
 [<c0318a81>] sg_ioctl+0xc71/0xe20

(an ls -l i tried on a harddisk on the same channel)

ls            D C02BDC24 4217658044  1313   1180                1249 
(NOTLB)
Call Trace:
 [<c02bdc24>] do_ide_request+0x14/0x20
 [<c027b510>] generic_unplug_device+0xd0/0xe0
 [<c011cfb2>] schedule+0x182/0x320
 [<c011e779>] io_schedule+0x29/0x40
 [<c015c9dd>] __wait_on_buffer+0xad/0xb0

...

hdd: irq timeout: status=0xd0 { Busy }
ide-scsi: abort called for 530
Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Debug: sleeping function called from illegal context at 
include/asm/semaphore.h:119
Call Trace:
 [<c02dc9ec>] scsi_sleep+0x7c/0xa0
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c01082fe>] __down+0xfe/0x1d0
 [<c011d1a0>] default_wake_function+0x0/0x20
 [<c010ac1b>] show_trace+0x3b/0x80
 [<c010869b>] __down_failed+0xb/0x14
 [<c02dd450>] .text.lock.scsi_error+0x97/0xc7
 [<c02dc950>] scsi_sleep_done+0x0/0x20
 [<c03127fc>] idescsi_abort+0x1bc/0x1d0
 [<c02dc1ba>] scsi_try_to_abort_cmd+0x9a/0xf0
 [<c02dc30e>] scsi_eh_abort_cmds+0x3e/0x80
 [<c02dcef3>] scsi_unjam_host+0xf3/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

ide-scsi: reset called for 530
bad: scheduling while atomic!
Call Trace:
 [<c011d143>] schedule+0x313/0x320
 [<c012bd84>] schedule_timeout+0x64/0xb0
 [<c012bd10>] process_timeout+0x0/0x10
 [<c0312973>] idescsi_reset+0x163/0x180
 [<c02dc3da>] scsi_try_bus_device_reset+0x8a/0x100
 [<c02dc499>] scsi_eh_bus_device_reset+0x49/0xc0
 [<c02dcd47>] scsi_eh_ready_devs+0x17/0x60
 [<c02dcf07>] scsi_unjam_host+0x107/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

NMI Watchdog detected LOCKUP on CPU0, eip c02be73d, registers:
CPU:    0
EIP:    0060:[<c02be73d>]    Tainted: PF 
EFLAGS: 00000086
EIP is at .text.lock.ide_io+0x40/0x93
eax: d1e6cdcc   ebx: c1616000   ecx: 00000106   edx: 00000001
esi: c12d7500   edi: d1e6cdcc   ebp: c1617e08   esp: c1617de0
ds: 007b   es: 007b   ss: 0068
Process scsi_eh_1 (pid: 12, threadinfo=c1616000 task=c1760080)
Stack: c05c3160 c12d7500 00000282 c1617e08 c03557e1 c05c3160 0018e20c d1e6cdf0 
       c12d7500 c02bdcc0 c1617e30 c012c272 d1e6cdcc d1e6cdcc fffffffd c1617e50 
       00000000 00000001 c0561ee8 fffffffd c1617e50 c0127698 c12d7500 00000000 
Call Trace:
 [<c03557e1>] i8042_timer_func+0x21/0x30
 [<c02bdcc0>] ide_timer_expiry+0x0/0x310
 [<c012c272>] __run_timers+0xd2/0x1f7
 [<c0127698>] do_softirq+0xc8/0xd0
 [<c0117307>] smp_apic_timer_interrupt+0x77/0x80
 [<c010a9ba>] apic_timer_interrupt+0x1a/0x20
 [<c011d036>] schedule+0x206/0x320
 [<c012bd84>] schedule_timeout+0x64/0xb0
 [<c012bd10>] process_timeout+0x0/0x10
 [<c0312973>] idescsi_reset+0x163/0x180
 [<c02dc3da>] scsi_try_bus_device_reset+0x8a/0x100
 [<c02dc499>] scsi_eh_bus_device_reset+0x49/0xc0
 [<c02dcd47>] scsi_eh_ready_devs+0x17/0x60
 [<c02dcf07>] scsi_unjam_host+0x107/0x120
 [<c02dd026>] scsi_error_handler+0x106/0x140
 [<c02dcf20>] scsi_error_handler+0x0/0x140
 [<c0107435>] kernel_thread_helper+0x5/0x10

Code: f3 90 80 3d e0 20 56 c0 00 7e f5 e9 b7 f5 ff ff f3 90 80 3d 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15  2:05 ` dan carpenter
@ 2003-03-15 19:53   ` Zwane Mwaikambo
  2003-03-15  2:52     ` dan carpenter
  0 siblings, 1 reply; 16+ messages in thread
From: Zwane Mwaikambo @ 2003-03-15 19:53 UTC (permalink / raw)
  To: dan carpenter; +Cc: Linux Kernel, wrlk

On Sat, 15 Mar 2003, dan carpenter wrote:

> On Saturday 15 March 2003 07:55 pm, Zwane Mwaikambo wrote:
> >
> > bad: scheduling while atomic!
> > Call Trace:
> >
> 
>    887          spin_lock_irqsave(&ide_lock, flags);
>    888          while (HWGROUP(drive)->handler) {
>    889                  HWGROUP(drive)->handler = NULL;
>    890                  schedule_timeout(1);
>    891          }
> 
> Here is at least one bad call to schedule() in 
> static int idescsi_reset (Scsi_Cmnd *cmd)

Apart from the schedule with the ide_lock held, what is that code actually 
doing?

	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15  2:52     ` dan carpenter
@ 2003-03-15 20:23       ` Zwane Mwaikambo
  2003-03-15 21:01         ` Willem Riede
  2003-03-15 20:25       ` bert hubert
  2003-03-16  1:05       ` Alan Cox
  2 siblings, 1 reply; 16+ messages in thread
From: Zwane Mwaikambo @ 2003-03-15 20:23 UTC (permalink / raw)
  To: dan carpenter; +Cc: Linux Kernel, wrlk

On Sat, 15 Mar 2003, dan carpenter wrote:

> > Apart from the schedule with the ide_lock held, what is that code actually
> > doing?
> >
> > 	Zwane
> 
> Hm...  Good question.  I have no idea what the while loop is for.

I suppose the magik is in the comments;

/* first null the handler for the drive and let any process
 * doing IO (on another CPU) run to (partial) completion
 * the lock prevents processing new requests */


-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15  2:52     ` dan carpenter
  2003-03-15 20:23       ` Zwane Mwaikambo
@ 2003-03-15 20:25       ` bert hubert
  2003-03-15 20:41         ` Zwane Mwaikambo
  2003-03-16  1:05       ` Alan Cox
  2 siblings, 1 reply; 16+ messages in thread
From: bert hubert @ 2003-03-15 20:25 UTC (permalink / raw)
  To: dan carpenter; +Cc: Zwane Mwaikambo, Linux Kernel, wrlk

On Sat, Mar 15, 2003 at 03:52:21AM +0100, dan carpenter wrote:
> > >
> > >    887          spin_lock_irqsave(&ide_lock, flags);
> > >    888          while (HWGROUP(drive)->handler) {
> > >    889                  HWGROUP(drive)->handler = NULL;
> > >    890                  schedule_timeout(1);
> > >    891          }
> > >
> > > Here is at least one bad call to schedule() in
> > > static int idescsi_reset (Scsi_Cmnd *cmd)
> >
> > Apart from the schedule with the ide_lock held, what is that code actually
> > doing?
> >
> > 	Zwane
> 
> Hm...  Good question.  I have no idea what the while loop is for.

A construct like this was suggested for use in swsusp too to make sure that
only *one* request is outstanding for a controler. This was also mentioned
to be the unclean way and that there are taskfile interfaces which are
cleaner.

Bert.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO
http://netherlabs.nl                         Consulting

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 20:25       ` bert hubert
@ 2003-03-15 20:41         ` Zwane Mwaikambo
  2003-03-15 20:55           ` Zwane Mwaikambo
  0 siblings, 1 reply; 16+ messages in thread
From: Zwane Mwaikambo @ 2003-03-15 20:41 UTC (permalink / raw)
  To: bert hubert; +Cc: dan carpenter, Linux Kernel, wrlk, Pavel Machek

On Sat, 15 Mar 2003, bert hubert wrote:

> A construct like this was suggested for use in swsusp too to make sure that
> only *one* request is outstanding for a controler. This was also mentioned
> to be the unclean way and that there are taskfile interfaces which are
> cleaner.

hmm..

cpu0:

do_IRQ();
handle_IRQ_event();
...
static ide_startstop_t read_intr (ide_drive_t *drive)
{
	msect = drive->mult_count; /* let msect = 0 */
...
	
	if (i > 0) {
		if (msect)
			goto read_next;
		if (HWGROUP(drive)->handler != NULL) <- [1]
			BUG();
		ide_set_handler(drive, &read_intr, WAIT_CMD, NULL); <- [2]
                return ide_started;
	}

}

cpu1:
swsusp code;
...
	spin_lock_irqsave(&ide_lock, flags);
	while (HWGROUP(drive)->handler) {
		HWGROUP(drive)->handler = NULL; <- [1]
		schedule_timeout(1);
	}
	<- [2]
...

Ok so at event [1] we have a ->handler set on cpu0 so we pass that check. 
Then cpu1 acquires ide_lock and NULLs it out. cpu0 blocks on the lock in 
ide_set_handler and when cpu1 releases it it re-assigns ->handler. What 
happens then?

	Zwane


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 20:41         ` Zwane Mwaikambo
@ 2003-03-15 20:55           ` Zwane Mwaikambo
  0 siblings, 0 replies; 16+ messages in thread
From: Zwane Mwaikambo @ 2003-03-15 20:55 UTC (permalink / raw)
  To: bert hubert; +Cc: dan carpenter, Linux Kernel, wrlk, Pavel Machek

On Sat, 15 Mar 2003, Zwane Mwaikambo wrote:

> On Sat, 15 Mar 2003, bert hubert wrote:
> 
> > A construct like this was suggested for use in swsusp too to make sure that
> > only *one* request is outstanding for a controler. This was also mentioned
> > to be the unclean way and that there are taskfile interfaces which are
> > cleaner.

[snip]

> Ok so at event [1] we have a ->handler set on cpu0 so we pass that check. 
> Then cpu1 acquires ide_lock and NULLs it out. cpu0 blocks on the lock in 
> ide_set_handler and when cpu1 releases it it re-assigns ->handler. What 
> happens then?

Skip that, i forgot swsusp doesn't do SMP.

	Zwane


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 20:23       ` Zwane Mwaikambo
@ 2003-03-15 21:01         ` Willem Riede
  2003-03-15  4:43           ` dan carpenter
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Willem Riede @ 2003-03-15 21:01 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: dan carpenter, Linux Kernel

On 2003.03.15 15:23 Zwane Mwaikambo wrote:
> On Sat, 15 Mar 2003, dan carpenter wrote:
> 
> > > Apart from the schedule with the ide_lock held, what is that code actually
> > > doing?
> > >
> > > 	Zwane
> > 
> > Hm...  Good question.  I have no idea what the while loop is for.
> 
> I suppose the magik is in the comments;
> 
> /* first null the handler for the drive and let any process
>  * doing IO (on another CPU) run to (partial) completion
>  * the lock prevents processing new requests */
> 
Indeed. If you get there, the command in progress is hung.
To be able to restart the device, the old command needs to be
aborted. But that is an inherently racy undertaking. 

Nominally, I just want to set HWGROUP(drive)->handler = NULL.
But there is a small chance, that there is actually (interrupt) 
activity going on for the command, which would result in a new 
entry in HWGROUP(drive)->handler popping up after it is cleared.

The loop as programmed significantly increases the odds that 
the old command is really aborted. 

It may not be elegant to schedule(1) with the lock taken, but it
does work.

However, my latest patch doesn't seem to be applied, since in my
version I have a set_current_state(TASK_INTERRUPTIBLE); before 
the schedule.

Regards, Willem Riede.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 21:01         ` Willem Riede
  2003-03-15  4:43           ` dan carpenter
@ 2003-03-15 21:11           ` Zwane Mwaikambo
  2003-03-15 22:29             ` Willem Riede
  2003-03-16  1:23           ` Alan Cox
  2 siblings, 1 reply; 16+ messages in thread
From: Zwane Mwaikambo @ 2003-03-15 21:11 UTC (permalink / raw)
  To: Willem Riede; +Cc: dan carpenter, Linux Kernel

On Sat, 15 Mar 2003, Willem Riede wrote:

> Indeed. If you get there, the command in progress is hung.
> To be able to restart the device, the old command needs to be
> aborted. But that is an inherently racy undertaking. 
> 
> Nominally, I just want to set HWGROUP(drive)->handler = NULL.
> But there is a small chance, that there is actually (interrupt) 
> activity going on for the command, which would result in a new 
> entry in HWGROUP(drive)->handler popping up after it is cleared.
> 
> The loop as programmed significantly increases the odds that 
> the old command is really aborted. 
> 
> It may not be elegant to schedule(1) with the lock taken, but it
> does work.
> 
> However, my latest patch doesn't seem to be applied, since in my
> version I have a set_current_state(TASK_INTERRUPTIBLE); before 
> the schedule.

Yeah but what happens when a task tries to acquire ide_lock? Incidentally 
this one crept in via timer softirq so if ide_timer_expiry gets there 
before your scheduled timeout timer?

NMI Watchdog detected LOCKUP on CPU0, eip c02be73d, registers:
CPU:    0
EIP:    0060:[<c02be73d>]    Tainted: PF 
EFLAGS: 00000086
EIP is at .text.lock.ide_io+0x40/0x93

Call Trace:
 [<c03557e1>] i8042_timer_func+0x21/0x30
 [<c02bdcc0>] ide_timer_expiry+0x0/0x310

-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 21:11           ` Zwane Mwaikambo
@ 2003-03-15 22:29             ` Willem Riede
  0 siblings, 0 replies; 16+ messages in thread
From: Willem Riede @ 2003-03-15 22:29 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: dan carpenter, Linux Kernel

On 2003.03.15 16:11 Zwane Mwaikambo wrote:
> On Sat, 15 Mar 2003, Willem Riede wrote:
> 
> > It may not be elegant to schedule(1) with the lock taken, but it
> > does work.
> > 
> > However, my latest patch doesn't seem to be applied, since in my
> > version I have a set_current_state(TASK_INTERRUPTIBLE); before 
> > the schedule.
> 
> Yeah but what happens when a task tries to acquire ide_lock? 

It gets to wait until I'm done.

Regards, Willem Riede.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-16  1:23           ` Alan Cox
@ 2003-03-16  0:55             ` Willem Riede
  2003-03-16 21:41               ` Alan Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Willem Riede @ 2003-03-16  0:55 UTC (permalink / raw)
  To: Alan Cox; +Cc: Zwane Mwaikambo, dan carpenter, Linux Kernel Mailing List

On 2003.03.15 20:23 Alan Cox wrote:
> On Sat, 2003-03-15 at 21:01, Willem Riede wrote:
> > It may not be elegant to schedule(1) with the lock taken, but it
> > does work.
> 
> You can't sleep holding a lock. You also can't null the hwgroup that
> way and you have to deal with some other locking concerns. Have a look
> how HDIO_*_RESET is handled in the very latest 2.4/2.5-ac code and
> you should be able to follow from that. Note your code paths will be
> much like the ioctl as the existing reset code paths are for paths
> where we abort from a known safe state (drive initiated or locked).
> 
> With the stuff there now you should be able to abort the command
> fairly cleanly and then reset the interface.
> 
I hear you, and I will take a hard look at that code. But please
realize, that if we get to this code segment, the drive has _not_ 
responded to the regular command and is in an _unknown_ state.

Thanks, Willem Riede.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15  2:52     ` dan carpenter
  2003-03-15 20:23       ` Zwane Mwaikambo
  2003-03-15 20:25       ` bert hubert
@ 2003-03-16  1:05       ` Alan Cox
  2 siblings, 0 replies; 16+ messages in thread
From: Alan Cox @ 2003-03-16  1:05 UTC (permalink / raw)
  To: dan carpenter; +Cc: Zwane Mwaikambo, Linux Kernel Mailing List, wrlk

On Sat, 2003-03-15 at 02:52, dan carpenter wrote:
> > > Here is at least one bad call to schedule() in
> > > static int idescsi_reset (Scsi_Cmnd *cmd)
> >
> > Apart from the schedule with the ide_lock held, what is that code actually
> > doing?
> >
> > 	Zwane
> 
> Hm...  Good question.  I have no idea what the while loop is for.

See 2.4.21-pre5-ac2 or later. You are discussing obsolete and known
broken code otherwise. The -ac code is closer to working and has most
but alas it seems not all the reset stuff fixed. While this works for
2.4 the 2.5 ide-scsi error handling has been rewritten by someone for
the scsi changes and so differs from both the 2.4 style scsi error
handling and reality by some margin. Once the reset code settles down
in 2.4.21-pre5-ac and 2.5.64-ac I'll take a crack at ide-scsi again.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-15 21:01         ` Willem Riede
  2003-03-15  4:43           ` dan carpenter
  2003-03-15 21:11           ` Zwane Mwaikambo
@ 2003-03-16  1:23           ` Alan Cox
  2003-03-16  0:55             ` Willem Riede
  2 siblings, 1 reply; 16+ messages in thread
From: Alan Cox @ 2003-03-16  1:23 UTC (permalink / raw)
  To: wrlk; +Cc: Zwane Mwaikambo, dan carpenter, Linux Kernel Mailing List

On Sat, 2003-03-15 at 21:01, Willem Riede wrote:
> It may not be elegant to schedule(1) with the lock taken, but it
> does work.

You can't sleep holding a lock. You also can't null the hwgroup that
way and you have to deal with some other locking concerns. Have a look
how HDIO_*_RESET is handled in the very latest 2.4/2.5-ac code and
you should be able to follow from that. Note your code paths will be
much like the ioctl as the existing reset code paths are for paths
where we abort from a known safe state (drive initiated or locked).

With the stuff there now you should be able to abort the command
fairly cleanly and then reset the interface.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Any hope for ide-scsi (error handling)?
  2003-03-16  0:55             ` Willem Riede
@ 2003-03-16 21:41               ` Alan Cox
  0 siblings, 0 replies; 16+ messages in thread
From: Alan Cox @ 2003-03-16 21:41 UTC (permalink / raw)
  To: wrlk; +Cc: Zwane Mwaikambo, dan carpenter, Linux Kernel Mailing List

On Sun, 2003-03-16 at 00:55, Willem Riede wrote:
> I hear you, and I will take a hard look at that code. But please
> realize, that if we get to this code segment, the drive has _not_ 
> responded to the regular command and is in an _unknown_ state.

Indeed. The error/abort distinction is more about locking/recovery
than drive level semantics. We will still issue an ATAPI reset and
if need be fall back to an ATA reset. What we handle differently is
the assumption that the drive status registers are valid (they are
not), that we are in an interrupt handle an IDE phase (we are not)
and that we want to recovery/reissue or issue the next command
from this context.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2003-03-16 20:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-03-15 18:55 Any hope for ide-scsi (error handling)? Zwane Mwaikambo
2003-03-15  2:05 ` dan carpenter
2003-03-15 19:53   ` Zwane Mwaikambo
2003-03-15  2:52     ` dan carpenter
2003-03-15 20:23       ` Zwane Mwaikambo
2003-03-15 21:01         ` Willem Riede
2003-03-15  4:43           ` dan carpenter
2003-03-15 21:11           ` Zwane Mwaikambo
2003-03-15 22:29             ` Willem Riede
2003-03-16  1:23           ` Alan Cox
2003-03-16  0:55             ` Willem Riede
2003-03-16 21:41               ` Alan Cox
2003-03-15 20:25       ` bert hubert
2003-03-15 20:41         ` Zwane Mwaikambo
2003-03-15 20:55           ` Zwane Mwaikambo
2003-03-16  1:05       ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).