* Investigating potential flaw in scsi error handling
@ 2008-02-09 21:59 Elias Oltmanns
From: Elias Oltmanns @ 2008-02-09 21:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo

Hi there,

I'm experiencing system lockups with 2.6.24 which I believe to be
related to scsi error handling. Actually, I have patched the mainline
kernel with a disk shock protection patch [1] and in my case it is indeed
the shock protection mechanism that triggers the lockups. However, some
rather lengthy investigations have led me to the conclusion that this
additional patch is just the means to reproduce the error condition
fairly reliably rather than the origin of the problem.

The problem has only become apparent since Tejun's commit
31cc23b34913bc173680bdc87af79e551bf8cc0d. More precisely, libata now
sets max_host_blocked and max_device_blocked to 1 for all ATA devices.
Various tests I've conducted so far have led me to the conclusion that
a non-zero return code from scsi_dispatch_command is sufficient to
trigger the problem I'm seeing, provided that max_host_blocked and
max_device_blocked are set to 1.
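
For illustration only, here is a toy model of why the blocked limit might
matter. It assumes that a failed dispatch resets a per-device countdown to
max_device_blocked and that each later queue run decrements it, retrying
once it reaches zero. That is a reading of the 2.6.24 midlayer, not a copy
of it; every name prefixed with toy_ is hypothetical.

```c
#include <assert.h>

/* Simplified stand-in for per-device state; NOT the kernel's struct scsi_device. */
struct toy_device {
    int device_busy;        /* commands currently in flight */
    int device_blocked;     /* countdown armed by a failed dispatch */
    int max_device_blocked; /* value the countdown is reset to */
};

/* Model of a failed dispatch: the command is requeued and the countdown armed. */
static void toy_dispatch_failed(struct toy_device *dev)
{
    dev->device_blocked = dev->max_device_blocked;
}

/* Model of the readiness check on one queue run. Returns 1 if a retry may go out. */
static int toy_queue_ready(struct toy_device *dev)
{
    if (dev->device_busy == 0 && dev->device_blocked) {
        if (--dev->device_blocked != 0)
            return 0; /* still blocked, wait for a later queue run */
    }
    return dev->device_blocked == 0;
}

/* Count the queue runs needed after one failed dispatch before a retry is allowed. */
static int toy_queue_runs_until_ready(int max_blocked)
{
    struct toy_device dev = { 0, 0, max_blocked };
    int runs = 0;

    assert(max_blocked >= 1);
    toy_dispatch_failed(&dev);
    do {
        runs++;
    } while (!toy_queue_ready(&dev));
    return runs;
}
```

Under these assumptions, toy_queue_runs_until_ready(1) needs a single queue
run, so the retry goes out immediately, whereas a larger limit such as 3
delays it by further queue runs. If the dispatch keeps failing, a limit of 1
leaves no pause between attempts in this model.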

Unfortunately, I'm a bit at a loss as to how I should proceed to find
the culprit. I can reliably reproduce the problem using the disk shock
protection patch in order to cause non-zero return values from
scsi_dispatch_command. How can I find out where in the error handling of
this condition things might go wrong?

Most likely you will need further information to help me solve this
issue, but perhaps you can already come up with some suggestions and tell
me what else you'd like to know.

Thanks in advance,

Elias

[1] http://article.gmane.org/gmane.linux.drivers.hdaps.devel/1094


PS: Since the disk shock protection patch is mainly concerned with an
ATA specific feature, I'm currently reworking it so that it is implemented
in libata rather than in the scsi midlayer. This doesn't change anything
with regard to the problem I've described above, but it has confirmed my
suspicion that it must be the return code from scsi_dispatch_command
that triggers the system freeze.
