* cache flush timeouts by blk_queue_ordered() @ 2008-11-20 18:19 Bernd Schubert 2009-01-26 19:55 ` Andrew Morton 0 siblings, 1 reply; 4+ messages in thread From: Bernd Schubert @ 2008-11-20 18:19 UTC (permalink / raw) To: linux-scsi, linux-kernel Hello, with some FC hardware-raid units we have the problem that the SYNCHRONIZE_CACHE command reproducibly fails. [658715.827428] sd 6:0:2:2: last recovery: 4311805647, now: 4459793681 [658715.833980] sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK [658715.842288] sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 [658715.850954] sd 6:0:2:2: Activating scsi error recovery (1) [658715.856793] sd 6:0:2:2: trying to abort command [658715.861820] qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 2002. [658746.004124] sd 6:0:2:2: last recovery: 4459793692, now: 4459801236 [658746.010686] sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK [658746.019004] sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 [658746.027680] sd 6:0:2:2: Activating scsi error recovery (2) [658746.033526] sd 6:0:2:2: trying to abort command [658746.038543] qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df4 2002. My guess is that these units flush their cache when this command is send even though they have a battery backup unit and flushing 2GB cache may take some time... Since I can only reproduce it on systems in production I can't do any experiments, but I guess the default timeout of 30s is not sufficient. Problem is now that this timeout cannot be adjusted by the sysfs scsi device timeout, since sd_prepare_flush() doesn't have the required device structure. The reason for that is blk_queue_ordered(). It neither gets a timeout argument, nor any pointer to the device. I already tried to use container_of() in sd_prepare_flush, but somehow that doesn't seem work if the structure member is a pointer. The next solution that comes into my my mind is to add the timeout argument to blk_queue_ordered() and subsequentely to modifiy all callers. Would such a patch be accepted? Or is there any better solution? Any help is appreciated. Thanks, Bernd PS: (I'm also discussing the cache flush issue with one of our hardware vendors, but fixing their firmware might take ages). ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cache flush timeouts by blk_queue_ordered() 2008-11-20 18:19 cache flush timeouts by blk_queue_ordered() Bernd Schubert @ 2009-01-26 19:55 ` Andrew Morton 2009-01-26 22:39 ` Bernd Schubert 0 siblings, 1 reply; 4+ messages in thread From: Andrew Morton @ 2009-01-26 19:55 UTC (permalink / raw) To: Bernd Schubert; +Cc: linux-scsi, linux-kernel, Jens Axboe On Thu, 20 Nov 2008 19:19:19 +0100 Bernd Schubert <bs@q-leap.de> wrote: > Hello, > > with some FC hardware-raid units we have the problem that the > SYNCHRONIZE_CACHE command reproducibly fails. > > [658715.827428] sd 6:0:2:2: last recovery: 4311805647, now: 4459793681 > [658715.833980] sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK > [658715.842288] sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 > [658715.850954] sd 6:0:2:2: Activating scsi error recovery (1) > [658715.856793] sd 6:0:2:2: trying to abort command > [658715.861820] qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 2002. > [658746.004124] sd 6:0:2:2: last recovery: 4459793692, now: 4459801236 > [658746.010686] sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK > [658746.019004] sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 > [658746.027680] sd 6:0:2:2: Activating scsi error recovery (2) > [658746.033526] sd 6:0:2:2: trying to abort command > [658746.038543] qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df4 2002. Does this result in a dead machine, or does the system still struggle along somehow? > My guess is that these units flush their cache when this command is send > even though they have a battery backup unit and flushing 2GB cache may > take some time... Since I can only reproduce it on systems in production > I can't do any experiments, but I guess the default timeout of 30s is not > sufficient. > > Problem is now that this timeout cannot be adjusted by the sysfs scsi device > timeout, since sd_prepare_flush() doesn't have the required device > structure. The reason for that is blk_queue_ordered(). It neither > gets a timeout argument, nor any pointer to the device. > > I already tried to use container_of() in sd_prepare_flush, but somehow > that doesn't seem work if the structure member is a pointer. > > The next solution that comes into my my mind is to add the timeout argument > to blk_queue_ordered() and subsequentely to modifiy all callers. > Would such a patch be accepted? Or is there any better solution? > > Any help is appreciated. Is this problem still open? > > Thanks, > Bernd > > PS: (I'm also discussing the cache flush issue with one of our > hardware vendors, but fixing their firmware might take ages). Making the timeout tunable sounds like a suitable (if minimal) approach. It appears to presently be a qlogic-specific thing? Command_Entry.time_out? ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cache flush timeouts by blk_queue_ordered() 2009-01-26 19:55 ` Andrew Morton @ 2009-01-26 22:39 ` Bernd Schubert 2009-01-26 22:57 ` Andrew Morton 0 siblings, 1 reply; 4+ messages in thread From: Bernd Schubert @ 2009-01-26 22:39 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-scsi, linux-kernel, Jens Axboe Andrew, thanks a lot! I'm really always impressed how you manage to find time to grep through all the threads, even those nobody cared to look at for months. On Mon, Jan 26, 2009 at 11:55:40AM -0800, Andrew Morton wrote: > On Thu, 20 Nov 2008 19:19:19 +0100 > Bernd Schubert <bs@q-leap.de> wrote: > > > Hello, > > > > with some FC hardware-raid units we have the problem that the > > SYNCHRONIZE_CACHE command reproducibly fails. > > > > [658715.827428] sd 6:0:2:2: last recovery: 4311805647, now: 4459793681 > > [658715.833980] sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK > > [658715.842288] sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 > > [658715.850954] sd 6:0:2:2: Activating scsi error recovery (1) > > [658715.856793] sd 6:0:2:2: trying to abort command > > [658715.861820] qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 2002. > > [658746.004124] sd 6:0:2:2: last recovery: 4459793692, now: 4459801236 > > [658746.010686] sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK > > [658746.019004] sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 > > [658746.027680] sd 6:0:2:2: Activating scsi error recovery (2) > > [658746.033526] sd 6:0:2:2: trying to abort command > > [658746.038543] qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df4 2002. > > Does this result in a dead machine, or does the system still struggle > along somehow? Hmm, well, the failure that did then follow with that kernel version is more or less my fault. Actually the system does recover after some time, except with the kernel version I had installed there: Its a bit difficult to explain and please skip this part if is too boring. On the other hand, below is the explanation of another scsi-eh bug: The firmware of the same vendor manages to crash sometimes. The units will then still accept basic scsi commands, but everything related to real io will timeout, since it is a bug in their cache logic. So the linux scsi error handler will activate send recovery commands, but nothing related to io, and so it will succeed to 'recover' the device. Normal io will continue, but timeout again... The error recovery loop would last endless. So I introduced additional patches, which only allow a maximum number of errors within a time limit to offline really failed units. Well, in that kernel version, I still had the number of error and the time as fixed parameter, in the mean time it is adjustable by sysfs (I didn't post the patches yet, see below why). Probably the real solution would be to let the error handler to read some bytes from the device after the basic recovery. > > > My guess is that these units flush their cache when this command is send > > even though they have a battery backup unit and flushing 2GB cache may > > take some time... Since I can only reproduce it on systems in production > > I can't do any experiments, but I guess the default timeout of 30s is not > > sufficient. > > > > Problem is now that this timeout cannot be adjusted by the sysfs scsi device > > timeout, since sd_prepare_flush() doesn't have the required device > > structure. The reason for that is blk_queue_ordered(). It neither > > gets a timeout argument, nor any pointer to the device. > > > > I already tried to use container_of() in sd_prepare_flush, but somehow > > that doesn't seem work if the structure member is a pointer. > > > > The next solution that comes into my my mind is to add the timeout argument > > to blk_queue_ordered() and subsequentely to modifiy all callers. > > Would such a patch be accepted? Or is there any better solution? > > > > Any help is appreciated. > > Is this problem still open? I posted a patch, but never got a comment on it. https://kerneltrap.org/mailarchive/linux-scsi/2008/11/26/4242484 > > > > > Thanks, > > Bernd > > > > PS: (I'm also discussing the cache flush issue with one of our > > hardware vendors, but fixing their firmware might take ages). > > Making the timeout tunable sounds like a suitable (if minimal) > approach. It appears to presently be a qlogic-specific thing? > Command_Entry.time_out? I don't think it is qlogic-specific. On the same qlogic controller also more recent Infortrend device are connected (by fc-switch) and with these units the sync-cache timeouts never came up. The issue may be worked around by a higher timeouts (one needs 120s for all Infortrend units anyway) and also by not using raid-6 on these specific units. But except throwing away all these specific units, there it no real solution ;) Thanks again for your help, Bernd ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cache flush timeouts by blk_queue_ordered() 2009-01-26 22:39 ` Bernd Schubert @ 2009-01-26 22:57 ` Andrew Morton 0 siblings, 0 replies; 4+ messages in thread From: Andrew Morton @ 2009-01-26 22:57 UTC (permalink / raw) To: Bernd Schubert; +Cc: linux-scsi, linux-kernel, jens.axboe On Mon, 26 Jan 2009 23:39:42 +0100 Bernd Schubert <bs@q-leap.de> wrote: > Andrew, thanks a lot! I'm really always impressed how you manage to find time > to grep through all the threads, even those nobody cared to look at for months. > Only a very small subset, alas. > I posted a patch, but never got a comment on it. > > https://kerneltrap.org/mailarchive/linux-scsi/2008/11/26/4242484 Three weeks ago: it's been forgotten about. I suggest that you redo the patch series and resend the uncontroversial bits. Feel free to copy me on them and I'll put them into my auto-resend queue. That eventually causes something to happen. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-01-26 23:09 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-11-20 18:19 cache flush timeouts by blk_queue_ordered() Bernd Schubert 2009-01-26 19:55 ` Andrew Morton 2009-01-26 22:39 ` Bernd Schubert 2009-01-26 22:57 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).