linux-kernel.vger.kernel.org archive mirror
* Is BIO_RW_FAILFAST really usable?
@ 2007-12-04  2:46 Neil Brown
  2007-12-04  3:51 ` Jeff Garzik
  2007-12-04  9:13 ` Jens Axboe
  0 siblings, 2 replies; 5+ messages in thread
From: Neil Brown @ 2007-12-04  2:46 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe


I've been looking at using BIO_RW_FAILFAST in md/raid to improve
handling of some error cases.

This is particularly significant for the DASD driver (s390 specific).
I believe it uses optic fibre to connect to the drives.  When one of
these paths is unplugged, IO requests will block until an operator
runs a command to reset the card (or until it is plugged back in).
The only way to avoid this blockage is to use BIO_RW_FAILFAST.  So
we really need BIO_RW_FAILFAST for a reliable RAID1 configuration on
DASD drives.
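
(The md side of the change is small: it amounts to little more than
setting the fail-fast bit on the bios md submits, as in the sketch below.
This is illustrative only, not the actual patch, and submit_failfast_bio()
is just a made-up helper name.)

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Sketch: mark a bio fail-fast before handing it to the lower layers, so
 * they report errors promptly instead of retrying internally; md then
 * does the recovery, e.g. a failed read gets sent to another mirror. */
static void submit_failfast_bio(struct bio *bio)
{
        bio->bi_rw |= (1 << BIO_RW_FAILFAST);
        generic_make_request(bio);
}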

However, I just tested BIO_RW_FAILFAST on my SATA drives: controller 

02:06.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

(not using the card's minimal RAID functionality) and requests fail
immediately and always with e.g.

sd 2:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 2048

So fail fast obviously isn't generally usable.

What is the answer here?  Is the Silicon Image driver doing the wrong
thing, or is DASD doing the wrong thing, or is BIO_RW_FAILFAST
under-specified and we really need multiple flags or what?

Any ideas?

Thanks,
NeilBrown


* Re: Is BIO_RW_FAILFAST really usable?
  2007-12-04  2:46 Is BIO_RW_FAILFAST really usable? Neil Brown
@ 2007-12-04  3:51 ` Jeff Garzik
  2007-12-04  4:19   ` Andrey Borzenkov
  2007-12-04  9:13 ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2007-12-04  3:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel, Jens Axboe, IDE/ATA development list

Neil Brown wrote:
> [...]
> So fail fast obviously isn't generally usable.
> 
> What is the answer here?  Is the Silicon Image driver doing the wrong
> thing, or is DASD doing the wrong thing, or is BIO_RW_FAILFAST
> under-specified and we really need multiple flags or what?

It's a hard thing to implement, in general, for scalability reasons.

To make it work, you need to examine each driver's error handling to 
figure out what "fail fast" really means.

Most storage drivers are written to try as hard as possible to complete 
a request, where "try as hard as possible" can often mean internal 
retries while trying various multi-path configurations and hardware mode 
changes.  You might be catching SATA in the middle of error handling, 
for example.

So each driver really has a /slightly different/ version of "try to 
complete this request", which has the obvious effects on BIO_RW_FAILFAST.
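
Concretely, the kind of check each of those retry paths would need is
roughly the sketch below.  The helper is invented; blk_noretry_request()
is, IIRC, the block layer's existing test for REQ_FAILFAST, which is what
a BIO_RW_FAILFAST bio becomes at the request level.

#include <linux/blkdev.h>

/* Sketch only: let a driver's internal retry loop honour fail-fast.  A
 * request built from a BIO_RW_FAILFAST bio carries REQ_FAILFAST in
 * cmd_flags, which is what blk_noretry_request() tests. */
static int should_retry_cmd(struct request *rq, int tries, int max_tries)
{
        if (blk_noretry_request(rq))
                return 0;               /* fail fast: report the error upward now */
        return tries < max_tries;       /* otherwise the driver's normal retry policy */
}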

No clue about DASD, but in SATA's case I bet that a media or transfer 
error could be returned to the system more rapidly, while we continue to 
try to recover in the background.  libata doesn't have any direct 
knowledge of fail-fast at this point, IIRC.

But overall it's a job where you must examine each driver, or set of 
drivers :/

	Jeff




* Re: Is BIO_RW_FAILFAST really usable?
  2007-12-04  3:51 ` Jeff Garzik
@ 2007-12-04  4:19   ` Andrey Borzenkov
  0 siblings, 0 replies; 5+ messages in thread
From: Andrey Borzenkov @ 2007-12-04  4:19 UTC (permalink / raw)
  To: Jeff Garzik, Neil Brown, linux-kernel, linux-ide

Jeff Garzik wrote:

> Neil Brown wrote:
>> I've been looking at using BIO_RW_FAILFAST in md/raid to improve
>> handling of some error cases.
>> 
>> This is particularly significant for the DASD driver (s390 specific).
>> I believe it uses optic fibre to connect to the drives.  When one of
>> these paths is unplugged, IO requests will block until an operator
>> runs a command to reset the card (or until it is plugged back in).

Are there any options?  This reminds me of the Emulex lpfc driver, which by
default would retry forever but could be configured to fail the request after
a timeout.

>> [...]
> 
> It's a hard thing to implement, in general, for scalability reasons.
> 
> To make it work, you need to examine each driver's error handling to
> figure out what "fail fast" really means.
> 
> Most storage drivers are written to try as hard as possible to complete
> a request, where "try as hard as possible" can often mean internal
> retries while trying various multi-path configurations and hardware mode
> changes.  You might be catching SATA in the middle of error handling,
> for example.
> 
> So each driver really has a /slightly different/ version of "try to
> complete this request", which has the obvious effects on BIO_RW_FAILFAST.
> 
> No clue about DASD, but in SATA's case I bet that a media or transfer
> error could be returned to the system more rapidly, while we continue to
> try to recover in the background. 

Well, FAILFAST is really needed only for redundant configurations (either
multipath or RAID).  But in that case, what is the point of retrying the
request at all?  It just complicates the implementation.

Just fail the request as soon as possible and let the upper layer recover.
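
To put it concretely (an illustration only, not real raid1 code; the
function and its caller are invented): once a fail-fast request comes back
with an error, the recovery decision in the upper layer is trivial.

#include <linux/errno.h>

/* Illustration only: decide what the redundant upper layer does once a
 * fail-fast request comes back with an error. */
static int recover_after_failfast(int error, int working_mirrors)
{
        if (!error)
                return 0;               /* I/O succeeded, nothing to recover */
        if (working_mirrors > 1)
                return 1;               /* reissue the request to another mirror */
        return -EIO;                    /* no redundancy left, fail upward */
}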

-andrey





* Re: Is BIO_RW_FAILFAST really usable?
  2007-12-04  2:46 Is BIO_RW_FAILFAST really usable? Neil Brown
  2007-12-04  3:51 ` Jeff Garzik
@ 2007-12-04  9:13 ` Jens Axboe
  2007-12-05 23:14   ` Neil Brown
  1 sibling, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2007-12-04  9:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

On Tue, Dec 04 2007, Neil Brown wrote:
> 
> [...]
> However, I just tested BIO_RW_FAILFAST on my SATA drives: controller 
> 
> 02:06.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
> 
> (not using the card's minimal RAID functionality) and requests fail
> immediately and always with e.g.
> 
> sd 2:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdc, sector 2048
> 
> So fail fast obviously isn't generally usable.
> 
> What is the answer here?  Is the Silicon Image driver doing the wrong
> thing, or is DASD doing the wrong thing, or is BIO_RW_FAILFAST
> under-specified and we really need multiple flags or what?

Hrmpf. It looks like the SCSI layer is a little too trigger happy. Any
chance you could try and trace where this happens?

-- 
Jens Axboe



* Re: Is BIO_RW_FAILFAST really usable?
  2007-12-04  9:13 ` Jens Axboe
@ 2007-12-05 23:14   ` Neil Brown
  0 siblings, 0 replies; 5+ messages in thread
From: Neil Brown @ 2007-12-05 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Tuesday December 4, jens.axboe@oracle.com wrote:
> 
> Hrmpf. It looks like the SCSI layer is a little too trigger happy. Any
> chance you could try and trace where this happens?

In scsi_lib.c, in scsi_request_fn(), near the top of the main
   while (!blk_queue_plugged(q)) {
loop, there is:

		if (!scsi_dev_queue_ready(q, sdev)) {
			if ((req->cmd_flags & REQ_FAILFAST) &&
			    !(req->cmd_flags & REQ_PREEMPT)) {
				scsi_kill_request(req, q);
				continue;
			}
			break;
		}

If I remove the "if failfast and not preempt then kill" logic, my
problem goes away.  (scsi_kill_request() completes the request with
DID_NO_CONNECT, which matches the hostbyte=DID_NO_CONNECT in my original
report.)
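
What the experiment leaves behind is simply (to be clear, this is just the
hack I tested, not a proposed fix):

		if (!scsi_dev_queue_ready(q, sdev))
			break;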

NeilBrown



Thread overview: 5+ messages
2007-12-04  2:46 Is BIO_RW_FAILFAST really usable? Neil Brown
2007-12-04  3:51 ` Jeff Garzik
2007-12-04  4:19   ` Andrey Borzenkov
2007-12-04  9:13 ` Jens Axboe
2007-12-05 23:14   ` Neil Brown
