* [PATCHSET] block: fix merge of requests with different failfast settings
From: Tejun Heo @ 2009-07-03  8:48 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori

Hello,

The block layer didn't consider failfast status when merging requests,
which led to premature failure of normal (non-failfast) IOs.  Niel
Lambrechts could trigger the problem semi-reliably on ext4 when
resuming from STR.  ext4 uses readahead when reading inodes, and
combined with the deterministic extra SATA PHY exception cycle during
resume on that configuration, a non-readahead inode read would fail,
causing ext4 errors.  Please read the following thread for details.

  http://lkml.org/lkml/2009/5/23/21
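
To make the mechanism concrete, here is a minimal user-space sketch
(simplified flags and hypothetical names, not kernel code) of the
pre-fix behavior: a readahead request carries all failfast bits, the
merge path never compares them, and a normal read merged into such a
request is failed on the first transient error instead of being
retried.

  #include <stdio.h>

  #define FF_MASK 0x7			/* stands in for REQ_FAILFAST_* */

  struct mini_rq { unsigned int flags; };

  static void merge(struct mini_rq *rq, unsigned int bio_flags)
  {
  	/* pre-fix merge: failfast settings are never compared */
  	(void)rq;
  	(void)bio_flags;		/* merged bio's settings are lost */
  }

  static void complete_with_error(const struct mini_rq *rq)
  {
  	if (rq->flags & FF_MASK)	/* failfast: fail, don't retry */
  		printf("whole request failed, normal read included\n");
  	else
  		printf("request retried\n");
  }

  int main(void)
  {
  	struct mini_rq rq = { FF_MASK };  /* built from a readahead bio */

  	merge(&rq, 0);			/* normal, non-failfast read */
  	complete_with_error(&rq);	/* transient SATA PHY error */
  	return 0;
  }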

This patchset contains the following four patches to fix the problem.

 0001-block-don-t-merge-requests-of-different-failfast-se.patch
 0002-block-use-the-same-failfast-bits-for-bio-and-reques.patch
 0003-block-implement-mixed-merge-of-different-failfast-r.patch
 0004-scsi-block-update-SCSI-to-handle-mixed-merge-failur.patch

0001 disallows merging between requests with different failfast
settings.  This one is the quick fix and should go into 2.6.31 and
later to -stable, as the bug is pretty serious and may lead to data
loss.

0002 prepares for later changes.

0003-0004 implement and apply mixed merge.  Requests of different
failfast settings are merged as before, but failure handling is
updated so that parts which shouldn't fail without a retry are
properly retried.

I spent quite some time thinking about and testing this, but I'd
really like more pairs of eyes on this patchset, as dangerous bugs can
go unnoticed for quite a while in this area (does anyone know when the
failfast bug was introduced?).

Jens, I think the best way to merge this is to first push 0001 to
Linus's tree, pull it into for-next and then apply the rest on top of
it.

This patchset contains the following changes.

 block/blk-core.c        |  118 +++++++++++++++++++++++++++++++++++++++++++-----
 block/blk-merge.c       |   43 +++++++++++++++++
 block/blk.h             |    1 
 drivers/scsi/scsi_lib.c |    6 +-
 include/linux/bio.h     |   43 +++++++++--------
 include/linux/blkdev.h  |   27 ++++++++--
 6 files changed, 199 insertions(+), 39 deletions(-)

Thanks.

--
tejun

* [PATCH 1/4] block: don't merge requests of different failfast settings
From: Tejun Heo @ 2009-07-03  8:48 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori
  Cc: Tejun Heo, Jens Axboe, Theodore Tso

The block layer used to merge requests and bios with different
failfast settings.  This caused regular IOs to fail prematurely when
they were merged into failfast requests for readahead.

Niel Lambrechts could trigger the problem semi-reliably on ext4 when
resuming from STR.  ext4 uses readahead when reading inodes, and
combined with the deterministic extra SATA PHY exception cycle during
resume on that configuration, a non-readahead inode read would fail,
causing ext4 errors.  Please read the following thread for details.

  http://lkml.org/lkml/2009/5/23/21

This patch makes the block layer reject merging if the failfast
settings don't match.  This is correct but likely to lower IO
performance by preventing regular IOs from mingling with surrounding
readahead requests.  Changes to allow such mixed merges and handle
errors correctly will be added later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Niel Lambrechts <niel.lambrechts@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Theodore Tso <tytso@mit.edu>
---
 block/blk-merge.c |    6 ++++++
 block/elevator.c  |    8 ++++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 39ce644..e199967 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -350,6 +350,12 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 	if (blk_integrity_rq(req) != blk_integrity_rq(next))
 		return 0;
 
+	/* don't merge requests of different failfast settings */
+	if (blk_failfast_dev(req)	!= blk_failfast_dev(next)	||
+	    blk_failfast_transport(req)	!= blk_failfast_transport(next)	||
+	    blk_failfast_driver(req)	!= blk_failfast_driver(next))
+		return 0;
+
 	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
diff --git a/block/elevator.c b/block/elevator.c
index ca86192..6f23753 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -100,6 +100,14 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (bio_integrity(bio) != blk_integrity_rq(rq))
 		return 0;
 
+	/*
+	 * Don't merge if failfast settings don't match
+	 */
+	if (bio_failfast_dev(bio)	!= blk_failfast_dev(rq)		||
+	    bio_failfast_transport(bio)	!= blk_failfast_transport(rq)	||
+	    bio_failfast_driver(bio)	!= blk_failfast_driver(rq))
+		return 0;
+
 	if (!elv_iosched_allow_merge(rq, bio))
 		return 0;
 
-- 
1.6.0.2


* [PATCH 2/4] block: use the same failfast bits for bio and request
From: Tejun Heo @ 2009-07-03  8:48 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori
  Cc: Tejun Heo, Jens Axboe

bio and request use the same set of failfast bits.  This patch makes
the following changes to simplify things.

* enumify BIO_RW* bits and reorder the bits such that BIO_RW_FAILFAST_*
  bits coincide with __REQ_FAILFAST_* bits.

* The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
  but the matching is useless anyway.  init_request_from_bio() is
  responsible for setting FAILFAST bits on FS requests and non-FS
  requests never use BIO_RW_AHEAD.  Drop the code and comment from
  blk_rq_bio_prep().

* Define REQ_FAILFAST_MASK, which is the OR of all FAILFAST bits, and
  simplify FAILFAST flag handling in init_request_from_bio().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |   19 +++++++------------
 include/linux/bio.h    |   43 +++++++++++++++++++++++--------------------
 include/linux/blkdev.h |    4 ++++
 3 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 8f4b9e0..cd3b265 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1119,17 +1119,13 @@ void init_request_from_bio(struct request *req, struct bio *bio)
 	req->cmd_type = REQ_TYPE_FS;
 
 	/*
-	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
+	 * Inherit FAILFAST from bio (for read-ahead, and explicit
+	 * FAILFAST).  FAILFAST flags are identical for req and bio.
 	 */
 	if (bio_rw_ahead(bio))
-		req->cmd_flags |= (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
-				   REQ_FAILFAST_DRIVER);
-	if (bio_failfast_dev(bio))
-		req->cmd_flags |= REQ_FAILFAST_DEV;
-	if (bio_failfast_transport(bio))
-		req->cmd_flags |= REQ_FAILFAST_TRANSPORT;
-	if (bio_failfast_driver(bio))
-		req->cmd_flags |= REQ_FAILFAST_DRIVER;
+		req->cmd_flags |= REQ_FAILFAST_MASK;
+	else
+		req->cmd_flags |= bio->bi_rw & REQ_FAILFAST_MASK;
 
 	if (unlikely(bio_discard(bio))) {
 		req->cmd_flags |= REQ_DISCARD;
@@ -2247,9 +2243,8 @@ EXPORT_SYMBOL_GPL(__blk_end_request_cur);
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 		     struct bio *bio)
 {
-	/* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw, and
-	   we want BIO_RW_AHEAD (bit 1) to imply REQ_FAILFAST (bit 1). */
-	rq->cmd_flags |= (bio->bi_rw & 3);
+	/* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw */
+	rq->cmd_flags |= bio->bi_rw & REQ_RW;
 
 	if (bio_has_data(bio)) {
 		rq->nr_phys_segments = bio_phys_segments(q, bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2892b71..a299ed3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -142,37 +142,40 @@ struct bio {
  *
  * bit 0 -- data direction
  *	If not set, bio is a read from device. If set, it's a write to device.
- * bit 1 -- rw-ahead when set
- * bit 2 -- barrier
+ * bit 1 -- fail fast device errors
+ * bit 2 -- fail fast transport errors
+ * bit 3 -- fail fast driver errors
+ * bit 4 -- rw-ahead when set
+ * bit 5 -- barrier
  *	Insert a serialization point in the IO queue, forcing previously
  *	submitted IO to be completed before this one is issued.
- * bit 3 -- synchronous I/O hint.
- * bit 4 -- Unplug the device immediately after submitting this bio.
- * bit 5 -- metadata request
+ * bit 6 -- synchronous I/O hint.
+ * bit 7 -- Unplug the device immediately after submitting this bio.
+ * bit 8 -- metadata request
  *	Used for tracing to differentiate metadata and data IO. May also
  *	get some preferential treatment in the IO scheduler
- * bit 6 -- discard sectors
+ * bit 9 -- discard sectors
  *	Informs the lower level device that this range of sectors is no longer
  *	used by the file system and may thus be freed by the device. Used
  *	for flash based storage.
- * bit 7 -- fail fast device errors
- * bit 8 -- fail fast transport errors
- * bit 9 -- fail fast driver errors
  *	Don't want driver retries for any fast fail whatever the reason.
  * bit 10 -- Tell the IO scheduler not to wait for more requests after this
 	one has been submitted, even if it is a SYNC request.
  */
-#define BIO_RW		0	/* Must match RW in req flags (blkdev.h) */
-#define BIO_RW_AHEAD	1	/* Must match FAILFAST in req flags */
-#define BIO_RW_BARRIER	2
-#define BIO_RW_SYNCIO	3
-#define BIO_RW_UNPLUG	4
-#define BIO_RW_META	5
-#define BIO_RW_DISCARD	6
-#define BIO_RW_FAILFAST_DEV		7
-#define BIO_RW_FAILFAST_TRANSPORT	8
-#define BIO_RW_FAILFAST_DRIVER		9
-#define BIO_RW_NOIDLE	10
+enum bio_rw_flags {
+	BIO_RW,
+	BIO_RW_FAILFAST_DEV,
+	BIO_RW_FAILFAST_TRANSPORT,
+	BIO_RW_FAILFAST_DRIVER,
+	/* above flags must match REQ_* */
+	BIO_RW_AHEAD,
+	BIO_RW_BARRIER,
+	BIO_RW_SYNCIO,
+	BIO_RW_UNPLUG,
+	BIO_RW_META,
+	BIO_RW_DISCARD,
+	BIO_RW_NOIDLE,
+};
 
 #define bio_rw_flagged(bio, flag)	((bio)->bi_rw & (1 << (flag)))
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 49ae079..a0e5ce1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -98,6 +98,7 @@ enum rq_flag_bits {
 	__REQ_FAILFAST_DEV,	/* no driver retries of device errors */
 	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
 	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */
+	/* above flags must match BIO_RW_* */
 	__REQ_DISCARD,		/* request to discard sectors */
 	__REQ_SORTED,		/* elevator knows about this request */
 	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
@@ -148,6 +149,9 @@ enum rq_flag_bits {
 #define REQ_NOIDLE	(1 << __REQ_NOIDLE)
 #define REQ_IO_STAT	(1 << __REQ_IO_STAT)
 
+#define REQ_FAILFAST_MASK	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | \
+				 REQ_FAILFAST_DRIVER)
+
 #define BLK_MAX_CDB	16
 
 /*
-- 
1.6.0.2


* [PATCH 3/4] block: implement mixed merge of different failfast requests
From: Tejun Heo @ 2009-07-03  8:48 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori
  Cc: Tejun Heo, Jens Axboe, Niel Lambrechts

Failfast has characteristics different from other attributes.  When
issuing, executing and successfully completing requests, failfast
doesn't make any difference.  It only affects how a request is handled
on failure.  Allowing requests with different failfast settings to be
merged causes normal IOs to fail prematurely, while disallowing it has
performance penalties, as failfast is used for readaheads which are
likely to be located near in-flight or to-be-issued normal IOs.

This patch introduces the concept of 'mixed merge'.  A request is a
mixed merge if it is a merge of segments which require different
handling on failure.  Currently the only mixable attributes are the
failfast ones (or lack thereof).

When a bio with different failfast settings is added to an existing
request, or requests of different failfast settings are merged, the
merged request is marked mixed.  Each bio carries its own failfast
settings and the request always tracks the failfast state of the first
bio.  When the request fails, blk_rq_err_bytes() can be used to
determine how many bytes can be safely failed without crossing into an
area which requires further retries.

This allows request merging regardless of failfast settings while
keeping the failure handling correct.
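
As an illustration, consider a mixed-merged request whose bio chain is
two readahead bios followed by one normal read.  The sketch below
(user-space C with simplified types and hypothetical names; the real
logic is blk_rq_err_bytes() in the diff) walks the chain the same way:
only the leading bios carrying all the failfast bits of the first bio
may be failed immediately, so the boundary lands after 8KiB and the
normal read gets retried.

  #include <stdio.h>

  #define FF_MASK 0x7		/* stands in for REQ_FAILFAST_* bits */

  struct mini_bio {
  	unsigned int rw;	/* failfast bits of this bio */
  	unsigned int size;	/* bytes */
  	struct mini_bio *next;
  };

  /* bytes which are at least as eager to fail as the first bio */
  static unsigned int err_bytes(unsigned int rq_ff, struct mini_bio *bio)
  {
  	unsigned int bytes = 0;

  	for (; bio; bio = bio->next) {
  		if ((bio->rw & rq_ff) != rq_ff)
  			break;		/* boundary: retries needed here */
  		bytes += bio->size;
  	}
  	return bytes;
  }

  int main(void)
  {
  	struct mini_bio normal = { 0, 4096, NULL };	/* regular read */
  	struct mini_bio ra1 = { FF_MASK, 4096, &normal };
  	struct mini_bio ra0 = { FF_MASK, 4096, &ra1 };	/* first bio */

  	/* a mixed request tracks the first bio's failfast state */
  	printf("%u\n", err_bytes(ra0.rw & FF_MASK, &ra0)); /* 8192 */
  	return 0;
  }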

This patch only implements mixed merge but doesn't enable it.  The
next one will update SCSI to make use of mixed merge.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
---
 block/blk-core.c       |   99 ++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-merge.c      |   43 +++++++++++++++++++++
 block/blk.h            |    1 +
 include/linux/blkdev.h |   23 +++++++++--
 4 files changed, 161 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cd3b265..0214837 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1165,6 +1165,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
 	const unsigned short prio = bio_prio(bio);
 	const int sync = bio_sync(bio);
 	const int unplug = bio_unplug(bio);
+	const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
 	int rw_flags;
 
 	if (bio_barrier(bio) && bio_has_data(bio) &&
@@ -1194,6 +1195,9 @@ static int __make_request(struct request_queue *q, struct bio *bio)
 
 		trace_block_bio_backmerge(q, bio);
 
+		if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
+			blk_rq_set_mixed_merge(req);
+
 		req->biotail->bi_next = bio;
 		req->biotail = bio;
 		req->__data_len += bytes;
@@ -1213,6 +1217,12 @@ static int __make_request(struct request_queue *q, struct bio *bio)
 
 		trace_block_bio_frontmerge(q, bio);
 
+		if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) {
+			blk_rq_set_mixed_merge(req);
+			req->cmd_flags &= ~REQ_FAILFAST_MASK;
+			req->cmd_flags |= ff;
+		}
+
 		bio->bi_next = req->bio;
 		req->bio = bio;
 
@@ -1657,6 +1667,50 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
 
+/**
+ * blk_rq_err_bytes - determine number of bytes till the next failure boundary
+ * @rq: request to examine
+ *
+ * Description:
+ *     A request could be a merge of IOs which require different
+ *     failure handling.  This function determines the number of bytes
+ *     which can be failed from the beginning of the request without
+ *     crossing into an area which needs to be retried further.
+ *
+ * Return:
+ *     The number of bytes to fail.
+ *
+ * Context:
+ *     queue_lock must be held.
+ */
+unsigned int blk_rq_err_bytes(const struct request *rq)
+{
+	unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
+	unsigned int bytes = 0;
+	struct bio *bio;
+
+	if (!(rq->cmd_flags & REQ_MIXED_MERGE))
+		return blk_rq_bytes(rq);
+
+	/*
+	 * Currently the only 'mixing' which can happen is between
+	 * different failfast types.  We can safely fail portions
+	 * which have all the failfast bits that the first one has -
+	 * the ones which are at least as eager to fail as the first
+	 * one.
+	 */
+	for (bio = rq->bio; bio; bio = bio->bi_next) {
+		if ((bio->bi_rw & ff) != ff)
+			break;
+		bytes += bio->bi_size;
+	}
+
+	/* this could lead to infinite loop */
+	BUG_ON(blk_rq_bytes(rq) && !bytes);
+	return bytes;
+}
+EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
+
 static void blk_account_io_completion(struct request *req, unsigned int bytes)
 {
 	if (blk_do_io_stat(req)) {
@@ -2003,6 +2057,12 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 	if (blk_fs_request(req) || blk_discard_rq(req))
 		req->__sector += total_bytes >> 9;
 
+	/* mixed attributes always follow the first bio */
+	if (req->cmd_flags & REQ_MIXED_MERGE) {
+		req->cmd_flags &= ~REQ_FAILFAST_MASK;
+		req->cmd_flags |= req->bio->bi_rw & REQ_FAILFAST_MASK;
+	}
+
 	/*
 	 * If total number of sectors is less than the first segment
 	 * size, something has gone terribly wrong.
@@ -2182,6 +2242,25 @@ bool blk_end_request_cur(struct request *rq, int error)
 EXPORT_SYMBOL_GPL(blk_end_request_cur);
 
 /**
+ * blk_end_request_err - Finish a request till the next failure boundary.
+ * @rq: the request to finish till the next failure boundary for
+ * @error: must be negative errno
+ *
+ * Description:
+ *     Complete @rq till the next failure boundary.
+ *
+ * Return:
+ *     %false - we are done with this request
+ *     %true  - still buffers pending for this request
+ */
+bool blk_end_request_err(struct request *rq, int error)
+{
+	WARN_ON(error >= 0);
+	return blk_end_request(rq, error, blk_rq_err_bytes(rq));
+}
+EXPORT_SYMBOL_GPL(blk_end_request_err);
+
+/**
  * __blk_end_request - Helper function for drivers to complete the request.
  * @rq:       the request being processed
  * @error:    %0 for success, < %0 for error
@@ -2240,6 +2319,26 @@ bool __blk_end_request_cur(struct request *rq, int error)
 }
 EXPORT_SYMBOL_GPL(__blk_end_request_cur);
 
+/**
+ * __blk_end_request_err - Finish a request till the next failure boundary.
+ * @rq: the request to finish till the next failure boundary for
+ * @error: must be negative errno
+ *
+ * Description:
+ *     Complete @rq till the next failure boundary.  Must be called
+ *     with queue lock held.
+ *
+ * Return:
+ *     %false - we are done with this request
+ *     %true  - still buffers pending for this request
+ */
+bool __blk_end_request_err(struct request *rq, int error)
+{
+	WARN_ON(error >= 0);
+	return __blk_end_request(rq, error, blk_rq_err_bytes(rq));
+}
+EXPORT_SYMBOL_GPL(__blk_end_request_err);
+
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 		     struct bio *bio)
 {
diff --git a/block/blk-merge.c b/block/blk-merge.c
index e199967..7c9ca01 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -311,6 +311,36 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	return 1;
 }
 
+/**
+ * blk_rq_set_mixed_merge - mark a request as mixed merge
+ * @rq: request to mark as mixed merge
+ *
+ * Description:
+ *     @rq is about to be mixed merged.  Make sure the attributes
+ *     which can be mixed are set in each bio and mark @rq as mixed
+ *     merged.
+ */
+void blk_rq_set_mixed_merge(struct request *rq)
+{
+	unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
+	struct bio *bio;
+
+	if (rq->cmd_flags & REQ_MIXED_MERGE)
+		return;
+
+	/*
+	 * @rq will no longer represent mixable attributes for all the
+	 * contained bios.  It will just track those of the first one.
+	 * Distribute the attributes to each bio.
+	 */
+	for (bio = rq->bio; bio; bio = bio->bi_next) {
+		WARN_ON_ONCE((bio->bi_rw & REQ_FAILFAST_MASK) &&
+			     (bio->bi_rw & REQ_FAILFAST_MASK) != ff);
+		bio->bi_rw |= ff;
+	}
+	rq->cmd_flags |= REQ_MIXED_MERGE;
+}
+
 static void blk_account_io_merge(struct request *req)
 {
 	if (blk_do_io_stat(req)) {
@@ -366,6 +396,19 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 		return 0;
 
 	/*
+	 * If failfast settings disagree or any of the two is already
+	 * a mixed merge, mark both as mixed before proceeding.  This
+	 * makes sure that all involved bios have mixable attributes
+	 * set properly.
+	 */
+	if ((req->cmd_flags | next->cmd_flags) & REQ_MIXED_MERGE ||
+	    (req->cmd_flags & REQ_FAILFAST_MASK) !=
+	    (next->cmd_flags & REQ_FAILFAST_MASK)) {
+		blk_rq_set_mixed_merge(req);
+		blk_rq_set_mixed_merge(next);
+	}
+
+	/*
 	 * At this point we have either done a back merge
 	 * or front merge. We need the smaller start_time of
 	 * the merged requests to be the current request
diff --git a/block/blk.h b/block/blk.h
index 3fae6ad..5ee3d7e 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -104,6 +104,7 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
 int attempt_back_merge(struct request_queue *q, struct request *rq);
 int attempt_front_merge(struct request_queue *q, struct request *rq);
 void blk_recalc_rq_segments(struct request *rq);
+void blk_rq_set_mixed_merge(struct request *rq);
 
 void blk_queue_congestion_threshold(struct request_queue *q);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a0e5ce1..e58079f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -120,6 +120,7 @@ enum rq_flag_bits {
 	__REQ_INTEGRITY,	/* integrity metadata has been remapped */
 	__REQ_NOIDLE,		/* Don't anticipate more IO after this one */
 	__REQ_IO_STAT,		/* account I/O stat */
+	__REQ_MIXED_MERGE,	/* merge of different types, fail separately */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -148,6 +149,7 @@ enum rq_flag_bits {
 #define REQ_INTEGRITY	(1 << __REQ_INTEGRITY)
 #define REQ_NOIDLE	(1 << __REQ_NOIDLE)
 #define REQ_IO_STAT	(1 << __REQ_IO_STAT)
+#define REQ_MIXED_MERGE	(1 << __REQ_MIXED_MERGE)
 
 #define REQ_FAILFAST_MASK	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | \
 				 REQ_FAILFAST_DRIVER)
@@ -836,11 +838,13 @@ static inline void blk_run_address_space(struct address_space *mapping)
 }
 
 /*
- * blk_rq_pos()		: the current sector
- * blk_rq_bytes()	: bytes left in the entire request
- * blk_rq_cur_bytes()	: bytes left in the current segment
- * blk_rq_sectors()	: sectors left in the entire request
- * blk_rq_cur_sectors()	: sectors left in the current segment
+ * blk_rq_pos()			: the current sector
+ * blk_rq_bytes()		: bytes left in the entire request
+ * blk_rq_cur_bytes()		: bytes left in the current segment
+ * blk_rq_err_bytes()		: bytes left till the next error boundary
+ * blk_rq_sectors()		: sectors left in the entire request
+ * blk_rq_cur_sectors()		: sectors left in the current segment
+ * blk_rq_err_sectors()		: sectors left till the next error boundary
  */
 static inline sector_t blk_rq_pos(const struct request *rq)
 {
@@ -857,6 +861,8 @@ static inline int blk_rq_cur_bytes(const struct request *rq)
 	return rq->bio ? bio_cur_bytes(rq->bio) : 0;
 }
 
+extern unsigned int blk_rq_err_bytes(const struct request *rq);
+
 static inline unsigned int blk_rq_sectors(const struct request *rq)
 {
 	return blk_rq_bytes(rq) >> 9;
@@ -867,6 +873,11 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
 	return blk_rq_cur_bytes(rq) >> 9;
 }
 
+static inline unsigned int blk_rq_err_sectors(const struct request *rq)
+{
+	return blk_rq_err_bytes(rq) >> 9;
+}
+
 /*
  * Request issue related functions.
  */
@@ -893,10 +904,12 @@ extern bool blk_end_request(struct request *rq, int error,
 			    unsigned int nr_bytes);
 extern void blk_end_request_all(struct request *rq, int error);
 extern bool blk_end_request_cur(struct request *rq, int error);
+extern bool blk_end_request_err(struct request *rq, int error);
 extern bool __blk_end_request(struct request *rq, int error,
 			      unsigned int nr_bytes);
 extern void __blk_end_request_all(struct request *rq, int error);
 extern bool __blk_end_request_cur(struct request *rq, int error);
+extern bool __blk_end_request_err(struct request *rq, int error);
 
 extern void blk_complete_request(struct request *);
 extern void __blk_complete_request(struct request *);
-- 
1.6.0.2


* [PATCH 4/4] scsi,block: update SCSI to handle mixed merge failures
From: Tejun Heo @ 2009-07-03  8:48 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori
  Cc: Tejun Heo, Jens Axboe, Niel Lambrechts, James Bottomley

Update scsi_io_completion() such that it only fails requests till the
next error boundary and retries the leftover.  This enables the block
layer to merge requests with different failfast settings and still
behave correctly on errors.  Allow merging of requests with different
failfast settings.

As SCSI is currently the only subsystem which follows failfast status,
there's no need to worry about other block drivers for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 block/blk-merge.c       |    6 ------
 block/elevator.c        |    8 --------
 drivers/scsi/scsi_lib.c |    6 ++++--
 3 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7c9ca01..b0de857 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -380,12 +380,6 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 	if (blk_integrity_rq(req) != blk_integrity_rq(next))
 		return 0;
 
-	/* don't merge requests of different failfast settings */
-	if (blk_failfast_dev(req)	!= blk_failfast_dev(next)	||
-	    blk_failfast_transport(req)	!= blk_failfast_transport(next)	||
-	    blk_failfast_driver(req)	!= blk_failfast_driver(next))
-		return 0;
-
 	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
diff --git a/block/elevator.c b/block/elevator.c
index 6f23753..ca86192 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -100,14 +100,6 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (bio_integrity(bio) != blk_integrity_rq(rq))
 		return 0;
 
-	/*
-	 * Don't merge if failfast settings don't match
-	 */
-	if (bio_failfast_dev(bio)	!= blk_failfast_dev(rq)		||
-	    bio_failfast_transport(bio)	!= blk_failfast_transport(rq)	||
-	    bio_failfast_driver(bio)	!= blk_failfast_driver(rq))
-		return 0;
-
 	if (!elv_iosched_allow_merge(rq, bio))
 		return 0;
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f3c4089..90c94da 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -897,8 +897,10 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			if (driver_byte(result) & DRIVER_SENSE)
 				scsi_print_sense("", cmd);
 		}
-		blk_end_request_all(req, -EIO);
-		scsi_next_command(cmd);
+		if (blk_end_request_err(req, -EIO))
+			scsi_requeue_command(q, cmd);
+		else
+			scsi_next_command(cmd);
 		break;
 	case ACTION_REPREP:
 		/* Unprep the request and put it back at the head of the queue.
-- 
1.6.0.2


* Re: [PATCHSET] block: fix merge of requests with different failfast settings
From: Jens Axboe @ 2009-07-03 10:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linux Kernel, James Bottomley, linux-scsi, Niel Lambrechts,
	FUJITA Tomonori

On Fri, Jul 03 2009, Tejun Heo wrote:
> Hello,
> 
> Block layer didn't consider failfast status while merging requests and
> it led to premature failure of normal (non-failfast) IOs.  Niel
> Lambrechts could trigger the problem semi-reliably on ext4 when
> resuming from STR.  ext4 uses readahead when reading inodes and
> combined with the deterministic extra SATA PHY exception cycle during
> resume on the specific configuration, non-readahead inode read would
> fail causing ext4 errors.  Please read the following thread for
> details.
> 
>   http://lkml.org/lkml/2009/5/23/21
> 
> This patchset contains the following four patches to fix the problem.
> 
>  0001-block-don-t-merge-requests-of-different-failfast-se.patch
>  0002-block-use-the-same-failfast-bits-for-bio-and-reques.patch
>  0003-block-implement-mixed-merge-of-different-failfast-r.patch
>  0004-scsi-block-update-SCSI-to-handle-mixed-merge-failur.patch
> 
> 0001 disallows merge between requests with different failfast
> settings.  This one is the quick fix and should go into 2.6.31 and
> later to -stable as the bug is pretty serious and may lead to data
> loss.
> 
> 0002 preps for later changes.
> 
> 0003-0004 implements and applies mixed merge.  Requests of different
> failfast settings are merged as before but failure handling is updated
> such that parts which shouldn't fail without retrial are properly
> retried.
> 
> I spent quite some time thinking about and testing it but I'd really
> like more pairs of eyes on this patchset as dangerous bugs can go
> unnoticed for quite a while in this area (anyone knows when the
> failfast bug was introduced?).

It must have been several releases ago. So while the bug is indeed very
nasty, I don't think there's been much fallout from it.

> Jens, I think the best way to merge this is to first push 0001 to
> Linus's tree and then pull it into for-next and then apply the rest on
> top of them.

Yeah I'll do that, #1 for 2.6.31 and the rest for .32. Thanks for
finding and fixing this bug!

-- 
Jens Axboe


* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
From: Boaz Harrosh @ 2009-07-05  9:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

On 07/03/2009 11:48 AM, Tejun Heo wrote:
> bio and request use the same set of failfast bits.  This patch makes
> the following changes to simplify things.
> 
> * enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_*
>   bits coincide with __REQ_FAILFAST_* bits.
> 
> * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
>   but the matching is useless anyway.  init_request_from_bio() is
>   responsible for setting FAILFAST bits on FS requests and non-FS
>   requests never use BIO_RW_AHEAD.  Drop the code and comment from
>   blk_rq_bio_prep().
> 
> * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
>   simplify FAILFAST flags handling in init_request_from_bio().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jens Axboe <axboe@kernel.dk>

Hi Tejun.

Thanks for doing this; it has been neglected for a long time.
However, it will happen again; I don't like these implicit matches
which are not enforced -- they tend to drift apart.  There are several
ways to make sure two sets of enums stay in sync.  (I'll have a try at
it tomorrow, if you want.)

> ---
>  block/blk-core.c       |   19 +++++++------------
>  include/linux/bio.h    |   43 +++++++++++++++++++++++--------------------
>  include/linux/blkdev.h |    4 ++++
>  3 files changed, 34 insertions(+), 32 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 8f4b9e0..cd3b265 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1119,17 +1119,13 @@ void init_request_from_bio(struct request *req, struct bio *bio)
>  	req->cmd_type = REQ_TYPE_FS;
>  
>  	/*
> -	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
> +	 * Inherit FAILFAST from bio (for read-ahead, and explicit
> +	 * FAILFAST).  FAILFAST flags are identical for req and bio.
>  	 */
>  	if (bio_rw_ahead(bio))
> -		req->cmd_flags |= (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
> -				   REQ_FAILFAST_DRIVER);
> -	if (bio_failfast_dev(bio))
> -		req->cmd_flags |= REQ_FAILFAST_DEV;
> -	if (bio_failfast_transport(bio))
> -		req->cmd_flags |= REQ_FAILFAST_TRANSPORT;
> -	if (bio_failfast_driver(bio))
> -		req->cmd_flags |= REQ_FAILFAST_DRIVER;
> +		req->cmd_flags |= REQ_FAILFAST_MASK;
> +	else
> +		req->cmd_flags |= bio->bi_rw & REQ_FAILFAST_MASK;
>  
>  	if (unlikely(bio_discard(bio))) {
>  		req->cmd_flags |= REQ_DISCARD;
> @@ -2247,9 +2243,8 @@ EXPORT_SYMBOL_GPL(__blk_end_request_cur);
>  void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
>  		     struct bio *bio)
>  {
> -	/* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw, and
> -	   we want BIO_RW_AHEAD (bit 1) to imply REQ_FAILFAST (bit 1). */
> -	rq->cmd_flags |= (bio->bi_rw & 3);
> +	/* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw */
> +	rq->cmd_flags |= bio->bi_rw & REQ_RW;
>  
>  	if (bio_has_data(bio)) {
>  		rq->nr_phys_segments = bio_phys_segments(q, bio);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 2892b71..a299ed3 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -142,37 +142,40 @@ struct bio {
>   *
>   * bit 0 -- data direction
>   *	If not set, bio is a read from device. If set, it's a write to device.
> - * bit 1 -- rw-ahead when set
> - * bit 2 -- barrier
> + * bit 1 -- fail fast device errors
> + * bit 2 -- fail fast transport errors
> + * bit 3 -- fail fast driver errors
> + * bit 4 -- rw-ahead when set
> + * bit 5 -- barrier

Please kill all these evil 'bit 1', 'bit 2', 'bit n' comments. The ways we
invent to torture ourselves...

Just move all the comments to the enum declarations below and be done
with it, also for the next time.
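
For illustration, moving them onto the enum could look something like
this (just a sketch, reusing the bit descriptions above):

	enum bio_rw_flags {
		BIO_RW,				/* data direction: write when set */
		BIO_RW_FAILFAST_DEV,		/* fail fast on device errors */
		BIO_RW_FAILFAST_TRANSPORT,	/* fail fast on transport errors */
		BIO_RW_FAILFAST_DRIVER,		/* fail fast on driver errors */
		/* above flags must match REQ_* */
		BIO_RW_AHEAD,			/* read-ahead when set */
		BIO_RW_BARRIER,			/* serialization point in the queue */
		BIO_RW_SYNCIO,			/* synchronous I/O hint */
		BIO_RW_UNPLUG,			/* unplug device after submission */
		BIO_RW_META,			/* metadata request */
		BIO_RW_DISCARD,			/* sectors no longer used by the FS */
		BIO_RW_NOIDLE,			/* don't wait for more requests */
	};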

>   *	Insert a serialization point in the IO queue, forcing previously
>   *	submitted IO to be completed before this one is issued.
> - * bit 3 -- synchronous I/O hint.
> - * bit 4 -- Unplug the device immediately after submitting this bio.
> - * bit 5 -- metadata request
> + * bit 6 -- synchronous I/O hint.
> + * bit 7 -- Unplug the device immediately after submitting this bio.
> + * bit 8 -- metadata request
>   *	Used for tracing to differentiate metadata and data IO. May also
>   *	get some preferential treatment in the IO scheduler
> - * bit 6 -- discard sectors
> + * bit 9 -- discard sectors
>   *	Informs the lower level device that this range of sectors is no longer
>   *	used by the file system and may thus be freed by the device. Used
>   *	for flash based storage.
> - * bit 7 -- fail fast device errors
> - * bit 8 -- fail fast transport errors
> - * bit 9 -- fail fast driver errors
>   *	Don't want driver retries for any fast fail whatever the reason.
>   * bit 10 -- Tell the IO scheduler not to wait for more requests after this
>  	one has been submitted, even if it is a SYNC request.
>   */
> -#define BIO_RW		0	/* Must match RW in req flags (blkdev.h) */
> -#define BIO_RW_AHEAD	1	/* Must match FAILFAST in req flags */
> -#define BIO_RW_BARRIER	2
> -#define BIO_RW_SYNCIO	3
> -#define BIO_RW_UNPLUG	4
> -#define BIO_RW_META	5
> -#define BIO_RW_DISCARD	6
> -#define BIO_RW_FAILFAST_DEV		7
> -#define BIO_RW_FAILFAST_TRANSPORT	8
> -#define BIO_RW_FAILFAST_DRIVER		9
> -#define BIO_RW_NOIDLE	10
> +enum bio_rw_flags {
> +	BIO_RW,
> +	BIO_RW_FAILFAST_DEV,
> +	BIO_RW_FAILFAST_TRANSPORT,
> +	BIO_RW_FAILFAST_DRIVER,
> +	/* above flags must match REQ_* */
> +	BIO_RW_AHEAD,
> +	BIO_RW_BARRIER,
> +	BIO_RW_SYNCIO,
> +	BIO_RW_UNPLUG,
> +	BIO_RW_META,
> +	BIO_RW_DISCARD,
> +	BIO_RW_NOIDLE,
> +};
>  
>  #define bio_rw_flagged(bio, flag)	((bio)->bi_rw & (1 << (flag)))
>  

I wish there were also a helper to set these bits.  It gives me a heart attack
every time I need to:
	bio->bi_rw &= ~(1 << BIO_RW);
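
A minimal sketch of what such helpers could look like, mirroring
bio_rw_flagged() above (the names here are hypothetical):

	static inline void bio_rw_set_flag(struct bio *bio, enum bio_rw_flags flag)
	{
		bio->bi_rw |= 1UL << flag;
	}

	static inline void bio_rw_clear_flag(struct bio *bio, enum bio_rw_flags flag)
	{
		bio->bi_rw &= ~(1UL << flag);
	}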

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 49ae079..a0e5ce1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -98,6 +98,7 @@ enum rq_flag_bits {
>  	__REQ_FAILFAST_DEV,	/* no driver retries of device errors */
>  	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
>  	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */
> +	/* above flags must match BIO_RW_* */
>  	__REQ_DISCARD,		/* request to discard sectors */
>  	__REQ_SORTED,		/* elevator knows about this request */
>  	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
> @@ -148,6 +149,9 @@ enum rq_flag_bits {
>  #define REQ_NOIDLE	(1 << __REQ_NOIDLE)
>  #define REQ_IO_STAT	(1 << __REQ_IO_STAT)
>  
> +#define REQ_FAILFAST_MASK	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | \
> +				 REQ_FAILFAST_DRIVER)
> +
>  #define BLK_MAX_CDB	16
>  
>  /*

Thanks
Boaz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] block: implement mixed merge of different failfast requests
  2009-07-03  8:48   ` Tejun Heo
  (?)
@ 2009-07-05  9:27   ` Boaz Harrosh
  2009-07-09  0:47     ` Tejun Heo
  -1 siblings, 1 reply; 24+ messages in thread
From: Boaz Harrosh @ 2009-07-05  9:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

On 07/03/2009 11:48 AM, Tejun Heo wrote:
> Failfast has characteristics distinct from other attributes.  When
> issuing, executing and successfully completing requests, failfast
> doesn't make any difference.  It only affects how a request is handled
> on failure.  Allowing requests with different failfast settings to be
> merged causes normal IOs to fail prematurely, while disallowing it
> carries performance penalties, as failfast is used for read-aheads
> which are likely to be located near in-flight or to-be-issued normal
> IOs.
> 
> This patch introduces the concept of 'mixed merge'.  A request is a
> mixed merge if it is a merge of segments which require different
> handling on failure.  Currently the only mixable attributes are
> failfast ones (or lack thereof).
> 
> When a bio with different failfast settings is added to an existing
> request or requests of different failfast settings are merged, the
> merged request is marked mixed.  Each bio carries failfast settings
> and the request always tracks failfast state of the first bio.  When
> the request fails, blk_rq_err_bytes() can be used to determine how
> many bytes can be safely failed without crossing into an area which
> requires further retrials.
> 
> This allows request merging regardless of failfast settings while
> keeping the failure handling correct.
> 
> This patch only implements mixed merge but doesn't enable it.  The
> next one will update SCSI to make use of mixed merge.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
> ---
>  block/blk-core.c       |   99 ++++++++++++++++++++++++++++++++++++++++++++++++
>  block/blk-merge.c      |   43 +++++++++++++++++++++
>  block/blk.h            |    1 +
>  include/linux/blkdev.h |   23 +++++++++--
>  4 files changed, 161 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index cd3b265..0214837 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1165,6 +1165,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
>  	const unsigned short prio = bio_prio(bio);
>  	const int sync = bio_sync(bio);
>  	const int unplug = bio_unplug(bio);
> +	const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;

Perhaps a bio_fail_fast(bio)
and also a blk_failfast(rq).

Also, blk_noretry_request() could see some love now.
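
For illustration, such accessors might look like this (a sketch; it
relies on the bio and request failfast bits now being identical):

	#define bio_failfast(bio)	((bio)->bi_rw & REQ_FAILFAST_MASK)
	#define blk_failfast(rq)	((rq)->cmd_flags & REQ_FAILFAST_MASK)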

>  	int rw_flags;
>  
>  	if (bio_barrier(bio) && bio_has_data(bio) &&
> @@ -1194,6 +1195,9 @@ static int __make_request(struct request_queue *q, struct bio *bio)
>  
>  		trace_block_bio_backmerge(q, bio);
>  
> +		if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
> +			blk_rq_set_mixed_merge(req);
> +
>  		req->biotail->bi_next = bio;
>  		req->biotail = bio;
>  		req->__data_len += bytes;
> @@ -1213,6 +1217,12 @@ static int __make_request(struct request_queue *q, struct bio *bio)
>  
>  		trace_block_bio_frontmerge(q, bio);
>  
> +		if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) {
> +			blk_rq_set_mixed_merge(req);
> +			req->cmd_flags &= ~REQ_FAILFAST_MASK;
> +			req->cmd_flags |= ff;
> +		}
> +
>  		bio->bi_next = req->bio;
>  		req->bio = bio;
>  
> @@ -1657,6 +1667,50 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
>  }
>  EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
>  
> +/**
> + * blk_rq_err_bytes - determine number of bytes till the next failure boundary
> + * @rq: request to examine
> + *
> + * Description:
> + *     A request could be a merge of IOs which require different failure
> + *     handling.  This function determines the number of bytes which
> + *     can be failed from the beginning of the request without
> + *     crossing into an area which needs to be retried further.
> + *
> + * Return:
> + *     The number of bytes to fail.
> + *
> + * Context:
> + *     queue_lock must be held.
> + */
> +unsigned int blk_rq_err_bytes(const struct request *rq)
> +{
> +	unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
> +	unsigned int bytes = 0;
> +	struct bio *bio;
> +
> +	if (!(rq->cmd_flags & REQ_MIXED_MERGE))
> +		return blk_rq_bytes(rq);
> +
> +	/*
> +	 * Currently the only 'mixing' which can happen is between
> +	 * different failfast types.  We can safely fail portions
> +	 * which have all the failfast bits that the first one has -
> +	 * the ones which are at least as eager to fail as the first
> +	 * one.
> +	 */
> +	for (bio = rq->bio; bio; bio = bio->bi_next) {
> +		if ((bio->bi_rw & ff) != ff)
> +			break;
> +		bytes += bio->bi_size;
> +	}
> +
> +	/* this could lead to an infinite loop */
> +	BUG_ON(blk_rq_bytes(rq) && !bytes);
> +	return bytes;
> +}
> +EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
> +
>  static void blk_account_io_completion(struct request *req, unsigned int bytes)
>  {
>  	if (blk_do_io_stat(req)) {
> @@ -2003,6 +2057,12 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
>  	if (blk_fs_request(req) || blk_discard_rq(req))
>  		req->__sector += total_bytes >> 9;
>  
> +	/* mixed attributes always follow the first bio */
> +	if (req->cmd_flags & REQ_MIXED_MERGE) {
> +		req->cmd_flags &= ~REQ_FAILFAST_MASK;
> +		req->cmd_flags |= req->bio->bi_rw & REQ_FAILFAST_MASK;
> +	}
> +
>  	/*
>  	 * If total number of sectors is less than the first segment
>  	 * size, something has gone terribly wrong.
> @@ -2182,6 +2242,25 @@ bool blk_end_request_cur(struct request *rq, int error)
>  EXPORT_SYMBOL_GPL(blk_end_request_cur);
>  
>  /**
> + * blk_end_request_err - Finish a request till the next failure boundary.
> + * @rq: the request to finish till the next failure boundary for
> + * @error: must be negative errno
> + *
> + * Description:
> + *     Complete @rq till the next failure boundary.
> + *
> + * Return:
> + *     %false - we are done with this request
> + *     %true  - still buffers pending for this request
> + */
> +bool blk_end_request_err(struct request *rq, int error)
> +{
> +	WARN_ON(error >= 0);
> +	return blk_end_request(rq, error, blk_rq_err_bytes(rq));
> +}
> +EXPORT_SYMBOL_GPL(blk_end_request_err);
> +
> +/**
>   * __blk_end_request - Helper function for drivers to complete the request.
>   * @rq:       the request being processed
>   * @error:    %0 for success, < %0 for error
> @@ -2240,6 +2319,26 @@ bool __blk_end_request_cur(struct request *rq, int error)
>  }
>  EXPORT_SYMBOL_GPL(__blk_end_request_cur);
>  
> +/**
> + * __blk_end_request_err - Finish a request till the next failure boundary.
> + * @rq: the request to finish till the next failure boundary for
> + * @error: must be negative errno
> + *
> + * Description:
> + *     Complete @rq till the next failure boundary.  Must be called
> + *     with queue lock held.
> + *
> + * Return:
> + *     %false - we are done with this request
> + *     %true  - still buffers pending for this request
> + */
> +bool __blk_end_request_err(struct request *rq, int error)
> +{
> +	WARN_ON(error >= 0);
> +	return __blk_end_request(rq, error, blk_rq_err_bytes(rq));
> +}
> +EXPORT_SYMBOL_GPL(__blk_end_request_err);
> +
>  void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
>  		     struct bio *bio)
>  {
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index e199967..7c9ca01 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -311,6 +311,36 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  	return 1;
>  }
>  
> +/**
> + * blk_rq_set_mixed_merge - mark a request as mixed merge
> + * @rq: request to mark as mixed merge
> + *
> + * Description:
> + *     @rq is about to be mixed merged.  Make sure the attributes
> + *     which can be mixed are set in each bio and mark @rq as mixed
> + *     merged.
> + */
> +void blk_rq_set_mixed_merge(struct request *rq)
> +{
> +	unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
> +	struct bio *bio;
> +
> +	if (rq->cmd_flags & REQ_MIXED_MERGE)
> +		return;
> +
> +	/*
> +	 * @rq will no longer represent mixable attributes for all the
> +	 * contained bios.  It will just track those of the first one.
> +	 * Distribute the attributes to each bio.
> +	 */
> +	for (bio = rq->bio; bio; bio = bio->bi_next) {
> +		WARN_ON_ONCE((bio->bi_rw & REQ_FAILFAST_MASK) &&
> +			     (bio->bi_rw & REQ_FAILFAST_MASK) != ff);
> +		bio->bi_rw |= ff;
> +	}
> +	rq->cmd_flags |= REQ_MIXED_MERGE;
> +}
> +
>  static void blk_account_io_merge(struct request *req)
>  {
>  	if (blk_do_io_stat(req)) {
> @@ -366,6 +396,19 @@ static int attempt_merge(struct request_queue *q, struct request *req,
>  		return 0;
>  
>  	/*
> +	 * If failfast settings disagree or any of the two is already
> +	 * a mixed merge, mark both as mixed before proceeding.  This
> +	 * makes sure that all involved bios have mixable attributes
> +	 * set properly.
> +	 */
> +	if ((req->cmd_flags | next->cmd_flags) & REQ_MIXED_MERGE ||
> +	    (req->cmd_flags & REQ_FAILFAST_MASK) !=
> +	    (next->cmd_flags & REQ_FAILFAST_MASK)) {
> +		blk_rq_set_mixed_merge(req);
> +		blk_rq_set_mixed_merge(next);
> +	}
> +
> +	/*
>  	 * At this point we have either done a back merge
>  	 * or front merge. We need the smaller start_time of
>  	 * the merged requests to be the current request
> diff --git a/block/blk.h b/block/blk.h
> index 3fae6ad..5ee3d7e 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -104,6 +104,7 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
>  int attempt_back_merge(struct request_queue *q, struct request *rq);
>  int attempt_front_merge(struct request_queue *q, struct request *rq);
>  void blk_recalc_rq_segments(struct request *rq);
> +void blk_rq_set_mixed_merge(struct request *rq);
>  
>  void blk_queue_congestion_threshold(struct request_queue *q);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index a0e5ce1..e58079f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -120,6 +120,7 @@ enum rq_flag_bits {
>  	__REQ_INTEGRITY,	/* integrity metadata has been remapped */
>  	__REQ_NOIDLE,		/* Don't anticipate more IO after this one */
>  	__REQ_IO_STAT,		/* account I/O stat */
> +	__REQ_MIXED_MERGE,	/* merge of different types, fail separately */
>  	__REQ_NR_BITS,		/* stops here */
>  };
>  
> @@ -148,6 +149,7 @@ enum rq_flag_bits {
>  #define REQ_INTEGRITY	(1 << __REQ_INTEGRITY)
>  #define REQ_NOIDLE	(1 << __REQ_NOIDLE)
>  #define REQ_IO_STAT	(1 << __REQ_IO_STAT)
> +#define REQ_MIXED_MERGE	(1 << __REQ_MIXED_MERGE)
>  
>  #define REQ_FAILFAST_MASK	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | \
>  				 REQ_FAILFAST_DRIVER)
> @@ -836,11 +838,13 @@ static inline void blk_run_address_space(struct address_space *mapping)
>  }
>  
>  /*
> - * blk_rq_pos()		: the current sector
> - * blk_rq_bytes()	: bytes left in the entire request
> - * blk_rq_cur_bytes()	: bytes left in the current segment
> - * blk_rq_sectors()	: sectors left in the entire request
> - * blk_rq_cur_sectors()	: sectors left in the current segment
> + * blk_rq_pos()			: the current sector
> + * blk_rq_bytes()		: bytes left in the entire request
> + * blk_rq_cur_bytes()		: bytes left in the current segment
> + * blk_rq_err_bytes()		: bytes left till the next error boundary
> + * blk_rq_sectors()		: sectors left in the entire request
> + * blk_rq_cur_sectors()		: sectors left in the current segment
> + * blk_rq_err_sectors()		: sectors left till the next error boundary
>   */
>  static inline sector_t blk_rq_pos(const struct request *rq)
>  {
> @@ -857,6 +861,8 @@ static inline int blk_rq_cur_bytes(const struct request *rq)
>  	return rq->bio ? bio_cur_bytes(rq->bio) : 0;
>  }
>  
> +extern unsigned int blk_rq_err_bytes(const struct request *rq);
> +
>  static inline unsigned int blk_rq_sectors(const struct request *rq)
>  {
>  	return blk_rq_bytes(rq) >> 9;
> @@ -867,6 +873,11 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
>  	return blk_rq_cur_bytes(rq) >> 9;
>  }
>  
> +static inline unsigned int blk_rq_err_sectors(const struct request *rq)
> +{
> +	return blk_rq_err_bytes(rq) >> 9;
> +}
> +
>  /*
>   * Request issue related functions.
>   */
> @@ -893,10 +904,12 @@ extern bool blk_end_request(struct request *rq, int error,
>  			    unsigned int nr_bytes);
>  extern void blk_end_request_all(struct request *rq, int error);
>  extern bool blk_end_request_cur(struct request *rq, int error);
> +extern bool blk_end_request_err(struct request *rq, int error);
>  extern bool __blk_end_request(struct request *rq, int error,
>  			      unsigned int nr_bytes);
>  extern void __blk_end_request_all(struct request *rq, int error);
>  extern bool __blk_end_request_cur(struct request *rq, int error);
> +extern bool __blk_end_request_err(struct request *rq, int error);
>  
>  extern void blk_complete_request(struct request *);
>  extern void __blk_complete_request(struct request *);

Boaz

^ permalink raw reply	[flat|nested] 24+ messages in thread
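
For context, a driver error path using the new helper might look like
this (an illustrative sketch, not taken from patch 0004; the function
name is made up and queue_lock is assumed held for the __ variants):

	static void example_end_io(struct request *rq, int error)
	{
		/* fail only the bytes up to the next failure boundary */
		if (__blk_end_request_err(rq, error))
			/* non-failfast bytes remain; put them back for retry */
			blk_requeue_request(rq->q, rq);
	}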

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-05  9:27   ` Boaz Harrosh
@ 2009-07-09  0:45     ` Tejun Heo
  2009-07-09  9:12       ` Boaz Harrosh
  2009-07-09 13:37       ` Christoph Hellwig
  0 siblings, 2 replies; 24+ messages in thread
From: Tejun Heo @ 2009-07-09  0:45 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

Hello, Boaz.

Boaz Harrosh wrote:
> Thanks for doing this; it has been neglected for a long time.
> However, it will happen again; I don't like these implicit matches
> which are not enforced -- they tend to drift apart.  There are several
> ways to make sure two sets of enums stay in sync.  (I'll have a try at
> it tomorrow, if you want.)

They don't share the exact same set of bits, so it's a bit blurry but
yeah, it would be better if the bits were defined in a more systematic
way.

>> @@ -142,37 +142,40 @@ struct bio {
>>   *
>>   * bit 0 -- data direction
>>   *	If not set, bio is a read from device. If set, it's a write to device.
>> - * bit 1 -- rw-ahead when set
>> - * bit 2 -- barrier
>> + * bit 1 -- fail fast device errors
>> + * bit 2 -- fail fast transport errors
>> + * bit 3 -- fail fast driver errors
>> + * bit 4 -- rw-ahead when set
>> + * bit 5 -- barrier
> 
> Please kill all these evil 'bit 1', 'bit 2', 'bit n' comments. The ways we
> invent to torture ourselves...
>
> Just move all the comments to the enum declarations below and be done
> with it, also for the next time.

Heh... I agree too.  Unless the ABI is fixed, this type of comment is
often painful.  Care to submit a patch?  This series is already in
block#for-next.

>>  #define bio_rw_flagged(bio, flag)	((bio)->bi_rw & (1 << (flag)))
>>  
> 
> I wish there were also a helper to set these bits.  It gives me a heart attack
> every time I need to:
> 	bio->bi_rw &= ~(1 << BIO_RW);

What's more disturbing to me is the difference between RQ and BIO
flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
positions.  Sadly it seems it's already too late to change that.  I
personally am not a big fan of simple accessors or flags defined as
bit positions.  They seem to obscure things without much benefit.
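
Concretely, the same logical operation reads differently on each side
(illustration only):

	rq->cmd_flags |= REQ_FAILFAST_DEV;		/* REQ_* is a mask */
	bio->bi_rw |= 1 << BIO_RW_FAILFAST_DEV;		/* BIO_* is a position */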

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] block: implement mixed merge of different failfast requests
  2009-07-05  9:27   ` Boaz Harrosh
@ 2009-07-09  0:47     ` Tejun Heo
  2009-07-09  9:17       ` Boaz Harrosh
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2009-07-09  0:47 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

Hello,

Boaz Harrosh wrote:
>> @@ -1165,6 +1165,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
>>  	const unsigned short prio = bio_prio(bio);
>>  	const int sync = bio_sync(bio);
>>  	const int unplug = bio_unplug(bio);
>> +	const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
> 
> Perhaps a bio_fail_fast(bio)
> and also a blk_failfast(rq).

Not being a big fan of those simple accessors, I want to avoid adding
them, especially since the use of the bio ones is mostly confined to
the block layer proper.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-09  0:45     ` Tejun Heo
@ 2009-07-09  9:12       ` Boaz Harrosh
  2009-07-09 13:37       ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Boaz Harrosh @ 2009-07-09  9:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

On 07/09/2009 03:45 AM, Tejun Heo wrote:
> Hello, Boaz.
> 
> Boaz Harrosh wrote:
>> Thanks for doing this; it has been neglected for a long time.
>> However, it will happen again; I don't like these implicit matches
>> which are not enforced -- they tend to drift apart.  There are several
>> ways to make sure two sets of enums stay in sync.  (I'll have a try at
>> it tomorrow, if you want.)
> 
> They don't share the exact same set of bits, so it's a bit blurry but
> yeah, it would be better if the bits were defined in a more systematic
> way.
> 

I meant something simple like:

	__REQ_RW = BIO_RW,
	__REQ_FAILFAST_DEV = BIO_RW_FAILFAST_DEV,
	__REQ_FAILFAST_TRANSPORT = BIO_RW_FAILFAST_TRANSPORT,
	__REQ_FAILFAST_DRIVER = BIO_RW_FAILFAST_DRIVER,
	...

And a fat comment which you did
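
Another way (a sketch, not something in the tree) would be a
compile-time assertion so the build breaks as soon as the two sets
drift:

	static inline void blk_verify_flag_sync(void)	/* hypothetical */
	{
		BUILD_BUG_ON(__REQ_RW != BIO_RW);
		BUILD_BUG_ON(__REQ_FAILFAST_DEV != BIO_RW_FAILFAST_DEV);
		BUILD_BUG_ON(__REQ_FAILFAST_TRANSPORT != BIO_RW_FAILFAST_TRANSPORT);
		BUILD_BUG_ON(__REQ_FAILFAST_DRIVER != BIO_RW_FAILFAST_DRIVER);
	}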

>>> @@ -142,37 +142,40 @@ struct bio {
>>>   *
>>>   * bit 0 -- data direction
>>>   *	If not set, bio is a read from device. If set, it's a write to device.
>>> - * bit 1 -- rw-ahead when set
>>> - * bit 2 -- barrier
>>> + * bit 1 -- fail fast device errors
>>> + * bit 2 -- fail fast transport errors
>>> + * bit 3 -- fail fast driver errors
>>> + * bit 4 -- rw-ahead when set
>>> + * bit 5 -- barrier
>> Please kill all these evil 'bit 1', 'bit 2', 'bit n' comments. The ways we
>> invent to torture ourselves...
>>
>> Just move all the comments to the enum declarations below and be done
>> with it, also for the next time.
> 
> Heh... I agree too.  Unless the ABI is fixed, this type of comment is
> often painful.  Care to submit a patch?  This series is already in
> block#for-next.
> 

It's becoming futile to comment on patches these days; they get submitted
before and during any comments. ;-)

>>>  #define bio_rw_flagged(bio, flag)	((bio)->bi_rw & (1 << (flag)))
>>>  
>> I wish there were also a helper to set these bits.  It gives me a heart attack
>> every time I need to:
>> 	bio->bi_rw &= ~(1 << BIO_RW);
> 
> What's more disturbing to me is the difference between RQ and BIO
> flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
> positions.  Sadly it seems it's already too late to change that.  I
> personally am not a big fan of simple accessors or flags defined as
> bit positions.  They seem to obscure things without much benefit.
> 

I think that everywhere we should use __set_bit(), __clear_bit() and
test_bit() with enums defined as bit positions.  It is the most readable
code-wise, the least error-prone, and the easiest to maintain.
Perhaps a new:
	test_bits(void *flag, unsigned bit1, ...);
for testing a bunch of bits at once.

Please note that with inlines and constant bits the generated code is
just as fast as bit-masks, without slaving over double definitions
(and accessors).
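
One possible shape for such a macro (a sketch; for constant arguments
the compiler can fold the whole mask at compile time):

	#define test_flags(flags, ...)					\
	({								\
		const unsigned long __bits[] = { __VA_ARGS__ };		\
		unsigned long __mask = 0;				\
		unsigned int __i;					\
									\
		for (__i = 0; __i < ARRAY_SIZE(__bits); __i++)		\
			__mask |= 1UL << __bits[__i];			\
		((flags) & __mask) == __mask;				\
	})

e.g. test_flags(rq->cmd_flags, __REQ_FAILFAST_DEV, __REQ_FAILFAST_TRANSPORT).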

> Thanks.
> 

Boaz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] block: implement mixed merge of different failfast requests
  2009-07-09  0:47     ` Tejun Heo
@ 2009-07-09  9:17       ` Boaz Harrosh
  2009-07-15  9:41         ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Boaz Harrosh @ 2009-07-09  9:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

On 07/09/2009 03:47 AM, Tejun Heo wrote:
> Hello,
> 
> Boaz Harrosh wrote:
>>> @@ -1165,6 +1165,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
>>>  	const unsigned short prio = bio_prio(bio);
>>>  	const int sync = bio_sync(bio);
>>>  	const int unplug = bio_unplug(bio);
>>> +	const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
>> Perhaps a bio_fail_fast(bio)
>> and also a blk_failfast(rq).
> 
> Not being a big fan of those simple accessors, I want to avoid adding
> them, especially since the use of the bio ones is mostly confined to
> the block layer proper.
> 

OK, but at least take care of blk_noretry_request(); at the minimum
kill it and use req->cmd_flags & REQ_FAILFAST_MASK everywhere.

> Thanks.
> 

Thanks
Boaz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-09  0:45     ` Tejun Heo
  2009-07-09  9:12       ` Boaz Harrosh
@ 2009-07-09 13:37       ` Christoph Hellwig
  2009-07-09 17:20         ` Jeff Garzik
  2009-07-10 13:18         ` Tejun Heo
  1 sibling, 2 replies; 24+ messages in thread
From: Christoph Hellwig @ 2009-07-09 13:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Boaz Harrosh, Jens Axboe, Linux Kernel, James Bottomley,
	linux-scsi, Niel Lambrechts, FUJITA Tomonori, Jens Axboe

On Thu, Jul 09, 2009 at 09:45:24AM +0900, Tejun Heo wrote:
> What's more disturbing to me is the difference between RQ and BIO
> flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
> positions.  Sadly it seems it's already too late to change that.  I
> personally am not a big fan of simple accessors or flags defined as
> bit positions.  They seem to obscure things without much benefit.

flags as bit positions generally only make sense if you use
test/set/clear_bit, otherwise they just confuse things.  And the
accessors are pretty annoying, especially in the block layer.  Trying to
find the places where a BIO flag has an actual effect is pretty painful
due to the mix of the different flags and the accessors.
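
For instance, the two styles pair up like this (illustration only):

	/* bit positions go with the bitop API: */
	set_bit(QUEUE_FLAG_STOPPED, &q->queue_flags);
	/* masks go with plain C operators: */
	rq->cmd_flags |= REQ_NOMERGE;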

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-09 13:37       ` Christoph Hellwig
@ 2009-07-09 17:20         ` Jeff Garzik
  2009-07-09 17:39           ` Jens Axboe
  2009-07-10 13:18         ` Tejun Heo
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Garzik @ 2009-07-09 17:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tejun Heo, Boaz Harrosh, Jens Axboe, Linux Kernel,
	James Bottomley, linux-scsi, Niel Lambrechts, FUJITA Tomonori,
	Jens Axboe

Christoph Hellwig wrote:
> On Thu, Jul 09, 2009 at 09:45:24AM +0900, Tejun Heo wrote:
>> What's more disturbing to me is the difference between RQ and BIO
>> flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
>> positions.  Sadly it seems it's already too late to change that.  I
>> personally am not a big fan of simple accessors or flags defined as
>> bit positions.  They seem to obscure things without much benefit.
> 
> flags as bit positions generally only make sense if you use
> test/set/clear_bit, otherwise they just confuse things.  And the
> accessors are pretty annoying, especially in the block layer.  Trying to
> find the places where a BIO flag has an actual effect is pretty painful
> due to the mix of the different flags and the accessors.

Indeed -- the accessors mean in practice that you always have at least 
_two_ things to grep for, just to catch all accesses.  Block layer is 
pretty bad about that style of usage :/

	Jeff





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-09 17:20         ` Jeff Garzik
@ 2009-07-09 17:39           ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2009-07-09 17:39 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Christoph Hellwig, Tejun Heo, Boaz Harrosh, Linux Kernel,
	James Bottomley, linux-scsi, Niel Lambrechts, FUJITA Tomonori

On Thu, Jul 09 2009, Jeff Garzik wrote:
> Christoph Hellwig wrote:
>> On Thu, Jul 09, 2009 at 09:45:24AM +0900, Tejun Heo wrote:
>>> What's more disturbing to me is the difference between RQ and BIO
>>> flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
>>> positions.  Sadly it seems it's already too late to change that.  I
>>> personally am not a big fan of simple accessors or flags defined as
>>> bit positions.  They seem to obscure things without much benefit.
>>
>> flags as bit positions generally only make sense if you use
>> test/set/clear_bit, otherwise they just confuse things.  And the
>> accessors are pretty annoying, especially in the block layer.  Trying to
>> find the places where a BIO flag has an actual effect is pretty painful
>> due to the mix of the different flags and the accessors.
>
> Indeed -- the accessors mean in practice that you always have at least  
> _two_ things to grep for, just to catch all accesses.  Block layer is  
> pretty bad about that style of usage :/

Yeah, I'm not too fond of it either, it does add to the confusion. I'll
sanitize it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-09 13:37       ` Christoph Hellwig
  2009-07-09 17:20         ` Jeff Garzik
@ 2009-07-10 13:18         ` Tejun Heo
  2009-07-12 12:06           ` Boaz Harrosh
  1 sibling, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2009-07-10 13:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Boaz Harrosh, Jens Axboe, Linux Kernel, James Bottomley,
	linux-scsi, Niel Lambrechts, FUJITA Tomonori, Jens Axboe

Hello, Christoph.

Christoph Hellwig wrote:
> On Thu, Jul 09, 2009 at 09:45:24AM +0900, Tejun Heo wrote:
>> What's more disturbing to me is the difference between RQ and BIO
>> flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
>> positions.  Sadly it seems it's already too late to change that.  I
>> personally am not a big fan of simple accessors or flags defined as
>> bit positions.  They seem to obscure things without much benefit.
> 
> flags as bit positions generally only make sense if you use
> test/set/clear_bit, otherwise they just confuse things.

Another shortcoming of bit position flags is masking / multi flag
operations.  It's just awful.  I think it's always better to define
flags as masks even when they're used with test/set/clear_bit().  If such
usages are common enough, we can easily add test/set/clear_bit_mask().
The conversion from mask to bit would be constant most of the time and
it's not like fls/ffs() are expensive.
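
A sketch of what the mask-based variant could look like (it assumes
@mask has exactly one bit set; __ffs() recovers the bit number):

	static inline void set_bit_mask(unsigned long mask, volatile unsigned long *addr)
	{
		set_bit(__ffs(mask), addr);
	}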

> And the accessors are pretty annoying, especially in the block
> layer.  Trying to find the places where a BIO flag has an actual
> effect is pretty painful due to the mix of the different flags and
> the accessors.

Yeap, fully agreed.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-10 13:18         ` Tejun Heo
@ 2009-07-12 12:06           ` Boaz Harrosh
  2009-07-15  9:27             ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Boaz Harrosh @ 2009-07-12 12:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel, James Bottomley,
	linux-scsi, Niel Lambrechts, FUJITA Tomonori, Jens Axboe

On 07/10/2009 04:18 PM, Tejun Heo wrote:
> Hello, Christoph.
> 
> Christoph Hellwig wrote:
>> On Thu, Jul 09, 2009 at 09:45:24AM +0900, Tejun Heo wrote:
>>> What's more disturbing to me is the difference between RQ and BIO
>>> flags.  __REQ_* are bit positions, REQ_* are masks while BIO_* are bit
>>> positions.  Sadly it seems it's already too late to change that.  I
>>> personally am not a big fan of simple accessors or flags defined as
>>> bit positions.  They seem to obscure things without much benefit.
>> flags as bit positions generally only make sense if you use
>> test/set/clear_bit, otherwise they just confuse things.
> 

First, please make a distinction between test/set/clear_bit and
test/__set/__clear_bit; the former is not an option since it's not what
we need.

I, too, do not like the lower-case accessors for upper-case bits, like
blk_failfast_dev() && blk_failfast_transport(), which give nothing
and confuse the grepping of sets vs. clears.

But I do like the use of __set/__clear_bit on flags: grepping is clear
and the code semantics are more correct.  Actually, I prefer when a
construct like bio or request has two accessors, set/clear_flags, which
abstract out not the bits but the flags member.  That way, when things
evolve in the future, it is easier to adapt.

What could be clearer than rq_set_flags(req, QUEUE_FLAG_QUEUED) and then
rq_clear_flags(req, QUEUE_FLAG_QUEUED) later?
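
For illustration, such accessors could be as thin as this (hypothetical
names, abstracting the flags member rather than the individual bits):

	static inline void rq_set_flags(struct request *rq, unsigned int flags)
	{
		rq->cmd_flags |= flags;
	}

	static inline void rq_clear_flags(struct request *rq, unsigned int flags)
	{
		rq->cmd_flags &= ~flags;
	}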

> Another shortcoming of bit position flags is masking / multi flag
> operations.  It's just awful.  I think it's always better to define
> flags as masks even when they're used with test/set/clear_bit().  If such
> usages are common enough, we can easily add test/set/clear_bit_mask().
> The conversion from mask to bit would be constant most of the time and
> it's not like fls/ffs() are expensive.
> 

That's why I suggested the set/clear_flags() variable-size macro,
which can set/clear multiple bit-flags at the same cost as masks; the
compiler calculates the mask at compile time.

This can also be good for the greps above, e.g.:
test_flags(&rq->cmd_flags, REQ_FAILFAST_DEV, REQ_FAILFAST_TRANSPORT, REQ_FAILFAST_DRIVER);

>> And the accessors are pretty annoying, especially in the block
>> layer.  Trying to find the places where a BIO flag has an actual
>> effect is pretty painful due to the mix of the different flags and
>> the accessors.
> 
> Yeap, fully agreed.
> 

As said, yes, the lower-case accessors for upper-case bits do nothing,
but using __set/__clear/test is a different matter that can also replace
the need for this sugar.

> Thanks.
> 

Thanks
Boaz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] block: use the same failfast bits for bio and request
  2009-07-12 12:06           ` Boaz Harrosh
@ 2009-07-15  9:27             ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2009-07-15  9:27 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel, James Bottomley,
	linux-scsi, Niel Lambrechts, FUJITA Tomonori, Jens Axboe

Hello, Boaz.

Boaz Harrosh wrote:
>>> flags as bit positions generally only make sense if you use
>>> test/set/clear_bit, otherwise they just confuse things.
> 
> first please make a distinction between test/set/clear_bit and
> test/__set/__clear_bit the former is not an option since it's not what
> we need.

Block flags look the way they do today because a while back they
were accessed with atomic bitops (the versions w/o the underbars).
Now that they're all inside spinlocks, it has all become moot.

> What could be clearer than rq_set_flags(req, QUEUE_FLAG_QUEUED) and then
> rq_clear_flags(req, QUEUE_FLAG_QUEUED) later?

req->cmd_flags |= QUEUE_FLAG_QUEUED / &= ~QUEUE_FLAG_QUEUED might not
be as clear but should be sufficient, I suppose.

> That's why I suggested the set/clear_flags() variable-size macro,
> which can set/clear multiple bit-flags at the same cost as masks; the
> compiler calculates the mask at compile time.
> 
> This can also be good for the greps above, e.g.:
> test_flags(&rq->cmd_flags, REQ_FAILFAST_DEV, REQ_FAILFAST_TRANSPORT, REQ_FAILFAST_DRIVER);
...
> As said, yes, the lower-case accessors for upper-case bits do nothing,
> but using __set/__clear/test is a different matter that can also replace
> the need for this sugar.

Heh... I don't know.  What about things like (flags & mask) == mask2
tests?  The vararg macros would work for most cases and I wouldn't be
violently against them if they were already in place, but I don't see
much benefit in all those when people are already very accustomed to
using C bitops to handle flags.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] block: implement mixed merge of different failfast requests
  2009-07-09  9:17       ` Boaz Harrosh
@ 2009-07-15  9:41         ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2009-07-15  9:41 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jens Axboe, Linux Kernel, James Bottomley, linux-scsi,
	Niel Lambrechts, FUJITA Tomonori, Jens Axboe

Boaz Harrosh wrote:
> On 07/09/2009 03:47 AM, Tejun Heo wrote:
>> Hello,
>>
>> Boaz Harrosh wrote:
>>>> @@ -1165,6 +1165,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
>>>>  	const unsigned short prio = bio_prio(bio);
>>>>  	const int sync = bio_sync(bio);
>>>>  	const int unplug = bio_unplug(bio);
>>>> +	const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
>>> Perhaps a bio_fail_fast(bio)
>>> and also a blk_failfast(rq).
>> Not being a big fan of those simple accessors, I want to avoid adding
>> them, especially since the use of the bio ones is mostly confined to
>> the block layer proper.
>>
> 
> OK, but at least take care of blk_noretry_request(); at the minimum
> kill it and use req->cmd_flags & REQ_FAILFAST_MASK everywhere.

Hmmm... looking at it now, looks like we can kill them all actually.
There are a lot of inconsistencies in this space.

* BIO_RW_* are bit positions and are accessed with a liberal combination
  of bio_rw_flagged() and manual bitops.

* QUEUE_FLAG_* are bit positions and have a rather large collection of
  accessors.  The accessors have locking asserts.

* __REQ_* are bit positions but have counterpart REQ_* bit masks
  defined, and there is a whole bunch of simple specific accessors.

I think all these can just be converted to bit masks (and I'll be
happy with that) but whether that would be desirable is a different
matter.  I think it would be best to hear what Jens's plan for the flag
cleanup is (Jens, BTW, is away till the end of next week) before
proceeding.

Jens, can you please let us know what your plan is?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-07-15  9:42 UTC | newest]

Thread overview: 24+ messages
2009-07-03  8:48 [PATCHSET] block: fix merge of requests with different failfast settings Tejun Heo
2009-07-03  8:48 ` Tejun Heo
2009-07-03  8:48 ` [PATCH 1/4] block: don't merge requests of " Tejun Heo
2009-07-03  8:48   ` Tejun Heo
2009-07-03  8:48 ` [PATCH 2/4] block: use the same failfast bits for bio and request Tejun Heo
2009-07-03  8:48   ` Tejun Heo
2009-07-05  9:27   ` Boaz Harrosh
2009-07-09  0:45     ` Tejun Heo
2009-07-09  9:12       ` Boaz Harrosh
2009-07-09 13:37       ` Christoph Hellwig
2009-07-09 17:20         ` Jeff Garzik
2009-07-09 17:39           ` Jens Axboe
2009-07-10 13:18         ` Tejun Heo
2009-07-12 12:06           ` Boaz Harrosh
2009-07-15  9:27             ` Tejun Heo
2009-07-03  8:48 ` [PATCH 3/4] block: implement mixed merge of different failfast requests Tejun Heo
2009-07-03  8:48   ` Tejun Heo
2009-07-05  9:27   ` Boaz Harrosh
2009-07-09  0:47     ` Tejun Heo
2009-07-09  9:17       ` Boaz Harrosh
2009-07-15  9:41         ` Tejun Heo
2009-07-03  8:48 ` [PATCH 4/4] scsi,block: update SCSI to handle mixed merge failures Tejun Heo
2009-07-03  8:48   ` Tejun Heo
2009-07-03 10:54 ` [PATCHSET] block: fix merge of requests with different failfast settings Jens Axboe
