All of lore.kernel.org
 help / color / mirror / Atom feed
* bvec_iter rewind API
@ 2018-08-28 23:33 Kent Overstreet
  2018-08-29 14:56 ` Jens Axboe
  2018-08-30 11:24 ` Ming Lei
  0 siblings, 2 replies; 4+ messages in thread
From: Kent Overstreet @ 2018-08-28 23:33 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Jens Axboe, Christoph Hellwig, Hannes Reinecke, linux-block

I just came across bi_done and your patch that added it -
f9df1cd99e bio: add bvec_iter rewind API

I invite you to think through what happens to a bio that gets split by something
further down the stack, and then rewound after completion:

To create the first half of the split, you just truncate the iterator by setting
bio->bi_iter.bi_size to be the length you want. So if the original bio becomes
the first half of the split, we've lost the original value for bi_size. After
completion, bio_rewind_iter will rewind the bio to the start position the
original code expects, but the bio will _not_ have the same size as it did when
that code submitted it.

So if bio_rewind_iter() is being used because we want to restore the bio to what
it was when we submitted it, this is bogus.

To create the second half of the split, we just advance the bio's iterator up to
where we want the split to start. So this shouldn't prevent us from restoring
the bio to the state it was in when it was created by rewinding it.

Except if you look at bio_split(), it sets bi_done to 0, and git blame reveals
an interesting commit message...

    block: reset bi_iter.bi_done after splitting bio
    
    After the bio has been updated to represent the remaining sectors, reset
    bi_done so bio_rewind_iter() does not rewind further than it should.
    
    This resolves a bio_integrity_process() failure on reads where the
    original request was split.

*sigh*

The original code was bogus, and so was the fix. If a bio is split _AFTER_ a
chunk of code in the stack sees it - and wants to restore it to the state it was
it when it saw it after it completes - then not touching bi_done gives us the
result we want. But if the bio was split _BEFORE_ that chunk of code sees it,
rewinding it without ever touching bi_done will rewind it to be _LARGER_ than
the bio that chunk of code saw.

So what should bi_done be? There is no correct answer - fundamentally, bi_done
_cannot_ do what you're trying to do with it. Advancing or truncating an
iterator destroys information, and stacked layers can manipulate bio iterators
in arbitrary ways.

The correct way, the ONLY way, to restore a bio's iterator to a previous state
after it's been submitted and other code has messed with it, is to save the
entire iterator state: before you submit the bio, you save a copy of
bio->bi_iter in your driver's private state, and then you restore it after
completion.

I know you want to "save memory" and avoid the hassle of having to allocate
state for your layer of the stack, but fundamentally what you're trying to do
_does not work_. You have to save the iterator, you cannot play games and assume
you can restore it.

Jens, can you please make sure I get CC'd on patches that touch bio/bvec
iterators in the future? This is not the first time I've seen people make this
mistake...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bvec_iter rewind API
  2018-08-28 23:33 bvec_iter rewind API Kent Overstreet
@ 2018-08-29 14:56 ` Jens Axboe
  2018-08-30 11:24 ` Ming Lei
  1 sibling, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2018-08-29 14:56 UTC (permalink / raw)
  To: Kent Overstreet, Dmitry Monakhov
  Cc: Christoph Hellwig, Hannes Reinecke, linux-block

On 8/28/18 5:33 PM, Kent Overstreet wrote:
> I just came across bi_done and your patch that added it -
> f9df1cd99e bio: add bvec_iter rewind API
> 
> I invite you to think through what happens to a bio that gets split by something
> further down the stack, and then rewound after completion:
> 
> To create the first half of the split, you just truncate the iterator by setting
> bio->bi_iter.bi_size to be the length you want. So if the original bio becomes
> the first half of the split, we've lost the original value for bi_size. After
> completion, bio_rewind_iter will rewind the bio to the start position the
> original code expects, but the bio will _not_ have the same size as it did when
> that code submitted it.
> 
> So if bio_rewind_iter() is being used because we want to restore the bio to what
> it was when we submitted it, this is bogus.
> 
> To create the second half of the split, we just advance the bio's iterator up to
> where we want the split to start. So this shouldn't prevent us from restoring
> the bio to the state it was in when it was created by rewinding it.
> 
> Except if you look at bio_split(), it sets bi_done to 0, and git blame reveals
> an interesting commit message...
> 
>     block: reset bi_iter.bi_done after splitting bio
>     
>     After the bio has been updated to represent the remaining sectors, reset
>     bi_done so bio_rewind_iter() does not rewind further than it should.
>     
>     This resolves a bio_integrity_process() failure on reads where the
>     original request was split.
> 
> *sigh*
> 
> The original code was bogus, and so was the fix. If a bio is split _AFTER_ a
> chunk of code in the stack sees it - and wants to restore it to the state it was
> it when it saw it after it completes - then not touching bi_done gives us the
> result we want. But if the bio was split _BEFORE_ that chunk of code sees it,
> rewinding it without ever touching bi_done will rewind it to be _LARGER_ than
> the bio that chunk of code saw.
> 
> So what should bi_done be? There is no correct answer - fundamentally, bi_done
> _cannot_ do what you're trying to do with it. Advancing or truncating an
> iterator destroys information, and stacked layers can manipulate bio iterators
> in arbitrary ways.
> 
> The correct way, the ONLY way, to restore a bio's iterator to a previous state
> after it's been submitted and other code has messed with it, is to save the
> entire iterator state: before you submit the bio, you save a copy of
> bio->bi_iter in your driver's private state, and then you restore it after
> completion.
> 
> I know you want to "save memory" and avoid the hassle of having to allocate
> state for your layer of the stack, but fundamentally what you're trying to do
> _does not work_. You have to save the iterator, you cannot play games and assume
> you can restore it.
> 
> Jens, can you please make sure I get CC'd on patches that touch bio/bvec
> iterators in the future? This is not the first time I've seen people make this
> mistake...

Yeah, will do.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bvec_iter rewind API
  2018-08-28 23:33 bvec_iter rewind API Kent Overstreet
  2018-08-29 14:56 ` Jens Axboe
@ 2018-08-30 11:24 ` Ming Lei
  2018-09-04 18:46   ` Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Ming Lei @ 2018-08-30 11:24 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Dmitry Monakhov, Jens Axboe, Christoph Hellwig, Hannes Reinecke,
	linux-block

On Tue, Aug 28, 2018 at 07:33:25PM -0400, Kent Overstreet wrote:
> I just came across bi_done and your patch that added it -
> f9df1cd99e bio: add bvec_iter rewind API
> 
> I invite you to think through what happens to a bio that gets split by something
> further down the stack, and then rewound after completion:
> 
> To create the first half of the split, you just truncate the iterator by setting
> bio->bi_iter.bi_size to be the length you want. So if the original bio becomes
> the first half of the split, we've lost the original value for bi_size. After
> completion, bio_rewind_iter will rewind the bio to the start position the
> original code expects, but the bio will _not_ have the same size as it did when
> that code submitted it.
> 
> So if bio_rewind_iter() is being used because we want to restore the bio to what
> it was when we submitted it, this is bogus.
> 
> To create the second half of the split, we just advance the bio's iterator up to
> where we want the split to start. So this shouldn't prevent us from restoring
> the bio to the state it was in when it was created by rewinding it.
> 
> Except if you look at bio_split(), it sets bi_done to 0, and git blame reveals
> an interesting commit message...
> 
>     block: reset bi_iter.bi_done after splitting bio
>     
>     After the bio has been updated to represent the remaining sectors, reset
>     bi_done so bio_rewind_iter() does not rewind further than it should.
>     
>     This resolves a bio_integrity_process() failure on reads where the
>     original request was split.
> 
> *sigh*
> 
> The original code was bogus, and so was the fix. If a bio is split _AFTER_ a
> chunk of code in the stack sees it - and wants to restore it to the state it was
> it when it saw it after it completes - then not touching bi_done gives us the
> result we want. But if the bio was split _BEFORE_ that chunk of code sees it,
> rewinding it without ever touching bi_done will rewind it to be _LARGER_ than
> the bio that chunk of code saw.
> 
> So what should bi_done be? There is no correct answer - fundamentally, bi_done
> _cannot_ do what you're trying to do with it. Advancing or truncating an
> iterator destroys information, and stacked layers can manipulate bio iterators
> in arbitrary ways.
> 
> The correct way, the ONLY way, to restore a bio's iterator to a previous state
> after it's been submitted and other code has messed with it, is to save the
> entire iterator state: before you submit the bio, you save a copy of
> bio->bi_iter in your driver's private state, and then you restore it after
> completion.

I agree, and it isn't good to introduce extra update of .bi_done in fast path
too, and it just makes things complicated in concept.

Maybe we can do the following way, and looks it works in scsi_debug/dix
test.

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 67b5fb861a51..839c332069a9 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -306,6 +306,8 @@ bool bio_integrity_prep(struct bio *bio)
 	if (bio_data_dir(bio) == WRITE) {
 		bio_integrity_process(bio, &bio->bi_iter,
 				      bi->profile->generate_fn);
+	} else {
+		bip->bio_iter = bio->bi_iter;
 	}
 	return true;
 
@@ -331,19 +333,13 @@ static void bio_integrity_verify_fn(struct work_struct *work)
 		container_of(work, struct bio_integrity_payload, bip_work);
 	struct bio *bio = bip->bip_bio;
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
-	struct bvec_iter iter = bio->bi_iter;
 
 	/*
 	 * At the moment verify is called bio's iterator was advanced
 	 * during split and completion, we need to rewind iterator to
 	 * it's original position.
 	 */
-	if (bio_rewind_iter(bio, &iter, iter.bi_done)) {
-		bio->bi_status = bio_integrity_process(bio, &iter,
-						       bi->profile->verify_fn);
-	} else {
-		bio->bi_status = BLK_STS_IOERR;
-	}
+	bio->bi_status = bio_integrity_process(bio, &bip->bio_iter, bi->profile->verify_fn);
 
 	bio_integrity_free(bio);
 	bio_endio(bio);
diff --git a/block/bio.c b/block/bio.c
index b12966e415d3..529c1a65b1e8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -283,6 +283,8 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 	atomic_set(&bio->__bi_remaining, 1);
 	atomic_set(&bio->__bi_cnt, 1);
 
+	WARN_ON(max_vecs > UIO_MAXIOV);
+
 	bio->bi_io_vec = table;
 	bio->bi_max_vecs = max_vecs;
 }
@@ -1807,7 +1809,6 @@ struct bio *bio_split(struct bio *bio, int sectors,
 		bio_integrity_trim(split);
 
 	bio_advance(bio, split->bi_iter.bi_size);
-	bio->bi_iter.bi_done = 0;
 
 	if (bio_flagged(bio, BIO_TRACE_COMPLETION))
 		bio_set_flag(split, BIO_TRACE_COMPLETION);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 51371740d2a8..14b4fa266357 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -170,27 +170,11 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 {
 	iter->bi_sector += bytes >> 9;
 
-	if (bio_no_advance_iter(bio)) {
+	if (bio_no_advance_iter(bio))
 		iter->bi_size -= bytes;
-		iter->bi_done += bytes;
-	} else {
+	else
 		bvec_iter_advance(bio->bi_io_vec, iter, bytes);
 		/* TODO: It is reasonable to complete bio with error here. */
-	}
-}
-
-static inline bool bio_rewind_iter(struct bio *bio, struct bvec_iter *iter,
-		unsigned int bytes)
-{
-	iter->bi_sector -= bytes >> 9;
-
-	if (bio_no_advance_iter(bio)) {
-		iter->bi_size += bytes;
-		iter->bi_done -= bytes;
-		return true;
-	}
-
-	return bvec_iter_rewind(bio->bi_io_vec, iter, bytes);
 }
 
 #define __bio_for_each_segment(bvl, bio, iter, start)			\
@@ -353,6 +337,8 @@ struct bio_integrity_payload {
 	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
 	unsigned short		bip_flags;	/* control flags */
 
+	struct bvec_iter	bio_iter;	/* for rewinding parent bio */
+
 	struct work_struct	bip_work;	/* I/O completion */
 
 	struct bio_vec		*bip_vec;
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index fe7a22dd133b..5ebfd0500fe7 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -38,11 +38,8 @@ struct bvec_iter {
 						   sectors */
 	unsigned int		bi_size;	/* residual I/O count */
 
-	unsigned int		bi_idx;		/* current index into bvl_vec */
-
-	unsigned int            bi_done;	/* number of bytes completed */
-
-	unsigned int            bi_bvec_done;	/* number of bytes completed in
+	unsigned int		bi_idx:10;	/* current index into bvl_vec */
+	unsigned int            bi_bvec_done:22;	/* number of bytes completed in
 						   current bvec */
 };
 
@@ -85,7 +82,6 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
 		bytes -= len;
 		iter->bi_size -= len;
 		iter->bi_bvec_done += len;
-		iter->bi_done += len;
 
 		if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
 			iter->bi_bvec_done = 0;
@@ -95,30 +91,6 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
 	return true;
 }
 
-static inline bool bvec_iter_rewind(const struct bio_vec *bv,
-				     struct bvec_iter *iter,
-				     unsigned int bytes)
-{
-	while (bytes) {
-		unsigned len = min(bytes, iter->bi_bvec_done);
-
-		if (iter->bi_bvec_done == 0) {
-			if (WARN_ONCE(iter->bi_idx == 0,
-				      "Attempted to rewind iter beyond "
-				      "bvec's boundaries\n")) {
-				return false;
-			}
-			iter->bi_idx--;
-			iter->bi_bvec_done = __bvec_iter_bvec(bv, *iter)->bv_len;
-			continue;
-		}
-		bytes -= len;
-		iter->bi_size += len;
-		iter->bi_bvec_done -= len;
-	}
-	return true;
-}
-
 #define for_each_bvec(bvl, bio_vec, iter, start)			\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
Thanks,
Ming

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: bvec_iter rewind API
  2018-08-30 11:24 ` Ming Lei
@ 2018-09-04 18:46   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2018-09-04 18:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Kent Overstreet, Dmitry Monakhov, Jens Axboe, Christoph Hellwig,
	Hannes Reinecke, linux-block

On Thu, Aug 30, 2018 at 07:24:15PM +0800, Ming Lei wrote:
> I agree, and it isn't good to introduce extra update of .bi_done in fast path
> too, and it just makes things complicated in concept.
> 
> Maybe we can do the following way, and looks it works in scsi_debug/dix
> test.

Can you send that as a proper patch series?

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-09-04 18:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-28 23:33 bvec_iter rewind API Kent Overstreet
2018-08-29 14:56 ` Jens Axboe
2018-08-30 11:24 ` Ming Lei
2018-09-04 18:46   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.