linux-block.vger.kernel.org archive mirror
* [PATCH] block: return the correct bvec when checking for gaps
@ 2021-06-03 22:34 longli
  2021-06-04  1:53 ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: longli @ 2021-06-03 22:34 UTC
  To: linux-block
  Cc: Long Li, Jens Axboe, Johannes Thumshirn, Pavel Begunkov,
	Ming Lei, Tejun Heo, Matthew Wilcox (Oracle),
	Jeffle Xu, linux-kernel, stable

From: Long Li <longli@microsoft.com>

After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can
have multiple pages. But bio_will_gap() still assumes one page bvec while
checking for merging. This causes data corruption on drivers relying on
the correct merging on virt_boundary_mask.

Fix this by returning the multi-page bvec for testing gaps for merging.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 07173c3ec276 ("block: enable multipage bvecs")
Signed-off-by: Long Li <longli@microsoft.com>
---
 include/linux/bio.h | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index a0b4cfdf62a4..6b2f609ccfbf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -44,9 +44,6 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
 #define bio_offset(bio)		bio_iter_offset((bio), (bio)->bi_iter)
 #define bio_iovec(bio)		bio_iter_iovec((bio), (bio)->bi_iter)
 
-#define bio_multiple_segments(bio)				\
-	((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
-
 #define bvec_iter_sectors(iter)	((iter).bi_size >> 9)
 #define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter)))
 
@@ -271,7 +268,7 @@ static inline void bio_clear_flag(struct bio *bio, unsigned int bit)
 
 static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
 {
-	*bv = bio_iovec(bio);
+	*bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
 }
 
 static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
@@ -279,10 +276,10 @@ static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
 	struct bvec_iter iter = bio->bi_iter;
 	int idx;
 
-	if (unlikely(!bio_multiple_segments(bio))) {
-		*bv = bio_iovec(bio);
+	/* this bio has only one bvec */
+	*bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+	if (bv->bv_len == bio->bi_iter.bi_size)
 		return;
-	}
 
 	bio_advance_iter(bio, &iter, iter.bi_size);
 
-- 
2.17.1
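
To illustrate why the patch drops bio_multiple_segments(), here is a minimal user-space sketch of the two ways a bio's current bvec can be viewed. It is not kernel code: struct model_bio, MODEL_PAGE_SIZE, and every model_* helper are hypothetical names invented for this example, and the model assumes the iterator sits at the start of the bvec (nothing consumed yet). The single-page view stands in for what bio_iovec() returns, and the multi-page view for what mp_bvec_iter_bvec() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for a bio positioned at the start of one bvec. */
struct model_bio {
	uint32_t bv_offset;	/* offset of the bvec within its first page */
	uint32_t bv_len;	/* full, possibly multi-page, bvec length */
	uint32_t bi_size;	/* bytes remaining in the bio */
};

static uint32_t model_min(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Multi-page view: the whole bvec, capped by what is left in the bio. */
static uint32_t model_mp_len(const struct model_bio *bio)
{
	assert(bio->bv_len > 0);
	return model_min(bio->bv_len, bio->bi_size);
}

/* Single-page view: the multi-page length, clamped at the page end. */
static uint32_t model_sp_len(const struct model_bio *bio)
{
	uint32_t room = MODEL_PAGE_SIZE - bio->bv_offset % MODEL_PAGE_SIZE;

	return model_min(model_mp_len(bio), room);
}

/* Old test: "one bvec" means the single-page length covers the bio. */
static bool model_old_single_bvec(const struct model_bio *bio)
{
	return bio->bi_size == model_sp_len(bio);
}

/* New test: "one bvec" means the multi-page length covers the bio. */
static bool model_new_single_bvec(const struct model_bio *bio)
{
	return bio->bi_size == model_mp_len(bio);
}
```

In this model, a bio made of a single 8 KiB bvec ({ .bv_offset = 0, .bv_len = 8192, .bi_size = 8192 }) is not recognized as a one-bvec bio by the old test, because the single-page length is clamped to 4096, while the new test recognizes it. For a bvec that fits in one page the two tests agree.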



* Re: [PATCH] block: return the correct bvec when checking for gaps
  2021-06-03 22:34 [PATCH] block: return the correct bvec when checking for gaps longli
@ 2021-06-04  1:53 ` Ming Lei
  2021-06-04  6:38   ` Long Li
  0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2021-06-04  1:53 UTC
  To: longli
  Cc: linux-block, Long Li, Jens Axboe, Johannes Thumshirn,
	Pavel Begunkov, Tejun Heo, Matthew Wilcox (Oracle),
	Jeffle Xu, linux-kernel, stable

Hello Long,

On Thu, Jun 03, 2021 at 03:34:31PM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can
> have multiple pages. But bio_will_gap() still assumes one page bvec while
> checking for merging. This causes data corruption on drivers relying on
> the correct merging on virt_boundary_mask.

Can you explain the data corruption a bit? 

IMO, either single page bvec or multipage bvec should be fine, because
bio_will_gap() just checks if the last bvec of prev bio and the 1st bvec
of next bio can be merged.

> 
> Fix this by returning the multi-page bvec for testing gaps for merging.
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: Pavel Begunkov <asml.silence@gmail.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> Fixes: 07173c3ec276 ("block: enable multipage bvecs")
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  include/linux/bio.h | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index a0b4cfdf62a4..6b2f609ccfbf 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -44,9 +44,6 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
>  #define bio_offset(bio)		bio_iter_offset((bio), (bio)->bi_iter)
>  #define bio_iovec(bio)		bio_iter_iovec((bio), (bio)->bi_iter)
>  
> -#define bio_multiple_segments(bio)				\
> -	((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
> -
>  #define bvec_iter_sectors(iter)	((iter).bi_size >> 9)
>  #define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter)))
>  
> @@ -271,7 +268,7 @@ static inline void bio_clear_flag(struct bio *bio, unsigned int bit)
>  
>  static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
>  {
> -	*bv = bio_iovec(bio);
> +	*bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>  }
>  
>  static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
> @@ -279,10 +276,10 @@ static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
>  	struct bvec_iter iter = bio->bi_iter;
>  	int idx;
>  
> -	if (unlikely(!bio_multiple_segments(bio))) {
> -		*bv = bio_iovec(bio);
> +	/* this bio has only one bvec */
> +	*bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> +	if (bv->bv_len == bio->bi_iter.bi_size)
>  		return;
> -	}
>  
>  	bio_advance_iter(bio, &iter, iter.bi_size);

The patch itself looks fine, given both bio_get_first_bvec() and bio_get_last_bvec()
are used in bio_will_gap() only.


Thanks,
Ming



* RE: [PATCH] block: return the correct bvec when checking for gaps
  2021-06-04  1:53 ` Ming Lei
@ 2021-06-04  6:38   ` Long Li
  2021-06-04  8:38     ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Long Li @ 2021-06-04  6:38 UTC
  To: Ming Lei, longli
  Cc: linux-block, Jens Axboe, Johannes Thumshirn, Pavel Begunkov,
	Tejun Heo, Matthew Wilcox (Oracle),
	Jeffle Xu, linux-kernel, stable

> Subject: Re: [PATCH] block: return the correct bvec when checking for gaps
> 
> Hello Long,
> 
> On Thu, Jun 03, 2021 at 03:34:31PM -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec
> > can have multiple pages. But bio_will_gap() still assumes one page
> > bvec while checking for merging. This causes data corruption on
> > drivers relying on the correct merging on virt_boundary_mask.
> 
> Can you explain the data corruption a bit?
> 
> IMO, either single page bvec or multipage bvec should be fine, because
> bio_will_gap() just checks if the last bvec of prev bio and the 1st bvec of next
> bio can be merged.

Hi Ming,

When bio_will_gap() calls into biovec_phys_mergeable(), seg_boundary_mask (queue_segment_boundary()) is used to test whether the two bio_vecs can be merged. This test can succeed when only the 1st page of the bvec is considered, yet fail when all the pages of the bvec are considered. In other words, if the pages in the bvec cross the segment boundary, testing only the 1st page can succeed where testing all the pages fails.

Later, when SCSI builds the SG list from the BIOs (by calling into __blk_bios_map_sg()), __blk_segment_map_sg_merge() calls biovec_phys_mergeable() to do the same test. This time it may fail if the pages in the bvec cross the segment boundary, even though the test passed in bio_will_gap() earlier and the two BIOs were merged. If __blk_segment_map_sg_merge() fails, we end up with a broken SG list for drivers that assume the SG list has no offsets in intermediate pages.

In practice, a duplicate page is usually added to the SG list because the merge fails. This page and all the pages after it in the SG list end up being written to the wrong sectors on disk.
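
The disagreement between the two checks can be sketched in user space as follows. This is only a model under stated assumptions: struct model_bvec, MODEL_PAGE_SIZE, model_phys_mergeable() and model_first_page() are hypothetical names, bvecs are reduced to a physical address plus a length, and the check mirrors only the contiguity and segment-boundary parts of biovec_phys_mergeable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for a bio_vec: physical start address and length. */
struct model_bvec {
	uint64_t addr;
	uint32_t len;
};

/*
 * Model of the boundary test: the two ranges must be physically
 * contiguous, and the first byte of 'a' and the last byte of 'b'
 * must fall inside the same segment-boundary window.
 */
static bool model_phys_mergeable(uint64_t mask, struct model_bvec a,
				 struct model_bvec b)
{
	assert(a.len > 0 && b.len > 0);
	if (a.addr + a.len != b.addr)
		return false;
	return (a.addr | mask) == ((b.addr + b.len - 1) | mask);
}

/* Single-page view of a bvec: clamp its length at the end of the page. */
static struct model_bvec model_first_page(struct model_bvec bv)
{
	uint32_t room = MODEL_PAGE_SIZE - (uint32_t)(bv.addr % MODEL_PAGE_SIZE);

	if (bv.len > room)
		bv.len = room;
	return bv;
}
```

With a 64 KiB boundary (mask 0xFFFF), take a previous bvec at 0xE000 of length 0x1000 and a next bvec at 0xF000 of length 0x2000. In this model, model_phys_mergeable() accepts the pair when the next bvec is passed through model_first_page() (its last byte is 0xFFFF, inside the window) and rejects it when the full bvec is passed (its last byte is 0x10FFF, past the boundary). That is the mismatch described above: the merge decision sees the first case and the SG mapping sees the second.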

Thanks,
Long




* Re: [PATCH] block: return the correct bvec when checking for gaps
  2021-06-04  6:38   ` Long Li
@ 2021-06-04  8:38     ` Ming Lei
  2021-06-04  8:58       ` Long Li
  0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2021-06-04  8:38 UTC
  To: Long Li
  Cc: longli, linux-block, Jens Axboe, Johannes Thumshirn,
	Pavel Begunkov, Tejun Heo, Matthew Wilcox (Oracle),
	Jeffle Xu, linux-kernel, stable

On Fri, Jun 04, 2021 at 06:38:45AM +0000, Long Li wrote:
> > Subject: Re: [PATCH] block: return the correct bvec when checking for gaps
> > 
> > Hello Long,
> > 
> > On Thu, Jun 03, 2021 at 03:34:31PM -0700, longli@linuxonhyperv.com wrote:
> > > From: Long Li <longli@microsoft.com>
> > >
> > > After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec
> > > can have multiple pages. But bio_will_gap() still assumes one page
> > > bvec while checking for merging. This causes data corruption on
> > > drivers relying on the correct merging on virt_boundary_mask.
> > 
> > Can you explain the data corruption a bit?
> > 
> > IMO, either single page bvec or multipage bvec should be fine, because
> > bio_will_gap() just checks if the last bvec of prev bio and the 1st bvec of next
> > bio can be merged.
> 
> Hi Ming,
> 
> When bio_will_gap() calls into biovec_phys_mergeable(), seg_boundary_mask (queue_segment_boundary()) is used to test whether the two bio_vecs can be merged. This test can succeed when only the 1st page of the bvec is considered, yet fail when all the pages of the bvec are considered. In other words, if the pages in the bvec cross the segment boundary, testing only the 1st page can succeed where testing all the pages fails.
> 
> Later, when SCSI builds the SG list from the BIOs (by calling into __blk_bios_map_sg()), __blk_segment_map_sg_merge() calls biovec_phys_mergeable() to do the same test. This time it may fail if the pages in the bvec cross the segment boundary, even though the test passed in bio_will_gap() earlier and the two BIOs were merged. If __blk_segment_map_sg_merge() fails, we end up with a broken SG list for drivers that assume the SG list has no offsets in intermediate pages.
> 

OK, the reason is that bio_will_gap() and __blk_segment_map_sg_merge()
have to use the same approach to check whether two bvecs from two bios
can be merged.

Now __blk_segment_map_sg_merge() won't merge the 1st bvec of the next bio
into the previous bio if that bvec crosses a segment boundary, so
bio_will_gap() has to check the two bvecs the same way.

Please add the segment boundary and SG list mapping story to the commit
log; then the patch looks fine.


Thanks,
Ming



* RE: [PATCH] block: return the correct bvec when checking for gaps
  2021-06-04  8:38     ` Ming Lei
@ 2021-06-04  8:58       ` Long Li
  0 siblings, 0 replies; 5+ messages in thread
From: Long Li @ 2021-06-04  8:58 UTC
  To: Ming Lei
  Cc: longli, linux-block, Jens Axboe, Johannes Thumshirn,
	Pavel Begunkov, Tejun Heo, Matthew Wilcox (Oracle),
	Jeffle Xu, linux-kernel, stable

> Subject: Re: [PATCH] block: return the correct bvec when checking for gaps
> 
> On Fri, Jun 04, 2021 at 06:38:45AM +0000, Long Li wrote:
> > > Subject: Re: [PATCH] block: return the correct bvec when checking
> > > for gaps
> > >
> > > Hello Long,
> > >
> > > > On Thu, Jun 03, 2021 at 03:34:31PM -0700, longli@linuxonhyperv.com wrote:
> > > > From: Long Li <longli@microsoft.com>
> > > >
> > > > After commit 07173c3ec276 ("block: enable multipage bvecs"), a
> > > > bvec can have multiple pages. But bio_will_gap() still assumes one
> > > > page bvec while checking for merging. This causes data corruption
> > > > on drivers relying on the correct merging on virt_boundary_mask.
> > >
> > > Can you explain the data corruption a bit?
> > >
> > > IMO, either single page bvec or multipage bvec should be fine,
> > > because
> > > bio_will_gap() just checks if the last bvec of prev bio and the 1st
> > > bvec of next bio can be merged.
> >
> > Hi Ming,
> >
> > When bio_will_gap() calls into biovec_phys_mergeable(),
> > seg_boundary_mask (queue_segment_boundary()) is used to test whether
> > the two bio_vecs can be merged. This test can succeed when only the
> > 1st page of the bvec is considered, yet fail when all the pages of
> > the bvec are considered. In other words, if the pages in the bvec
> > cross the segment boundary, testing only the 1st page can succeed
> > where testing all the pages fails.
> >
> > Later, when SCSI builds the SG list from the BIOs (by calling into
> > __blk_bios_map_sg()), __blk_segment_map_sg_merge() calls
> > biovec_phys_mergeable() to do the same test. This time it may fail if
> > the pages in the bvec cross the segment boundary, even though the
> > test passed in bio_will_gap() earlier and the two BIOs were merged.
> > If __blk_segment_map_sg_merge() fails, we end up with a broken SG
> > list for drivers that assume the SG list has no offsets in
> > intermediate pages.
> >
> 
> OK, the reason is that bio_will_gap() and
> __blk_segment_map_sg_merge() have to use the same approach to check
> whether two bvecs from two bios can be merged.
> 
> Now __blk_segment_map_sg_merge() won't merge the 1st bvec of the next
> bio into the previous bio if that bvec crosses a segment boundary, so
> bio_will_gap() has to check the two bvecs the same way.
> 
> Please add the segment boundary and SG list mapping story to the commit
> log; then the patch looks fine.

Sure, I will send v2.

Long


> 
> 
> Thanks,
> Ming

