All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Re: dm-crypt: Fix error with too large bios
@ 2016-08-11  3:56 Eric Wheeler
  2016-08-13 17:45 ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2016-08-11  3:56 UTC (permalink / raw)
  To: mpatocka; +Cc: dm-devel, johnstonj.public

Hello Mikulas and dm-devel list,

The simple patch below with is confirmed to fix James Johnston's issue and 
doesn't appear to be in v4.8-rc1:

This references the following patchwork entry:
  https://patchwork.kernel.org/patch/9138595/

Can we get this pushed upstream for v4.8?

--
Eric Wheeler


On Fri, 27 May 2016, Mikulas Patocka wrote:
> dm-crypt: Fix error with too large bios
> 
> When dm-crypt processes writes, it allocates a new bio in the function
> crypt_alloc_buffer. The bio is allocated from a bio set and it can have at
> most BIO_MAX_PAGES vector entries, however the incoming bio can be larger
> if it was allocated by other means. For example, bcache creates bios
> larger than BIO_MAX_PAGES. If the incoming bio is larger, bio_alloc_bioset
> fails and error is returned.
> 
> To avoid the error, we test for too large bio in the function crypt_map
> and dm_accept_partial_bio to split the bio. dm_accept_partial_bio trims
> the current bio to the desired size and requests that the device mapper
> core sends another bio with the rest of the data.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org	# v3.16+

Tested-by: James Johnston <johnstonj.public@codenest.com>

I tested this patch by:

1.  Building v4.7-rc1 from Torvalds git repo.  Confirmed that original bug
    still occurs on Ubuntu 15.10.

2.  Applying your patch to v4.7-rc1.  My kill sequence no longer works,
    and the writeback cache is now successfully flushed to disk, and the
    cache can be detached from the backing device.

3.  To check data integrity, copied 250 MB of /dev/urandom to some file
    on main volume.  Then, dd copy this file to /dev/bcache0.  Then,
    detached the cache device from the backing device.  Then, rebooted.
    Then, dd copy /dev/bcache0 to another file on main volume.  Then,
    diff the files and confirm no changes.

So it looks like it works, based on this admittedly brief testing.  Thanks!

Best regards,

James Johnston


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Patch

Index: linux-4.6/drivers/md/dm-crypt.c
===================================================================
--- linux-4.6.orig/drivers/md/dm-crypt.c
+++ linux-4.6/drivers/md/dm-crypt.c
@@ -2137,6 +2137,10 @@  static int crypt_map(struct dm_target *t
 	struct dm_crypt_io *io;
 	struct crypt_config *cc = ti->private;
 
+	if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
+	    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
+		dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
+
 	/*
 	 * If bio is REQ_FLUSH or REQ_DISCARD, just bypass crypt queues.
 	 * - for REQ_FLUSH device-mapper core ensures that no IO is in-flight

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: dm-crypt: Fix error with too large bios
  2016-08-11  3:56 [PATCH] Re: dm-crypt: Fix error with too large bios Eric Wheeler
@ 2016-08-13 17:45 ` Mikulas Patocka
  2016-08-14  0:07   ` Mike Snitzer
  0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-13 17:45 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Eric Wheeler, dm-devel, johnstonj.public

Yes, this should be backported. It was lost somehow.

Mike, please put it to your git.

Mikulas


On Wed, 10 Aug 2016, Eric Wheeler wrote:

> Hello Mikulas and dm-devel list,
> 
> The simple patch below with is confirmed to fix James Johnston's issue and 
> doesn't appear to be in v4.8-rc1:
> 
> This references the following patchwork entry:
>   https://patchwork.kernel.org/patch/9138595/
> 
> Can we get this pushed upstream for v4.8?
> 
> --
> Eric Wheeler
> 
> 
> On Fri, 27 May 2016, Mikulas Patocka wrote:
> > dm-crypt: Fix error with too large bios
> > 
> > When dm-crypt processes writes, it allocates a new bio in the function
> > crypt_alloc_buffer. The bio is allocated from a bio set and it can have at
> > most BIO_MAX_PAGES vector entries, however the incoming bio can be larger
> > if it was allocated by other means. For example, bcache creates bios
> > larger than BIO_MAX_PAGES. If the incoming bio is larger, bio_alloc_bioset
> > fails and error is returned.
> > 
> > To avoid the error, we test for too large bio in the function crypt_map
> > and dm_accept_partial_bio to split the bio. dm_accept_partial_bio trims
> > the current bio to the desired size and requests that the device mapper
> > core sends another bio with the rest of the data.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org	# v3.16+
> 
> Tested-by: James Johnston <johnstonj.public@codenest.com>
> 
> I tested this patch by:
> 
> 1.  Building v4.7-rc1 from Torvalds git repo.  Confirmed that original bug
>     still occurs on Ubuntu 15.10.
> 
> 2.  Applying your patch to v4.7-rc1.  My kill sequence no longer works,
>     and the writeback cache is now successfully flushed to disk, and the
>     cache can be detached from the backing device.
> 
> 3.  To check data integrity, copied 250 MB of /dev/urandom to some file
>     on main volume.  Then, dd copy this file to /dev/bcache0.  Then,
>     detached the cache device from the backing device.  Then, rebooted.
>     Then, dd copy /dev/bcache0 to another file on main volume.  Then,
>     diff the files and confirm no changes.
> 
> So it looks like it works, based on this admittedly brief testing.  Thanks!
> 
> Best regards,
> 
> James Johnston
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> Patch
> 
> Index: linux-4.6/drivers/md/dm-crypt.c
> ===================================================================
> --- linux-4.6.orig/drivers/md/dm-crypt.c
> +++ linux-4.6/drivers/md/dm-crypt.c
> @@ -2137,6 +2137,10 @@  static int crypt_map(struct dm_target *t
>  	struct dm_crypt_io *io;
>  	struct crypt_config *cc = ti->private;
>  
> +	if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> +	    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> +		dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> +
>  	/*
>  	 * If bio is REQ_FLUSH or REQ_DISCARD, just bypass crypt queues.
>  	 * - for REQ_FLUSH device-mapper core ensures that no IO is in-flight
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-13 17:45 ` Mikulas Patocka
@ 2016-08-14  0:07   ` Mike Snitzer
  2016-08-15 17:03     ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-08-14  0:07 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Eric Wheeler, dm-devel, hch, johnstonj.public

[top-posting just because others went wild with it]

I don't have a strong opinion but I just assumed the local dm-crypt
workaround wasn't the way forward.  I didn't stage it because Christoph
disagreed with it:
https://lkml.org/lkml/2016/6/1/456
https://lkml.org/lkml/2016/6/1/477

Also, this would appear to be a more generic fix:
"block: make sure big bio is splitted into at most 256 bvecs
https://lkml.org/lkml/2016/8/12/154
(but Christoph disagrees there too, so the way forward isn't clear)

Mike

On Sat, Aug 13 2016 at  1:45pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Yes, this should be backported. It was lost somehow.
> 
> Mike, please put it to your git.
> 
> Mikulas
> 
> 
> On Wed, 10 Aug 2016, Eric Wheeler wrote:
> 
> > Hello Mikulas and dm-devel list,
> > 
> > The simple patch below with is confirmed to fix James Johnston's issue and 
> > doesn't appear to be in v4.8-rc1:
> > 
> > This references the following patchwork entry:
> >   https://patchwork.kernel.org/patch/9138595/
> > 
> > Can we get this pushed upstream for v4.8?
> > 
> > --
> > Eric Wheeler
> > 
> > 
> > On Fri, 27 May 2016, Mikulas Patocka wrote:
> > > dm-crypt: Fix error with too large bios
> > > 
> > > When dm-crypt processes writes, it allocates a new bio in the function
> > > crypt_alloc_buffer. The bio is allocated from a bio set and it can have at
> > > most BIO_MAX_PAGES vector entries, however the incoming bio can be larger
> > > if it was allocated by other means. For example, bcache creates bios
> > > larger than BIO_MAX_PAGES. If the incoming bio is larger, bio_alloc_bioset
> > > fails and error is returned.
> > > 
> > > To avoid the error, we test for too large bio in the function crypt_map
> > > and dm_accept_partial_bio to split the bio. dm_accept_partial_bio trims
> > > the current bio to the desired size and requests that the device mapper
> > > core sends another bio with the rest of the data.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > Cc: stable@vger.kernel.org	# v3.16+
> > 
> > Tested-by: James Johnston <johnstonj.public@codenest.com>
> > 
> > I tested this patch by:
> > 
> > 1.  Building v4.7-rc1 from Torvalds git repo.  Confirmed that original bug
> >     still occurs on Ubuntu 15.10.
> > 
> > 2.  Applying your patch to v4.7-rc1.  My kill sequence no longer works,
> >     and the writeback cache is now successfully flushed to disk, and the
> >     cache can be detached from the backing device.
> > 
> > 3.  To check data integrity, copied 250 MB of /dev/urandom to some file
> >     on main volume.  Then, dd copy this file to /dev/bcache0.  Then,
> >     detached the cache device from the backing device.  Then, rebooted.
> >     Then, dd copy /dev/bcache0 to another file on main volume.  Then,
> >     diff the files and confirm no changes.
> > 
> > So it looks like it works, based on this admittedly brief testing.  Thanks!
> > 
> > Best regards,
> > 
> > James Johnston
> > 
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> > 
> > Patch
> > 
> > Index: linux-4.6/drivers/md/dm-crypt.c
> > ===================================================================
> > --- linux-4.6.orig/drivers/md/dm-crypt.c
> > +++ linux-4.6/drivers/md/dm-crypt.c
> > @@ -2137,6 +2137,10 @@  static int crypt_map(struct dm_target *t
> >  	struct dm_crypt_io *io;
> >  	struct crypt_config *cc = ti->private;
> >  
> > +	if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> > +	    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> > +		dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> > +
> >  	/*
> >  	 * If bio is REQ_FLUSH or REQ_DISCARD, just bypass crypt queues.
> >  	 * - for REQ_FLUSH device-mapper core ensures that no IO is in-flight
> > 
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-14  0:07   ` Mike Snitzer
@ 2016-08-15 17:03     ` Mikulas Patocka
  2016-08-19  0:47       ` Eric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-15 17:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Wheeler, dm-devel, Mike Snitzer, johnstonj.public



On Sat, 13 Aug 2016, Mike Snitzer wrote:

> [top-posting just because others went wild with it]
> 
> I don't have a strong opinion but I just assumed the local dm-crypt
> workaround wasn't the way forward.  I didn't stage it because Christoph
> disagreed with it:
> https://lkml.org/lkml/2016/6/1/456
> https://lkml.org/lkml/2016/6/1/477
> 
> Also, this would appear to be a more generic fix:
> "block: make sure big bio is splitted into at most 256 bvecs
> https://lkml.org/lkml/2016/8/12/154
> (but Christoph disagrees there too, so the way forward isn't clear)
> 
> Mike


On Wed, Jun 01 2016 at  9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:

> > > be dm-crypt.c.  Maybe you've identified some indirect use of
> > > BIO_MAX_SIZE?
> >
> > I mean the recently introduced BIO_MAX_SIZE in -next tree:
> >
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
>
> The crazy bcache bios striking back once again.  I really think it's
> harmful having a _MAX value and then having a minor driver
> reinterpreting it and sending larger ones.  Until we can lift the
> maximum limit in general nad have common code exercise it we really need
> to stop bcache from sending these instead of littering the tree with
> workarounds.

The bio_kmalloc function allocates bios with up to 1024 vector entries (as 
opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector 
entries).

Device mapper is using bio_alloc_bioset with a bio set, so it is limited 
to 256 vector entries, but other kernel users may use bio_kmalloc and 
create larger bios.

So, if you don't want bios with more than 256 vector entries to exist, you 
should impose this limit in bio_kmalloc (and fix all the callers that use 
it).

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-15 17:03     ` Mikulas Patocka
@ 2016-08-19  0:47       ` Eric Wheeler
  2016-08-25 18:34           ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2016-08-19  0:47 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, Christoph Hellwig, Mike Snitzer, johnstonj.public

On Mon, 15 Aug 2016, Mikulas Patocka wrote:
> On Sat, 13 Aug 2016, Mike Snitzer wrote:
> 
> > [top-posting just because others went wild with it]
> > 
> > I don't have a strong opinion but I just assumed the local dm-crypt
> > workaround wasn't the way forward.  I didn't stage it because Christoph
> > disagreed with it:
> > https://lkml.org/lkml/2016/6/1/456
> > https://lkml.org/lkml/2016/6/1/477
> > 
> > Also, this would appear to be a more generic fix:
> > "block: make sure big bio is splitted into at most 256 bvecs
> > https://lkml.org/lkml/2016/8/12/154
> > (but Christoph disagrees there too, so the way forward isn't clear)
> > 
> > Mike
> 
> 
> On Wed, Jun 01 2016 at  9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:
> 
> > > > be dm-crypt.c.  Maybe you've identified some indirect use of
> > > > BIO_MAX_SIZE?
> > >
> > > I mean the recently introduced BIO_MAX_SIZE in -next tree:
> > >
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
> >
> > The crazy bcache bios striking back once again.  I really think it's
> > harmful having a _MAX value and then having a minor driver
> > reinterpreting it and sending larger ones.  Until we can lift the
> > maximum limit in general nad have common code exercise it we really need
> > to stop bcache from sending these instead of littering the tree with
> > workarounds.
> 
> The bio_kmalloc function allocates bios with up to 1024 vector entries (as 
> opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector 
> entries).
> 
> Device mapper is using bio_alloc_bioset with a bio set, so it is limited 
> to 256 vector entries, but other kernel users may use bio_kmalloc and 
> create larger bios.
> 
> So, if you don't want bios with more than 256 vector entries to exist, you 
> should impose this limit in bio_kmalloc (and fix all the callers that use 
> it).

FYI, Kent Overstreet notes this about bcache from the other thread here:
	https://lkml.org/lkml/2016/8/15/620

[paste]
>> bcache originally had workaround code to split too-large bios when it 
>> first went upstream - that was dropped only after the patches to make 
>> generic_make_request() handle arbitrary size bios went in. So to do what 
>> you're suggesting would mean reverting that bcache patch and bringing that 
>> code back, which from my perspective would be a step in the wrong 
>> direction. I just want to get this over and done with.
>> 
>> re: interactions with other drivers - bio_clone() has already been changed 
>> to only clone biovecs that are live for current bi_iter, so there 
>> shouldn't be any safety issues. A driver would have to be intentionally 
>> doing its own open coded bio cloning that clones all of bi_io_vec, not 
>> just the active ones - but if they're doing that, they're already broken 
>> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that 
>> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were 
>> created by bio_clone_fast).
>> 
>> And the cloning and bi_vcnt usage stuff I audited very thoroughly back 
>> when I was working on immutable biovecs and such back in the day, and I 
>> had to do a fair amount of cleanup/refactoring before that stuff could go 
>> in. 
[/paste]

They are making progress in the patch-v3 thread, so perhaps this can be 
fixed for now in generic_make_request().

--
Eric Wheeler

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] dm-crypt: Fix error with too large bios
  2016-08-19  0:47       ` Eric Wheeler
@ 2016-08-25 18:34           ` Mikulas Patocka
  0 siblings, 0 replies; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-25 18:34 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: Christoph Hellwig, dm-devel, Mike Snitzer, johnstonj.public,
	Jens Axboe, linux-block



On Thu, 18 Aug 2016, Eric Wheeler wrote:

> > On Wed, Jun 01 2016 at  9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > > > be dm-crypt.c.  Maybe you've identified some indirect use of
> > > > > BIO_MAX_SIZE?
> > > >
> > > > I mean the recently introduced BIO_MAX_SIZE in -next tree:
> > > >
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
> > >
> > > The crazy bcache bios striking back once again.  I really think it's
> > > harmful having a _MAX value and then having a minor driver
> > > reinterpreting it and sending larger ones.  Until we can lift the
> > > maximum limit in general nad have common code exercise it we really need
> > > to stop bcache from sending these instead of littering the tree with
> > > workarounds.
> > 
> > The bio_kmalloc function allocates bios with up to 1024 vector entries (as 
> > opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector 
> > entries).
> > 
> > Device mapper is using bio_alloc_bioset with a bio set, so it is limited 
> > to 256 vector entries, but other kernel users may use bio_kmalloc and 
> > create larger bios.
> > 
> > So, if you don't want bios with more than 256 vector entries to exist, you 
> > should impose this limit in bio_kmalloc (and fix all the callers that use 
> > it).
> 
> FYI, Kent Overstreet notes this about bcache from the other thread here:
> 	https://lkml.org/lkml/2016/8/15/620
> 
> [paste]
> >> bcache originally had workaround code to split too-large bios when it 
> >> first went upstream - that was dropped only after the patches to make 
> >> generic_make_request() handle arbitrary size bios went in. So to do what 
> >> you're suggesting would mean reverting that bcache patch and bringing that 
> >> code back, which from my perspective would be a step in the wrong 
> >> direction. I just want to get this over and done with.
> >> 
> >> re: interactions with other drivers - bio_clone() has already been changed 
> >> to only clone biovecs that are live for current bi_iter, so there 
> >> shouldn't be any safety issues. A driver would have to be intentionally 
> >> doing its own open coded bio cloning that clones all of bi_io_vec, not 
> >> just the active ones - but if they're doing that, they're already broken 
> >> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that 
> >> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were 
> >> created by bio_clone_fast).
> >> 
> >> And the cloning and bi_vcnt usage stuff I audited very thoroughly back 
> >> when I was working on immutable biovecs and such back in the day, and I 
> >> had to do a fair amount of cleanup/refactoring before that stuff could go 
> >> in. 
> [/paste]
> 
> They are making progress in the patch-v3 thread, so perhaps this can be 
> fixed for now in generic_make_request().
> 
> --
> Eric Wheeler

Device mapper can't split the bio in generic_make_request - it frees the 
md->queue->bio_split bioset, to save one kernel thread per device. Device 
mapper uses its own splitting mechanism.

So what is the final decision? - should device mapper split the big bio or 
should bcache not submit big bios?

I think splitting big bios in the device mapper is better - simply because 
it is much less code than reworking bcache to split bios internally.

BTW. In the device mapper, we have a layer dm-io, that was created to work 
around bio size limitations - it accepts unlimited I/O request and splits 
it to several bios. When bio size limitations are gone, we could simplify 
dm-io too.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
@ 2016-08-25 18:34           ` Mikulas Patocka
  0 siblings, 0 replies; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-25 18:34 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: Jens Axboe, Mike Snitzer, linux-block, dm-devel,
	Christoph Hellwig, johnstonj.public



On Thu, 18 Aug 2016, Eric Wheeler wrote:

> > On Wed, Jun 01 2016 at  9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > > > be dm-crypt.c.  Maybe you've identified some indirect use of
> > > > > BIO_MAX_SIZE?
> > > >
> > > > I mean the recently introduced BIO_MAX_SIZE in -next tree:
> > > >
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
> > >
> > > The crazy bcache bios striking back once again.  I really think it's
> > > harmful having a _MAX value and then having a minor driver
> > > reinterpreting it and sending larger ones.  Until we can lift the
> > > maximum limit in general nad have common code exercise it we really need
> > > to stop bcache from sending these instead of littering the tree with
> > > workarounds.
> > 
> > The bio_kmalloc function allocates bios with up to 1024 vector entries (as 
> > opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector 
> > entries).
> > 
> > Device mapper is using bio_alloc_bioset with a bio set, so it is limited 
> > to 256 vector entries, but other kernel users may use bio_kmalloc and 
> > create larger bios.
> > 
> > So, if you don't want bios with more than 256 vector entries to exist, you 
> > should impose this limit in bio_kmalloc (and fix all the callers that use 
> > it).
> 
> FYI, Kent Overstreet notes this about bcache from the other thread here:
> 	https://lkml.org/lkml/2016/8/15/620
> 
> [paste]
> >> bcache originally had workaround code to split too-large bios when it 
> >> first went upstream - that was dropped only after the patches to make 
> >> generic_make_request() handle arbitrary size bios went in. So to do what 
> >> you're suggesting would mean reverting that bcache patch and bringing that 
> >> code back, which from my perspective would be a step in the wrong 
> >> direction. I just want to get this over and done with.
> >> 
> >> re: interactions with other drivers - bio_clone() has already been changed 
> >> to only clone biovecs that are live for current bi_iter, so there 
> >> shouldn't be any safety issues. A driver would have to be intentionally 
> >> doing its own open coded bio cloning that clones all of bi_io_vec, not 
> >> just the active ones - but if they're doing that, they're already broken 
> >> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that 
> >> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were 
> >> created by bio_clone_fast).
> >> 
> >> And the cloning and bi_vcnt usage stuff I audited very thoroughly back 
> >> when I was working on immutable biovecs and such back in the day, and I 
> >> had to do a fair amount of cleanup/refactoring before that stuff could go 
> >> in. 
> [/paste]
> 
> They are making progress in the patch-v3 thread, so perhaps this can be 
> fixed for now in generic_make_request().
> 
> --
> Eric Wheeler

Device mapper can't split the bio in generic_make_request - it frees the 
md->queue->bio_split bioset, to save one kernel thread per device. Device 
mapper uses its own splitting mechanism.

So what is the final decision? - should device mapper split the big bio or 
should bcache not submit big bios?

I think splitting big bios in the device mapper is better - simply because 
it is much less code than reworking bcache to split bios internally.

BTW. In the device mapper, we have a layer dm-io, that was created to work 
around bio size limitations - it accepts unlimited I/O request and splits 
it to several bios. When bio size limitations are gone, we could simplify 
dm-io too.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] dm-crypt: Fix error with too large bios
  2016-08-25 18:34           ` Mikulas Patocka
  (?)
@ 2016-08-25 20:13           ` Jens Axboe
  2016-08-26 14:05             ` Mike Snitzer
  -1 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2016-08-25 20:13 UTC (permalink / raw)
  To: Mikulas Patocka, Eric Wheeler
  Cc: Christoph Hellwig, dm-devel, Mike Snitzer, johnstonj.public, linux-block

On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
>
>
> On Thu, 18 Aug 2016, Eric Wheeler wrote:
>
>>> On Wed, Jun 01 2016 at  9:44am -0400, Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>>>>> be dm-crypt.c.  Maybe you've identified some indirect use of
>>>>>> BIO_MAX_SIZE?
>>>>>
>>>>> I mean the recently introduced BIO_MAX_SIZE in -next tree:
>>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486
>>>>
>>>> The crazy bcache bios striking back once again.  I really think it's
>>>> harmful having a _MAX value and then having a minor driver
>>>> reinterpreting it and sending larger ones.  Until we can lift the
>>>> maximum limit in general nad have common code exercise it we really need
>>>> to stop bcache from sending these instead of littering the tree with
>>>> workarounds.
>>>
>>> The bio_kmalloc function allocates bios with up to 1024 vector entries (as
>>> opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector
>>> entries).
>>>
>>> Device mapper is using bio_alloc_bioset with a bio set, so it is limited
>>> to 256 vector entries, but other kernel users may use bio_kmalloc and
>>> create larger bios.
>>>
>>> So, if you don't want bios with more than 256 vector entries to exist, you
>>> should impose this limit in bio_kmalloc (and fix all the callers that use
>>> it).
>>
>> FYI, Kent Overstreet notes this about bcache from the other thread here:
>> 	https://lkml.org/lkml/2016/8/15/620
>>
>> [paste]
>>>> bcache originally had workaround code to split too-large bios when it
>>>> first went upstream - that was dropped only after the patches to make
>>>> generic_make_request() handle arbitrary size bios went in. So to do what
>>>> you're suggesting would mean reverting that bcache patch and bringing that
>>>> code back, which from my perspective would be a step in the wrong
>>>> direction. I just want to get this over and done with.
>>>>
>>>> re: interactions with other drivers - bio_clone() has already been changed
>>>> to only clone biovecs that are live for current bi_iter, so there
>>>> shouldn't be any safety issues. A driver would have to be intentionally
>>>> doing its own open coded bio cloning that clones all of bi_io_vec, not
>>>> just the active ones - but if they're doing that, they're already broken
>>>> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that
>>>> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were
>>>> created by bio_clone_fast).
>>>>
>>>> And the cloning and bi_vcnt usage stuff I audited very thoroughly back
>>>> when I was working on immutable biovecs and such back in the day, and I
>>>> had to do a fair amount of cleanup/refactoring before that stuff could go
>>>> in.
>> [/paste]
>>
>> They are making progress in the patch-v3 thread, so perhaps this can be
>> fixed for now in generic_make_request().
>>
>> --
>> Eric Wheeler
>
> Device mapper can't split the bio in generic_make_request - it frees the
> md->queue->bio_split bioset, to save one kernel thread per device. Device
> mapper uses its own splitting mechanism.
>
> So what is the final decision? - should device mapper split the big bio or
> should bcache not submit big bios?
>
> I think splitting big bios in the device mapper is better - simply because
> it is much less code than reworking bcache to split bios internally.
>
> BTW. In the device mapper, we have a layer dm-io, that was created to work
> around bio size limitations - it accepts unlimited I/O request and splits
> it to several bios. When bio size limitations are gone, we could simplify
> dm-io too.

The patch from Ming Lei was applied for 4.8 the other day.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-25 20:13           ` [dm-devel] " Jens Axboe
@ 2016-08-26 14:05             ` Mike Snitzer
  2016-08-27 15:09               ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-08-26 14:05 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
	johnstonj.public, linux-block

On Thu, Aug 25 2016 at  4:13pm -0400,
Jens Axboe <axboe@kernel.dk> wrote:

> On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> >
> >Device mapper can't split the bio in generic_make_request - it frees the
> >md->queue->bio_split bioset, to save one kernel thread per device. Device
> >mapper uses its own splitting mechanism.
> >
> >So what is the final decision? - should device mapper split the big bio or
> >should bcache not submit big bios?
> >
> >I think splitting big bios in the device mapper is better - simply because
> >it is much less code than reworking bcache to split bios internally.
> >
> >BTW. In the device mapper, we have a layer dm-io, that was created to work
> >around bio size limitations - it accepts unlimited I/O request and splits
> >it to several bios. When bio size limitations are gone, we could simplify
> >dm-io too.
> 
> The patch from Ming Lei was applied for 4.8 the other day.

See linux-block commit:
4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-26 14:05             ` Mike Snitzer
@ 2016-08-27 15:09               ` Mikulas Patocka
  2016-08-29  5:22                 ` Ming Lei
  2016-08-29 18:19                 ` Mike Snitzer
  0 siblings, 2 replies; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-27 15:09 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
	johnstonj.public, linux-block



On Fri, 26 Aug 2016, Mike Snitzer wrote:

> On Thu, Aug 25 2016 at  4:13pm -0400,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > >
> > >Device mapper can't split the bio in generic_make_request - it frees the
> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > >mapper uses its own splitting mechanism.
> > >
> > >So what is the final decision? - should device mapper split the big bio or
> > >should bcache not submit big bios?
> > >
> > >I think splitting big bios in the device mapper is better - simply because
> > >it is much less code than reworking bcache to split bios internally.
> > >
> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> > >around bio size limitations - it accepts unlimited I/O request and splits
> > >it to several bios. When bio size limitations are gone, we could simplify
> > >dm-io too.
> > 
> > The patch from Ming Lei was applied for 4.8 the other day.
> 
> See linux-block commit:
> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c

But this patch won't work for device mapper, blk_bio_segment_split is 
called from blk_queue_split and device mapper doesn't use blk_queue_split 
(it can't because it frees queue->bio_split).

We still need these two patches:
https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-27 15:09               ` Mikulas Patocka
@ 2016-08-29  5:22                 ` Ming Lei
  2016-08-29 21:57                   ` Mikulas Patocka
  2016-08-29 18:19                 ` Mike Snitzer
  1 sibling, 1 reply; 20+ messages in thread
From: Ming Lei @ 2016-08-29  5:22 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	open list:DEVICE-MAPPER (LVM),
	johnstonj.public, linux-block

On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 26 Aug 2016, Mike Snitzer wrote:
>
>> On Thu, Aug 25 2016 at  4:13pm -0400,
>> Jens Axboe <axboe@kernel.dk> wrote:
>>
>> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
>> > >
>> > >Device mapper can't split the bio in generic_make_request - it frees the
>> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
>> > >mapper uses its own splitting mechanism.
>> > >
>> > >So what is the final decision? - should device mapper split the big bio or
>> > >should bcache not submit big bios?
>> > >
>> > >I think splitting big bios in the device mapper is better - simply because
>> > >it is much less code than reworking bcache to split bios internally.
>> > >
>> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
>> > >around bio size limitations - it accepts unlimited I/O request and splits
>> > >it to several bios. When bio size limitations are gone, we could simplify
>> > >dm-io too.
>> >
>> > The patch from Ming Lei was applied for 4.8 the other day.
>>
>> See linux-block commit:
>> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
>
> But this patch won't work for device mapper, blk_bio_segment_split is
> called from blk_queue_split and device mapper doesn't use blk_queue_split
> (it can't because it frees queue->bio_split).
>
> We still need these two patches:
> https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html

About the 2nd patch, it might not be good enough because in theory
a small size bio still may include big bvecs, such as, each bvec points
to 512byte buffer, so strictly speaking the bvec number should
be checked instead of bio size.


-- 
Ming Lei

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-27 15:09               ` Mikulas Patocka
  2016-08-29  5:22                 ` Ming Lei
@ 2016-08-29 18:19                 ` Mike Snitzer
  2016-08-29 22:01                   ` Mikulas Patocka
  1 sibling, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-08-29 18:19 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
	johnstonj.public, linux-block

On Sat, Aug 27 2016 at 11:09am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Fri, 26 Aug 2016, Mike Snitzer wrote:
> 
> > On Thu, Aug 25 2016 at  4:13pm -0400,
> > Jens Axboe <axboe@kernel.dk> wrote:
> > 
> > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > > >
> > > >Device mapper can't split the bio in generic_make_request - it frees the
> > > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > > >mapper uses its own splitting mechanism.
> > > >
> > > >So what is the final decision? - should device mapper split the big bio or
> > > >should bcache not submit big bios?
> > > >
> > > >I think splitting big bios in the device mapper is better - simply because
> > > >it is much less code than reworking bcache to split bios internally.
> > > >
> > > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> > > >around bio size limitations - it accepts unlimited I/O request and splits
> > > >it to several bios. When bio size limitations are gone, we could simplify
> > > >dm-io too.
> > > 
> > > The patch from Ming Lei was applied for 4.8 the other day.
> > 
> > See linux-block commit:
> > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> 
> But this patch won't work for device mapper, blk_bio_segment_split is 
> called from blk_queue_split and device mapper doesn't use blk_queue_split 
> (it can't because it frees queue->bio_split).
> 
> We still need these two patches:
> https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html

I'm left wondering whether it'd be better to revert commit dbba42d8a9e
("dm: eliminate unused "bioset" process for each bio-based DM device")

And also look for other areas of DM that could leverage the block core's
relatively new splitting capabilities.

Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting
code in DM targets because DM is so "special" that it doesn't need the
block core's splitting (even though it clearly would now benefit).

Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-29  5:22                 ` Ming Lei
@ 2016-08-29 21:57                   ` Mikulas Patocka
  2016-08-30  7:33                     ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-29 21:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	open list:DEVICE-MAPPER (LVM),
	johnstonj.public, linux-block



On Mon, 29 Aug 2016, Ming Lei wrote:

> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
> >
> >> On Thu, Aug 25 2016 at  4:13pm -0400,
> >> Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> >> > >
> >> > >Device mapper can't split the bio in generic_make_request - it frees the
> >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> >> > >mapper uses its own splitting mechanism.
> >> > >
> >> > >So what is the final decision? - should device mapper split the big bio or
> >> > >should bcache not submit big bios?
> >> > >
> >> > >I think splitting big bios in the device mapper is better - simply because
> >> > >it is much less code than reworking bcache to split bios internally.
> >> > >
> >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> >> > >around bio size limitations - it accepts unlimited I/O request and splits
> >> > >it to several bios. When bio size limitations are gone, we could simplify
> >> > >dm-io too.
> >> >
> >> > The patch from Ming Lei was applied for 4.8 the other day.
> >>
> >> See linux-block commit:
> >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> >
> > But this patch won't work for device mapper, blk_bio_segment_split is
> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > (it can't because it frees queue->bio_split).
> >
> > We still need these two patches:
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> 
> About the 2nd patch, it might not be good enough because in theory
> a small size bio still may include big bvecs, such as, each bvec points
> to 512byte buffer, so strictly speaking the bvec number should
> be checked instead of bio size.
> 
> Ming Lei

This is not a problem.

dm-crypt allocates new pages for the bio and copies (and encrypts) the 
data from the original location to the new pages.

So yes, the original bio may have a long vector with many small fragments, 
but the newly allocated memory will be allocated in full pages and the 
outgoing bio's vector will have full pages.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-29 18:19                 ` Mike Snitzer
@ 2016-08-29 22:01                   ` Mikulas Patocka
  0 siblings, 0 replies; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-29 22:01 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Eric Wheeler, Christoph Hellwig, dm-devel,
	johnstonj.public, linux-block



On Mon, 29 Aug 2016, Mike Snitzer wrote:

> On Sat, Aug 27 2016 at 11:09am -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > 
> > 
> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
> > 
> > > On Thu, Aug 25 2016 at  4:13pm -0400,
> > > Jens Axboe <axboe@kernel.dk> wrote:
> > > 
> > > > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
> > > > >
> > > > >Device mapper can't split the bio in generic_make_request - it frees the
> > > > >md->queue->bio_split bioset, to save one kernel thread per device. Device
> > > > >mapper uses its own splitting mechanism.
> > > > >
> > > > >So what is the final decision? - should device mapper split the big bio or
> > > > >should bcache not submit big bios?
> > > > >
> > > > >I think splitting big bios in the device mapper is better - simply because
> > > > >it is much less code than reworking bcache to split bios internally.
> > > > >
> > > > >BTW. In the device mapper, we have a layer dm-io, that was created to work
> > > > >around bio size limitations - it accepts unlimited I/O request and splits
> > > > >it to several bios. When bio size limitations are gone, we could simplify
> > > > >dm-io too.
> > > > 
> > > > The patch from Ming Lei was applied for 4.8 the other day.
> > > 
> > > See linux-block commit:
> > > 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
> > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
> > 
> > But this patch won't work for device mapper, blk_bio_segment_split is 
> > called from blk_queue_split and device mapper doesn't use blk_queue_split 
> > (it can't because it frees queue->bio_split).
> > 
> > We still need these two patches:
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> 
> I'm left wondering whether it'd be better to revert commit dbba42d8a9e
> ("dm: eliminate unused "bioset" process for each bio-based DM device")

That would create one more process per dm device.

There is no need to create another process if the bio can be split in the 
device mapper.

> And also look for other areas of DM that could leverage the block core's
> relatively new splitting capabilities.
> 
> Otherwise, we'll just be sprinking more BIO_MAX_PAGES-based splitting
> code in DM targets because DM is so "special" that it doesn't need the
> block core's splitting (even though it clearly would now benefit).

It would be possible to split bios on BIO_MAX_PAGES boundary in the dm 
core. But I checked all the dm targets and I found problem only in 
dm-crypt and dm-log-writes. So there is no need to split bio bios for the 
other targets.

> Mike

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-29 21:57                   ` Mikulas Patocka
@ 2016-08-30  7:33                     ` Ming Lei
  2016-08-30 12:19                       ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2016-08-30  7:33 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	open list:DEVICE-MAPPER (LVM),
	johnstonj.public, linux-block

On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Mon, 29 Aug 2016, Ming Lei wrote:
>
>> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> >
>> >
>> > On Fri, 26 Aug 2016, Mike Snitzer wrote:
>> >
>> >> On Thu, Aug 25 2016 at  4:13pm -0400,
>> >> Jens Axboe <axboe@kernel.dk> wrote:
>> >>
>> >> > On 08/25/2016 12:34 PM, Mikulas Patocka wrote:
>> >> > >
>> >> > >Device mapper can't split the bio in generic_make_request - it frees the
>> >> > >md->queue->bio_split bioset, to save one kernel thread per device. Device
>> >> > >mapper uses its own splitting mechanism.
>> >> > >
>> >> > >So what is the final decision? - should device mapper split the big bio or
>> >> > >should bcache not submit big bios?
>> >> > >
>> >> > >I think splitting big bios in the device mapper is better - simply because
>> >> > >it is much less code than reworking bcache to split bios internally.
>> >> > >
>> >> > >BTW. In the device mapper, we have a layer dm-io, that was created to work
>> >> > >around bio size limitations - it accepts unlimited I/O request and splits
>> >> > >it to several bios. When bio size limitations are gone, we could simplify
>> >> > >dm-io too.
>> >> >
>> >> > The patch from Ming Lei was applied for 4.8 the other day.
>> >>
>> >> See linux-block commit:
>> >> 4d70dca4eadf2f block: make sure a big bio is split into at most 256 bvecs
>> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=4d70dca4eadf2f95abe389116ac02b8439c2d16c
>> >
>> > But this patch won't work for device mapper, blk_bio_segment_split is
>> > called from blk_queue_split and device mapper doesn't use blk_queue_split
>> > (it can't because it frees queue->bio_split).
>> >
>> > We still need these two patches:
>> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
>> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
>>
>> About the 2nd patch, it might not be good enough because in theory
>> a small size bio still may include big bvecs, such as, each bvec points
>> to 512byte buffer, so strictly speaking the bvec number should
>> be checked instead of bio size.
>>
>> Ming Lei
>
> This is not a problem.

I meant the following code in your 2nd patch:

+ if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
+    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
+ dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);

And the check on .bi_size may not work.

>
> dm-crypt allocates new pages for the bio and copies (and encrypts) the
> data from the original location to the new pages.
>
> So yes, the original bio may have a long vector with many small fragments,
> but the newly allocated memory will be allocated in full pages and the
> outgoing bio's vector will have full pages.
>
> Mikulas



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-30  7:33                     ` Ming Lei
@ 2016-08-30 12:19                       ` Mikulas Patocka
  2016-08-30 20:57                         ` Mike Snitzer
  0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-30 12:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Mike Snitzer, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	open list:DEVICE-MAPPER (LVM),
	johnstonj.public, linux-block



On Tue, 30 Aug 2016, Ming Lei wrote:

> On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Mon, 29 Aug 2016, Ming Lei wrote:
> >
> >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >> >
> >> > But this patch won't work for device mapper, blk_bio_segment_split is
> >> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> >> > (it can't because it frees queue->bio_split).
> >> >
> >> > We still need these two patches:
> >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> >>
> >> About the 2nd patch, it might not be good enough because in theory
> >> a small size bio still may include big bvecs, such as, each bvec points
> >> to 512byte buffer, so strictly speaking the bvec number should
> >> be checked instead of bio size.
> >>
> >> Ming Lei
> >
> > This is not a problem.
> 
> I meant the following code in your 2nd patch:
> 
> + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> +    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> 
> And the check on .bi_size may not work.

kcryptd_crypt_write_convert calls:
crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size)

crypt_alloc_buffer does:
unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);

So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will 
be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will 
succeed.

(BTW. BIO_MAX_SIZE was removed in the current kernel, we should use 
(BIO_MAX_PAGES << PAGE_SHIFT) instead).

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-30 12:19                       ` Mikulas Patocka
@ 2016-08-30 20:57                         ` Mike Snitzer
  2016-08-30 22:27                           ` Mikulas Patocka
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-08-30 20:57 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	open list:DEVICE-MAPPER (LVM),
	johnstonj.public, linux-block

On Tue, Aug 30 2016 at  8:19P -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Tue, 30 Aug 2016, Ming Lei wrote:
> 
> > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > >
> > > On Mon, 29 Aug 2016, Ming Lei wrote:
> > >
> > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >> >
> > >> > But this patch won't work for device mapper, blk_bio_segment_split is
> > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > >> > (it can't because it frees queue->bio_split).
> > >> >
> > >> > We still need these two patches:
> > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> > >>
> > >> About the 2nd patch, it might not be good enough because in theory
> > >> a small size bio still may include big bvecs, such as, each bvec points
> > >> to 512byte buffer, so strictly speaking the bvec number should
> > >> be checked instead of bio size.
> > >>
> > >> Ming Lei
> > >
> > > This is not a problem.
> > 
> > I meant the following code in your 2nd patch:
> > 
> > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> > +    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> > 
> > And the check on .bi_size may not work.
> 
> kcryptd_crypt_write_convert calls:
> crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size)
> 
> crypt_alloc_buffer does:
> unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
> 
> So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will 
> be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will 
> succeed.
> 
> (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use 
> (BIO_MAX_PAGES << PAGE_SHIFT) instead).

Is this revised patch OK with you?

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 30 Aug 2016 16:38:42 -0400
Subject: [PATCH] dm crypt: fix error with too large bcache bios

When dm-crypt processes writes, it allocates a new bio in
crypt_alloc_buffer().  The bio is allocated from a bio set and it can
have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
larger if it was allocated by bcache.  If the incoming bio is larger,
bio_alloc_bioset() fails and an error is returned.

To avoid the error, we test for a too large bio in the function
crypt_map() and use dm_accept_partial_bio() to split the bio.
dm_accept_partial_bio() trims the current bio to the desired size and
asks DM core to send another bio with the rest of the data.

This fix is wrapped with a check for CONFIG_BCACHE because there
currently isn't any other code that generates too large bios.  So unless
bcache is configured there is no point wasting time making this check.

Signed-off-by: Mikulas Patocka <mpatocka redhat com>
Cc: stable@vger.kernel.org # v3.16+
---
 drivers/md/dm-crypt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index eedba67..743f548 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_REMAPPED;
 	}
 
+#ifdef CONFIG_BCACHE
+	if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
+	    bio_data_dir(bio) == WRITE)
+		dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
+#endif
+
 	io = dm_per_bio_data(bio, cc->per_bio_data_size);
 	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
 	io->ctx.req = (struct skcipher_request *)(io + 1);
-- 
2.7.4 (Apple Git-66)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-30 20:57                         ` Mike Snitzer
@ 2016-08-30 22:27                           ` Mikulas Patocka
  2016-08-31  6:26                             ` Milan Broz
  0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2016-08-30 22:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	open list:DEVICE-MAPPER (LVM),
	johnstonj.public, linux-block



On Tue, 30 Aug 2016, Mike Snitzer wrote:

> On Tue, Aug 30 2016 at  8:19P -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > 
> > 
> > On Tue, 30 Aug 2016, Ming Lei wrote:
> > 
> > > On Tue, Aug 30, 2016 at 5:57 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >
> > > >
> > > > On Mon, 29 Aug 2016, Ming Lei wrote:
> > > >
> > > >> On Sat, Aug 27, 2016 at 11:09 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >> >
> > > >> > But this patch won't work for device mapper, blk_bio_segment_split is
> > > >> > called from blk_queue_split and device mapper doesn't use blk_queue_split
> > > >> > (it can't because it frees queue->bio_split).
> > > >> >
> > > >> > We still need these two patches:
> > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00211.html
> > > >> > https://www.redhat.com/archives/dm-devel/2016-May/msg00210.html
> > > >>
> > > >> About the 2nd patch, it might not be good enough because in theory
> > > >> a small size bio still may include big bvecs, such as, each bvec points
> > > >> to 512byte buffer, so strictly speaking the bvec number should
> > > >> be checked instead of bio size.
> > > >>
> > > >> Ming Lei
> > > >
> > > > This is not a problem.
> > > 
> > > I meant the following code in your 2nd patch:
> > > 
> > > + if (unlikely(bio->bi_iter.bi_size > BIO_MAX_SIZE) &&
> > > +    (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD | REQ_WRITE)) == REQ_WRITE)
> > > + dm_accept_partial_bio(bio, BIO_MAX_SIZE >> SECTOR_SHIFT);
> > > 
> > > And the check on .bi_size may not work.
> > 
> > kcryptd_crypt_write_convert calls:
> > crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size)
> > 
> > crypt_alloc_buffer does:
> > unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
> > 
> > So, if io->base_bio->bi_iter.bi_size <= BIO_MAX_SIZE, then nr_iovecs will 
> > be less or equal than BIO_MAX_PAGES and the function bio_alloc_bioset will 
> > succeed.
> > 
> > (BTW. BIO_MAX_SIZE was removed in the current kernel, we should use 
> > (BIO_MAX_PAGES << PAGE_SHIFT) instead).
> 
> Is this revised patch OK with you?

Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big 
bio, not just bcache.

That one test has no performance impact, there is no need to hide it 
behind #ifdef.

Mikulas

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 30 Aug 2016 16:38:42 -0400
> Subject: [PATCH] dm crypt: fix error with too large bcache bios
> 
> When dm-crypt processes writes, it allocates a new bio in
> crypt_alloc_buffer().  The bio is allocated from a bio set and it can
> have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
> larger if it was allocated by bcache.  If the incoming bio is larger,
> bio_alloc_bioset() fails and an error is returned.
> 
> To avoid the error, we test for a too large bio in the function
> crypt_map() and use dm_accept_partial_bio() to split the bio.
> dm_accept_partial_bio() trims the current bio to the desired size and
> asks DM core to send another bio with the rest of the data.
> 
> This fix is wrapped with a check for CONFIG_BCACHE because there
> currently isn't any other code that generates too large bios.  So unless
> bcache is configured there is no point wasting time making this check.
> 
> Signed-off-by: Mikulas Patocka <mpatocka redhat com>
> Cc: stable@vger.kernel.org # v3.16+
> ---
>  drivers/md/dm-crypt.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index eedba67..743f548 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
>  		return DM_MAPIO_REMAPPED;
>  	}
>  
> +#ifdef CONFIG_BCACHE
> +	if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
> +	    bio_data_dir(bio) == WRITE)
> +		dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
> +#endif
> +
>  	io = dm_per_bio_data(bio, cc->per_bio_data_size);
>  	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
>  	io->ctx.req = (struct skcipher_request *)(io + 1);
> -- 
> 2.7.4 (Apple Git-66)
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-30 22:27                           ` Mikulas Patocka
@ 2016-08-31  6:26                             ` Milan Broz
  2016-08-31 13:39                               ` Mike Snitzer
  0 siblings, 1 reply; 20+ messages in thread
From: Milan Broz @ 2016-08-31  6:26 UTC (permalink / raw)
  To: Mikulas Patocka, Mike Snitzer
  Cc: Ming Lei, Jens Axboe, Eric Wheeler, Christoph Hellwig,
	johnstonj.public, linux-block

On 08/31/2016 12:27 AM, Mikulas Patocka wrote:

...
> 
> Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big 
> bio, not just bcache.

Yes. Please, do not hide it behind #ifdef.
If it is in code, it should be enabled always.

There can third party modules or some new code appears and creating strange
config dependence only adds more problems later.

Milan

> 
> That one test has no performance impact, there is no need to hide it 
> behind #ifdef.
> 
> Mikulas
> 
>> From: Mikulas Patocka <mpatocka@redhat.com>
>> Date: Tue, 30 Aug 2016 16:38:42 -0400
>> Subject: [PATCH] dm crypt: fix error with too large bcache bios
>>
>> When dm-crypt processes writes, it allocates a new bio in
>> crypt_alloc_buffer().  The bio is allocated from a bio set and it can
>> have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
>> larger if it was allocated by bcache.  If the incoming bio is larger,
>> bio_alloc_bioset() fails and an error is returned.
>>
>> To avoid the error, we test for a too large bio in the function
>> crypt_map() and use dm_accept_partial_bio() to split the bio.
>> dm_accept_partial_bio() trims the current bio to the desired size and
>> asks DM core to send another bio with the rest of the data.
>>
>> This fix is wrapped with a check for CONFIG_BCACHE because there
>> currently isn't any other code that generates too large bios.  So unless
>> bcache is configured there is no point wasting time making this check.
>>
>> Signed-off-by: Mikulas Patocka <mpatocka redhat com>
>> Cc: stable@vger.kernel.org # v3.16+
>> ---
>>  drivers/md/dm-crypt.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>> index eedba67..743f548 100644
>> --- a/drivers/md/dm-crypt.c
>> +++ b/drivers/md/dm-crypt.c
>> @@ -1924,6 +1924,12 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
>>  		return DM_MAPIO_REMAPPED;
>>  	}
>>  
>> +#ifdef CONFIG_BCACHE
>> +	if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
>> +	    bio_data_dir(bio) == WRITE)
>> +		dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
>> +#endif
>> +
>>  	io = dm_per_bio_data(bio, cc->per_bio_data_size);
>>  	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
>>  	io->ctx.req = (struct skcipher_request *)(io + 1);
>> -- 
>> 2.7.4 (Apple Git-66)
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dm-crypt: Fix error with too large bios
  2016-08-31  6:26                             ` Milan Broz
@ 2016-08-31 13:39                               ` Mike Snitzer
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-08-31 13:39 UTC (permalink / raw)
  To: Milan Broz
  Cc: Mikulas Patocka, Ming Lei, Jens Axboe, Eric Wheeler,
	Christoph Hellwig, johnstonj.public, linux-block

On Wed, Aug 31 2016 at  2:26am -0400,
Milan Broz <gmazyland@gmail.com> wrote:

> On 08/31/2016 12:27 AM, Mikulas Patocka wrote:
> 
> ...
> > 
> > Drop that "#ifdef CONFIG_BCACHE". Anyone should be allowed to create a big 
> > bio, not just bcache.
> 
> Yes. Please, do not hide it behind #ifdef.
> If it is in code, it should be enabled always.
> 
> There can third party modules or some new code appears and creating strange
> config dependence only adds more problems later.

I did last night, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=4e870e948fbabf62b78e8410f04c67703e7c816b

Will go to Linus for v4.8-rc5 by the end of the week.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-08-31 13:39 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-11  3:56 [PATCH] Re: dm-crypt: Fix error with too large bios Eric Wheeler
2016-08-13 17:45 ` Mikulas Patocka
2016-08-14  0:07   ` Mike Snitzer
2016-08-15 17:03     ` Mikulas Patocka
2016-08-19  0:47       ` Eric Wheeler
2016-08-25 18:34         ` [dm-devel] " Mikulas Patocka
2016-08-25 18:34           ` Mikulas Patocka
2016-08-25 20:13           ` [dm-devel] " Jens Axboe
2016-08-26 14:05             ` Mike Snitzer
2016-08-27 15:09               ` Mikulas Patocka
2016-08-29  5:22                 ` Ming Lei
2016-08-29 21:57                   ` Mikulas Patocka
2016-08-30  7:33                     ` Ming Lei
2016-08-30 12:19                       ` Mikulas Patocka
2016-08-30 20:57                         ` Mike Snitzer
2016-08-30 22:27                           ` Mikulas Patocka
2016-08-31  6:26                             ` Milan Broz
2016-08-31 13:39                               ` Mike Snitzer
2016-08-29 18:19                 ` Mike Snitzer
2016-08-29 22:01                   ` Mikulas Patocka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.