linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible changes to bio cloning and some related stuff
@ 2011-12-06  5:11 Kent Overstreet
       [not found] ` <20111206051101.GA8171-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
  0 siblings, 1 reply; 2+ messages in thread
From: Kent Overstreet @ 2011-12-06  5:11 UTC (permalink / raw)
  To: neilb-l3A5Bk7waGM, tejun-hpIqsD4AKlfQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA

So, I finally got around to debugging various bcache on md issues, and I
ran into a rather sticky problem:

bio_alloc() can fail if nr_iovecs > BIO_MAX_PAGES. That itself is not a
problem, but then when a bio is cloned it's always done by cloning the
_entire_ original bio vec, from 0 to max_vecs - not the range from
bi_idx to bi_vcnt.

Basically, whenever bcache generates some io internally it uses a single
bio to describe the entire io - regardless of whether or not the bio
would be too big for the underlying device; it then splits the bio as
many times as need be when it's actually submitted.

This works beautifully for dumb drivers - I'm actually planning on
making my code generic and integrating it with the block layer so that
the same approach could be easily used by other code that generates
bios, it would allow a _lot_ of code to be deleted from the kernel.

But for stacking drivers, the mere existence of a bio with max_vecs >
BIO_MAX_PAGES breaks things, regardless of how many pages are actually
being used in this bio.

So, IMO __bio_clone(), bio_clone_mddev(), and whatever other code ought
to be changed to only copy bi_idx to bi_vcnt from the original bio -
it'd make it consistent with how bios are used elsewhere. Thoughts? The
actual patches should be trivial, it'll mostly just be a matter of
grepping around and finding everything, I think.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Possible changes to bio cloning and some related stuff
       [not found] ` <20111206051101.GA8171-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
@ 2011-12-06  5:36   ` NeilBrown
  0 siblings, 0 replies; 2+ messages in thread
From: NeilBrown @ 2011-12-06  5:36 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: tejun-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1887 bytes --]

On Mon, 5 Dec 2011 21:11:01 -0800 Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
wrote:

> So, I finally got around to debugging various bcache on md issues, and I
> ran into a rather sticky problem:
> 
> bio_alloc() can fail if nr_iovecs > BIO_MAX_PAGES. That itself is not a
> problem, but then when a bio is cloned it's always done by cloning the
> _entire_ original bio vec, from 0 to max_vecs - not the range from
> bi_idx to bi_vcnt.
> 
> Basically, whenever bcache generates some io internally it uses a single
> bio to describe the entire io - regardless of whether or not the bio
> would be too big for the underlying device; it then splits the bio as
> many times as need be when it's actually submitted.
> 
> This works beautifully for dumb drivers - I'm actually planning on
> making my code generic and integrating it with the block layer so that
> the same approach could be easily used by other code that generates
> bios, it would allow a _lot_ of code to be deleted from the kernel.

Sounds promising.

> 
> But for stacking drivers, the mere existence of a bio with max_vecs >
> BIO_MAX_PAGES breaks things, regardless of how many pages are actually
> being used in this bio.
> 
> So, IMO __bio_clone(), bio_clone_mddev(), and whatever other code ought
> to be changed to only copy bi_idx to bi_vcnt from the original bio -
> it'd make it consistent with how bios are used elsewhere. Thoughts? The
> actual patches should be trivial, it'll mostly just be a matter of
> grepping around and finding everything, I think.

I'm not against this.  There are a few places where md assumes there is a 1-1
mapping between original and cloned bios.  It probably wouldn't be too hard
to find those and adjust them to your new scheme.

So if you write patches I'll help make sure they work on md.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-12-06  5:36 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-06  5:11 Possible changes to bio cloning and some related stuff Kent Overstreet
     [not found] ` <20111206051101.GA8171-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2011-12-06  5:36   ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).