All of lore.kernel.org
 help / color / mirror / Atom feed
* linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains"
@ 2015-05-13 19:28 Mike Snitzer
  2015-05-14  2:49 ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2015-05-13 19:28 UTC (permalink / raw)
  To: axboe; +Cc: dm-devel, hch

The device-mapper-test-suite (dmts) thinp tests no longer run due to
linux-block.git commit c4cf5261 ("bio: skip atomic inc/dec of
->bi_remaining for non-chains")

That commit has been the focus of some other work I have pending because
I was forced to adapt it due to bi_remaining no longer being accessible
directly.  So it was surprising/interesting for this commit to take
center stage in the context of dm-thinp regression.

bios don't get submitted (simplest reproducer is the 'dd_benchmark' test
in the dmts thin-provisioning suite which just uses dd with oflag=direct
immediately after the DM thin device is created).  The top-level DM
device stays at 100% utilization but no progress is made.  Reverting
the commit in question resolves the problem.

But I'm able to use lvm2 to create thin devices that allow IO to
complete; so I haven't figured out what is so special about dmts (which
uses dmsetup directly instead of lvm2).

I'll see if I can figure out what might be happening... but would
welcome more eyes on this to see if anything stands out relative to the
dm-thin.c changes.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains"
  2015-05-13 19:28 linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" Mike Snitzer
@ 2015-05-14  2:49 ` Mike Snitzer
  2015-05-14  3:29   ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2015-05-14  2:49 UTC (permalink / raw)
  To: axboe; +Cc: Robert Elliott, dm-devel, Jan Kara, hch, Kent Overstreet

On Wed, May 13 2015 at  3:28pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> The device-mapper-test-suite (dmts) thinp tests no longer run due to
> linux-block.git commit c4cf5261 ("bio: skip atomic inc/dec of
> ->bi_remaining for non-chains")
> 
> That commit has been the focus of some other work I have pending because
> I was forced to adapt it due to bi_remaining no longer being accessible
> directly.  So it was surprising/interesting for this commit to take
> center stage in the context of dm-thinp regression.
> 
> bios don't get submitted (simplest reproducer is the 'dd_benchmark' test
> in the dmts thin-provisioning suite which just uses dd with oflag=direct
> immediately after the DM thin device is created).  The top-level DM
> device stays at 100% utilization but no progress is made.  Reverting
> the commit in question resolves the problem.
> 
> But I'm able to use lvm2 to create thin devices that allow IO to
> complete; so I haven't figured out what is so special about dmts (which
> uses dmsetup directly instead of lvm2).
> 
> I'll see if I can figure out what might be happening... but would
> welcome more eyes on this to see if anything stands out relative to the
> dm-thin.c changes.

I've added some tracing.  This issue is isolated to the bio-chain that
is used for dm-thinp's overwrite support -- and other code that follows
this bio-chain pattern (so all non-blk-core bio_inc_remaining() callers
in commit c4cf5261.

In dm-thin.c replace bi_end_io to point to overwrite_endio, at that time
__bi_remaining is 1 (but the new BIO_CHAIN flag is _not_ set):

 device-mapper: thin: save_and_set_endio: before overwrite sector=1536 bi_remaining=1

overwrite_endio() is called and the original end_io restored with
accompanying call to bio_inc_remaining():

 device-mapper: thin: overwrite_endio: before inc sector=1536 bi_remaining=1 
 device-mapper: thin: overwrite_endio:  after inc sector=1536 bi_remaining=2

Then bio_endio() is later called but the original endio won't get called
since __bi_remaining=2 (endio would only drop it to 1).

As you know commit c4cf5261 avoids the atomic operations during endio
unless BIO_CHAIN is set.  It only gets set if bio_inc_remaining() is
used.  But bio_inc_remaining() is only used _after_ the initial endio --
we needed BIO_CHAIN set earlier for these cases.  Seems we need a new
function to register the intent to cascade endios (to allow for proper
accounting)?

We could then switch callers to something like:

  bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1?
  old = io->bi_end_io;
  bio->bi_end_io = new;

 in new():
  bio->bi_end_io = old;
  bio_inc_remaining(bio)

Anyway, not sure about the proper fix -- I do know this commit is a
definite regression (but it does nicely avoid the costly atomics ;)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains"
  2015-05-14  2:49 ` Mike Snitzer
@ 2015-05-14  3:29   ` Mike Snitzer
  2015-05-14  7:58     ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2015-05-14  3:29 UTC (permalink / raw)
  To: axboe; +Cc: Robert Elliott, dm-devel, Jan Kara, hch, Kent Overstreet

On Wed, May 13 2015 at 10:49pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> As you know commit c4cf5261 avoids the atomic operations during endio
> unless BIO_CHAIN is set.  It only gets set if bio_inc_remaining() is
> used.  But bio_inc_remaining() is only used _after_ the initial endio --
> we needed BIO_CHAIN set earlier for these cases.  Seems we need a new
> function to register the intent to cascade endios (to allow for proper
> accounting)?
> 
> We could then switch callers to something like:
> 
>   bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1?
>   old = io->bi_end_io;
>   bio->bi_end_io = new;
> 
>  in new():
>   bio->bi_end_io = old;
>   bio_inc_remaining(bio)
> 
> Anyway, not sure about the proper fix -- I do know this commit is a
> definite regression (but it does nicely avoid the costly atomics ;)

Just adding this to dm-thin.c:save_and_set_endio() fixed it:
 bio->bi_flags |= (1 << BIO_CHAIN);

(setting a bit doesn't need a bio_set_chain fn like I suggested above).

But, all relevant callers would need this treatment (I'd imagine
bio_endio_nodec callers need help too? -- either that or __bi_remaining
is completely irrelevant for all old users except for bio_chain()).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains"
  2015-05-14  3:29   ` Mike Snitzer
@ 2015-05-14  7:58     ` Jan Kara
  2015-05-14 15:05       ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2015-05-14  7:58 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, Jan Kara, Kent Overstreet, dm-devel, Robert Elliott, hch

On Wed 13-05-15 23:29:53, Mike Snitzer wrote:
> On Wed, May 13 2015 at 10:49pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>  
> > As you know commit c4cf5261 avoids the atomic operations during endio
> > unless BIO_CHAIN is set.  It only gets set if bio_inc_remaining() is
> > used.  But bio_inc_remaining() is only used _after_ the initial endio --
> > we needed BIO_CHAIN set earlier for these cases.  Seems we need a new
> > function to register the intent to cascade endios (to allow for proper
> > accounting)?
> > 
> > We could then switch callers to something like:
> > 
> >   bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1?
> >   old = io->bi_end_io;
> >   bio->bi_end_io = new;
> > 
> >  in new():
> >   bio->bi_end_io = old;
> >   bio_inc_remaining(bio)
> > 
> > Anyway, not sure about the proper fix -- I do know this commit is a
> > definite regression (but it does nicely avoid the costly atomics ;)
> 
> Just adding this to dm-thin.c:save_and_set_endio() fixed it:
>  bio->bi_flags |= (1 << BIO_CHAIN);
> 
> (setting a bit doesn't need a bio_set_chain fn like I suggested above).
> 
> But, all relevant callers would need this treatment (I'd imagine
> bio_endio_nodec callers need help too? -- either that or __bi_remaining
> is completely irrelevant for all old users except for bio_chain()).
  From my mostly outsider view of block layer and dm, it seems that
dm-thinp isn't really interested in bi_remaining value. It just needs
bio_endio() to call the original completion function and to achieve that
dm-thinp increments bi_remaining.

What we could do is the following:
1) bio_remaining_done() would clear BIO_CHAIN flag when __bi_remaining()
   drops to 0. That way bio_endio() will end up calling ->bi_end_io() always
   as soon as __bi_remaining drops to 0 or if bio was never chained. This
   wouldn't be needed if nobody can chain on bios previously modified by
   e.g. dm-thinp but AFAIU we cannot assume that.
2) remove bio_inc_remaining() from dm-thinp, dm-snap, dm-raid1,
   dm-cache-target as they all look like situations where we only want newly
   set bi_end_io function to be called by bio_endio().
3) as a bonus bi_remaining handling is now fully inside block/bio.c and
   bio_inc_remaining() can be static there.

Thoughts?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains"
  2015-05-14  7:58     ` Jan Kara
@ 2015-05-14 15:05       ` Mike Snitzer
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Snitzer @ 2015-05-14 15:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: axboe, Robert Elliott, dm-devel, hch, Kent Overstreet

On Thu, May 14 2015 at  3:58am -0400,
Jan Kara <jack@suse.cz> wrote:

> On Wed 13-05-15 23:29:53, Mike Snitzer wrote:
> > On Wed, May 13 2015 at 10:49pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >  
> > > As you know commit c4cf5261 avoids the atomic operations during endio
> > > unless BIO_CHAIN is set.  It only gets set if bio_inc_remaining() is
> > > used.  But bio_inc_remaining() is only used _after_ the initial endio --
> > > we needed BIO_CHAIN set earlier for these cases.  Seems we need a new
> > > function to register the intent to cascade endios (to allow for proper
> > > accounting)?
> > > 
> > > We could then switch callers to something like:
> > > 
> > >   bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1?
> > >   old = io->bi_end_io;
> > >   bio->bi_end_io = new;
> > > 
> > >  in new():
> > >   bio->bi_end_io = old;
> > >   bio_inc_remaining(bio)
> > > 
> > > Anyway, not sure about the proper fix -- I do know this commit is a
> > > definite regression (but it does nicely avoid the costly atomics ;)
> > 
> > Just adding this to dm-thin.c:save_and_set_endio() fixed it:
> >  bio->bi_flags |= (1 << BIO_CHAIN);
> > 
> > (setting a bit doesn't need a bio_set_chain fn like I suggested above).
> > 
> > But, all relevant callers would need this treatment (I'd imagine
> > bio_endio_nodec callers need help too? -- either that or __bi_remaining
> > is completely irrelevant for all old users except for bio_chain()).
>   From my mostly outsider view of block layer and dm, it seems that
> dm-thinp isn't really interested in bi_remaining value. It just needs
> bio_endio() to call the original completion function and to achieve that
> dm-thinp increments bi_remaining.

None of the existing users (including bio_endio_nodec) _really_ care
about the specifics of bi_remaining.  But they have to care because that
is how the implementation is.  It is unfortunate bi_remaining (or more
specifically the need to increment or not) was ever exposed to begin
with.

All these callers only ever care about their endio getting called.
 
> What we could do is the following:
> 1) bio_remaining_done() would clear BIO_CHAIN flag when __bi_remaining()
>    drops to 0. That way bio_endio() will end up calling ->bi_end_io() always
>    as soon as __bi_remaining drops to 0 or if bio was never chained. This
>    wouldn't be needed if nobody can chain on bios previously modified by
>    e.g. dm-thinp but AFAIU we cannot assume that.

Not completely following what you're proposing or how it helps fix this
regression considering 1) something will have needed to set BIO_CHAIN to
begin with (you're missing where that'd happen). 2) the entire intent
behind commit c4cf5261 is to optimize away the atomic operations.

So, given that is the intent.. do we just rip out all the callers' code
that tries to adjust __bi_remaining?

Then the only case that opts in to the BIO_CHAIN atomic operation
accounting is bio_chain().

> 2) remove bio_inc_remaining() from dm-thinp, dm-snap, dm-raid1,
>    dm-cache-target as they all look like situations where we only want newly
>    set bi_end_io function to be called by bio_endio().

Yes that is correct.

> 3) as a bonus bi_remaining handling is now fully inside block/bio.c and
>    bio_inc_remaining() can be static there.

Almost, we still have bio_endio_nodec() to deal with.  Which is the same
pattern as we've been discussing.  It just happens to be microoptimized
via api to:
1) reestablish the final endio
2) immediately call that final endio (rather than arm the bio to have
   endio called at a later time).

So I _think_ bio_endio_nodec() would go away completely.

The devil is in the details.  I'll have a go at implementing this and
see what falls out.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-05-14 15:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13 19:28 linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" Mike Snitzer
2015-05-14  2:49 ` Mike Snitzer
2015-05-14  3:29   ` Mike Snitzer
2015-05-14  7:58     ` Jan Kara
2015-05-14 15:05       ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.