* linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" @ 2015-05-13 19:28 Mike Snitzer 2015-05-14 2:49 ` Mike Snitzer 0 siblings, 1 reply; 5+ messages in thread From: Mike Snitzer @ 2015-05-13 19:28 UTC (permalink / raw) To: axboe; +Cc: dm-devel, hch The device-mapper-test-suite (dmts) thinp tests no longer run due to linux-block.git commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") That commit has been the focus of some other work I have pending because I was forced to adapt it due to bi_remaining no longer being accessible directly. So it was surprising/interesting for this commit to take center stage in the context of dm-thinp regression. bios don't get submitted (simplest reproducer is the 'dd_benchmark' test in the dmts thin-provisioning suite which just uses dd with oflag=direct immediately after the DM thin device is created). The top-level DM device stays at 100% utilization but no progress is made. Reverting the commit in question resolves the problem. But I'm able to use lvm2 to create thin devices that allow IO to complete; so I haven't figured out what is so special about dmts (which uses dmsetup directly instead of lvm2). I'll see if I can figure out what might be happening... but would welcome more eyes on this to see if anything stands out relative to the dm-thin.c changes. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" 2015-05-13 19:28 linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" Mike Snitzer @ 2015-05-14 2:49 ` Mike Snitzer 2015-05-14 3:29 ` Mike Snitzer 0 siblings, 1 reply; 5+ messages in thread From: Mike Snitzer @ 2015-05-14 2:49 UTC (permalink / raw) To: axboe; +Cc: Robert Elliott, dm-devel, Jan Kara, hch, Kent Overstreet On Wed, May 13 2015 at 3:28pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > The device-mapper-test-suite (dmts) thinp tests no longer run due to > linux-block.git commit c4cf5261 ("bio: skip atomic inc/dec of > ->bi_remaining for non-chains") > > That commit has been the focus of some other work I have pending because > I was forced to adapt it due to bi_remaining no longer being accessible > directly. So it was surprising/interesting for this commit to take > center stage in the context of dm-thinp regression. > > bios don't get submitted (simplest reproducer is the 'dd_benchmark' test > in the dmts thin-provisioning suite which just uses dd with oflag=direct > immediately after the DM thin device is created). The top-level DM > device stays at 100% utilization but no progress is made. Reverting > the commit in question resolves the problem. > > But I'm able to use lvm2 to create thin devices that allow IO to > complete; so I haven't figured out what is so special about dmts (which > uses dmsetup directly instead of lvm2). > > I'll see if I can figure out what might be happening... but would > welcome more eyes on this to see if anything stands out relative to the > dm-thin.c changes. I've added some tracing. This issue is isolated to the bio-chain that is used for dm-thinp's overwrite support -- and other code that follows this bio-chain pattern (so all non-blk-core bio_inc_remaining() callers in commit c4cf5261. In dm-thin.c replace bi_end_io to point to overwrite_endio, at that time __bi_remaining is 1 (but the new BIO_CHAIN flag is _not_ set): device-mapper: thin: save_and_set_endio: before overwrite sector=1536 bi_remaining=1 overwrite_endio() is called and the original end_io restored with accompanying call to bio_inc_remaining(): device-mapper: thin: overwrite_endio: before inc sector=1536 bi_remaining=1 device-mapper: thin: overwrite_endio: after inc sector=1536 bi_remaining=2 Then bio_endio() is later called but the original endio won't get called since __bi_remaining=2 (endio would only drop it to 1). As you know commit c4cf5261 avoids the atomic operations during endio unless BIO_CHAIN is set. It only gets set if bio_inc_remaining() is used. But bio_inc_remaining() is only used _after_ the initial endio -- we needed BIO_CHAIN set earlier for these cases. Seems we need a new function to register the intent to cascade endios (to allow for proper accounting)? We could then switch callers to something like: bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1? old = io->bi_end_io; bio->bi_end_io = new; in new(): bio->bi_end_io = old; bio_inc_remaining(bio) Anyway, not sure about the proper fix -- I do know this commit is a definite regression (but it does nicely avoid the costly atomics ;) ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" 2015-05-14 2:49 ` Mike Snitzer @ 2015-05-14 3:29 ` Mike Snitzer 2015-05-14 7:58 ` Jan Kara 0 siblings, 1 reply; 5+ messages in thread From: Mike Snitzer @ 2015-05-14 3:29 UTC (permalink / raw) To: axboe; +Cc: Robert Elliott, dm-devel, Jan Kara, hch, Kent Overstreet On Wed, May 13 2015 at 10:49pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > As you know commit c4cf5261 avoids the atomic operations during endio > unless BIO_CHAIN is set. It only gets set if bio_inc_remaining() is > used. But bio_inc_remaining() is only used _after_ the initial endio -- > we needed BIO_CHAIN set earlier for these cases. Seems we need a new > function to register the intent to cascade endios (to allow for proper > accounting)? > > We could then switch callers to something like: > > bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1? > old = io->bi_end_io; > bio->bi_end_io = new; > > in new(): > bio->bi_end_io = old; > bio_inc_remaining(bio) > > Anyway, not sure about the proper fix -- I do know this commit is a > definite regression (but it does nicely avoid the costly atomics ;) Just adding this to dm-thin.c:save_and_set_endio() fixed it: bio->bi_flags |= (1 << BIO_CHAIN); (setting a bit doesn't need a bio_set_chain fn like I suggested above). But, all relevant callers would need this treatment (I'd imagine bio_endio_nodec callers need help too? -- either that or __bi_remaining is completely irrelevant for all old users except for bio_chain()). ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" 2015-05-14 3:29 ` Mike Snitzer @ 2015-05-14 7:58 ` Jan Kara 2015-05-14 15:05 ` Mike Snitzer 0 siblings, 1 reply; 5+ messages in thread From: Jan Kara @ 2015-05-14 7:58 UTC (permalink / raw) To: Mike Snitzer Cc: axboe, Jan Kara, Kent Overstreet, dm-devel, Robert Elliott, hch On Wed 13-05-15 23:29:53, Mike Snitzer wrote: > On Wed, May 13 2015 at 10:49pm -0400, > Mike Snitzer <snitzer@redhat.com> wrote: > > > As you know commit c4cf5261 avoids the atomic operations during endio > > unless BIO_CHAIN is set. It only gets set if bio_inc_remaining() is > > used. But bio_inc_remaining() is only used _after_ the initial endio -- > > we needed BIO_CHAIN set earlier for these cases. Seems we need a new > > function to register the intent to cascade endios (to allow for proper > > accounting)? > > > > We could then switch callers to something like: > > > > bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1? > > old = io->bi_end_io; > > bio->bi_end_io = new; > > > > in new(): > > bio->bi_end_io = old; > > bio_inc_remaining(bio) > > > > Anyway, not sure about the proper fix -- I do know this commit is a > > definite regression (but it does nicely avoid the costly atomics ;) > > Just adding this to dm-thin.c:save_and_set_endio() fixed it: > bio->bi_flags |= (1 << BIO_CHAIN); > > (setting a bit doesn't need a bio_set_chain fn like I suggested above). > > But, all relevant callers would need this treatment (I'd imagine > bio_endio_nodec callers need help too? -- either that or __bi_remaining > is completely irrelevant for all old users except for bio_chain()). From my mostly outsider view of block layer and dm, it seems that dm-thinp isn't really interested in bi_remaining value. It just needs bio_endio() to call the original completion function and to achieve that dm-thinp increments bi_remaining. What we could do is the following: 1) bio_remaining_done() would clear BIO_CHAIN flag when __bi_remaining() drops to 0. That way bio_endio() will end up calling ->bi_end_io() always as soon as __bi_remaining drops to 0 or if bio was never chained. This wouldn't be needed if nobody can chain on bios previously modified by e.g. dm-thinp but AFAIU we cannot assume that. 2) remove bio_inc_remaining() from dm-thinp, dm-snap, dm-raid1, dm-cache-target as they all look like situations where we only want newly set bi_end_io function to be called by bio_endio(). 3) as a bonus bi_remaining handling is now fully inside block/bio.c and bio_inc_remaining() can be static there. Thoughts? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" 2015-05-14 7:58 ` Jan Kara @ 2015-05-14 15:05 ` Mike Snitzer 0 siblings, 0 replies; 5+ messages in thread From: Mike Snitzer @ 2015-05-14 15:05 UTC (permalink / raw) To: Jan Kara; +Cc: axboe, Robert Elliott, dm-devel, hch, Kent Overstreet On Thu, May 14 2015 at 3:58am -0400, Jan Kara <jack@suse.cz> wrote: > On Wed 13-05-15 23:29:53, Mike Snitzer wrote: > > On Wed, May 13 2015 at 10:49pm -0400, > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > As you know commit c4cf5261 avoids the atomic operations during endio > > > unless BIO_CHAIN is set. It only gets set if bio_inc_remaining() is > > > used. But bio_inc_remaining() is only used _after_ the initial endio -- > > > we needed BIO_CHAIN set earlier for these cases. Seems we need a new > > > function to register the intent to cascade endios (to allow for proper > > > accounting)? > > > > > > We could then switch callers to something like: > > > > > > bio_set_chain(bio); // establishes BIO_CHAIN iff __bio_remaining is 1? > > > old = io->bi_end_io; > > > bio->bi_end_io = new; > > > > > > in new(): > > > bio->bi_end_io = old; > > > bio_inc_remaining(bio) > > > > > > Anyway, not sure about the proper fix -- I do know this commit is a > > > definite regression (but it does nicely avoid the costly atomics ;) > > > > Just adding this to dm-thin.c:save_and_set_endio() fixed it: > > bio->bi_flags |= (1 << BIO_CHAIN); > > > > (setting a bit doesn't need a bio_set_chain fn like I suggested above). > > > > But, all relevant callers would need this treatment (I'd imagine > > bio_endio_nodec callers need help too? -- either that or __bi_remaining > > is completely irrelevant for all old users except for bio_chain()). > From my mostly outsider view of block layer and dm, it seems that > dm-thinp isn't really interested in bi_remaining value. It just needs > bio_endio() to call the original completion function and to achieve that > dm-thinp increments bi_remaining. None of the existing users (including bio_endio_nodec) _really_ care about the specifics of bi_remaining. But they have to care because that is how the implementation is. It is unfortunate bi_remaining (or more specifically the need to increment or not) was ever exposed to begin with. All these callers only ever care about their endio getting called. > What we could do is the following: > 1) bio_remaining_done() would clear BIO_CHAIN flag when __bi_remaining() > drops to 0. That way bio_endio() will end up calling ->bi_end_io() always > as soon as __bi_remaining drops to 0 or if bio was never chained. This > wouldn't be needed if nobody can chain on bios previously modified by > e.g. dm-thinp but AFAIU we cannot assume that. Not completely following what you're proposing or how it helps fix this regression considering 1) something will have needed to set BIO_CHAIN to begin with (you're missing where that'd happen). 2) the entire intent behind commit c4cf5261 is to optimize away the atomic operations. So, given that is the intent.. do we just rip out all the callers' code that tries to adjust __bi_remaining? Then the only case that opts in to the BIO_CHAIN atomic operation accounting is bio_chain(). > 2) remove bio_inc_remaining() from dm-thinp, dm-snap, dm-raid1, > dm-cache-target as they all look like situations where we only want newly > set bi_end_io function to be called by bio_endio(). Yes that is correct. > 3) as a bonus bi_remaining handling is now fully inside block/bio.c and > bio_inc_remaining() can be static there. Almost, we still have bio_endio_nodec() to deal with. Which is the same pattern as we've been discussing. It just happens to be microoptimized via api to: 1) reestablish the final endio 2) immediately call that final endio (rather than arm the bio to have endio called at a later time). So I _think_ bio_endio_nodec() would go away completely. The devil is in the details. I'll have a go at implementing this and see what falls out. Thanks, Mike ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-05-14 15:05 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-13 19:28 linux-next dm-thinp regression due to "bio: skip atomic inc/dec of ->bi_remaining for non-chains" Mike Snitzer 2015-05-14 2:49 ` Mike Snitzer 2015-05-14 3:29 ` Mike Snitzer 2015-05-14 7:58 ` Jan Kara 2015-05-14 15:05 ` Mike Snitzer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.