* Re: block: be more careful about status in __bio_chain_endio [not found] ` <87eflmpqkb.fsf@notabene.neil.brown.name> @ 2019-02-22 21:10 ` Mike Snitzer 2019-02-22 22:46 ` Jens Axboe 0 siblings, 1 reply; 9+ messages in thread From: Mike Snitzer @ 2019-02-22 21:10 UTC (permalink / raw) To: NeilBrown Cc: Jens Axboe, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List On Thu, Feb 15 2018 at 4:09am -0500, NeilBrown <neilb@suse.com> wrote: > > If two bios are chained under the one parent (with bio_chain()) > it is possible that one will succeed and the other will fail. > __bio_chain_endio must ensure that the failure error status > is reported for the whole, rather than the success. > > It currently tries to be careful, but this test is racy. > If both children finish at the same time, they might both see that > parent->bi_status as zero, and so will assign their own status. > If the assignment to parent->bi_status by the successful bio happens > last, the error status will be lost which can lead to silent data > corruption. > > Instead, __bio_chain_endio should only assign a non-zero status > to parent->bi_status. There is then no need to test the current > value of parent->bi_status - a test that would be racy anyway. > > Note that this bug hasn't been seen in practice. It was only discovered > by examination after a similar bug was found in dm.c > > Signed-off-by: NeilBrown <neilb@suse.com> > --- > block/bio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/bio.c b/block/bio.c > index e1708db48258..ad77140edc6f 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio) > { > struct bio *parent = bio->bi_private; > > - if (!parent->bi_status) > + if (bio->bi_status) > parent->bi_status = bio->bi_status; > bio_put(bio); > return parent; > -- > 2.14.0.rc0.dirty > Reviewed-by: Mike Snitzer <snitzer@redhat.com> Jens, this one slipped through the crack just over a year ago. It is available in patchwork here: https://patchwork.kernel.org/patch/10220727/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-02-22 21:10 ` block: be more careful about status in __bio_chain_endio Mike Snitzer @ 2019-02-22 22:46 ` Jens Axboe 2019-02-22 23:55 ` Mike Snitzer 0 siblings, 1 reply; 9+ messages in thread From: Jens Axboe @ 2019-02-22 22:46 UTC (permalink / raw) To: Mike Snitzer, NeilBrown Cc: linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List On 2/22/19 2:10 PM, Mike Snitzer wrote: > On Thu, Feb 15 2018 at 4:09am -0500, > NeilBrown <neilb@suse.com> wrote: > >> >> If two bios are chained under the one parent (with bio_chain()) >> it is possible that one will succeed and the other will fail. >> __bio_chain_endio must ensure that the failure error status >> is reported for the whole, rather than the success. >> >> It currently tries to be careful, but this test is racy. >> If both children finish at the same time, they might both see that >> parent->bi_status as zero, and so will assign their own status. >> If the assignment to parent->bi_status by the successful bio happens >> last, the error status will be lost which can lead to silent data >> corruption. >> >> Instead, __bio_chain_endio should only assign a non-zero status >> to parent->bi_status. There is then no need to test the current >> value of parent->bi_status - a test that would be racy anyway. >> >> Note that this bug hasn't been seen in practice. It was only discovered >> by examination after a similar bug was found in dm.c >> >> Signed-off-by: NeilBrown <neilb@suse.com> >> --- >> block/bio.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/block/bio.c b/block/bio.c >> index e1708db48258..ad77140edc6f 100644 >> --- a/block/bio.c >> +++ b/block/bio.c >> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio) >> { >> struct bio *parent = bio->bi_private; >> >> - if (!parent->bi_status) >> + if (bio->bi_status) >> parent->bi_status = bio->bi_status; >> bio_put(bio); >> return parent; >> -- >> 2.14.0.rc0.dirty >> > > Reviewed-by: Mike Snitzer <snitzer@redhat.com> > > Jens, this one slipped through the crack just over a year ago. > It is available in patchwork here: > https://patchwork.kernel.org/patch/10220727/ Should this be: if (!parent->bi_status && bio->bi_status) parent->bi_status = bio->bi_status; perhaps? -- Jens Axboe ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-02-22 22:46 ` Jens Axboe @ 2019-02-22 23:55 ` Mike Snitzer 2019-02-23 2:02 ` John Dorminy 0 siblings, 1 reply; 9+ messages in thread From: Mike Snitzer @ 2019-02-22 23:55 UTC (permalink / raw) To: Jens Axboe Cc: NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List On Fri, Feb 22 2019 at 5:46pm -0500, Jens Axboe <axboe@kernel.dk> wrote: > On 2/22/19 2:10 PM, Mike Snitzer wrote: > > On Thu, Feb 15 2018 at 4:09am -0500, > > NeilBrown <neilb@suse.com> wrote: > > > >> > >> If two bios are chained under the one parent (with bio_chain()) > >> it is possible that one will succeed and the other will fail. > >> __bio_chain_endio must ensure that the failure error status > >> is reported for the whole, rather than the success. > >> > >> It currently tries to be careful, but this test is racy. > >> If both children finish at the same time, they might both see that > >> parent->bi_status as zero, and so will assign their own status. > >> If the assignment to parent->bi_status by the successful bio happens > >> last, the error status will be lost which can lead to silent data > >> corruption. > >> > >> Instead, __bio_chain_endio should only assign a non-zero status > >> to parent->bi_status. There is then no need to test the current > >> value of parent->bi_status - a test that would be racy anyway. > >> > >> Note that this bug hasn't been seen in practice. It was only discovered > >> by examination after a similar bug was found in dm.c > >> > >> Signed-off-by: NeilBrown <neilb@suse.com> > >> --- > >> block/bio.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/block/bio.c b/block/bio.c > >> index e1708db48258..ad77140edc6f 100644 > >> --- a/block/bio.c > >> +++ b/block/bio.c > >> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio) > >> { > >> struct bio *parent = bio->bi_private; > >> > >> - if (!parent->bi_status) > >> + if (bio->bi_status) > >> parent->bi_status = bio->bi_status; > >> bio_put(bio); > >> return parent; > >> -- > >> 2.14.0.rc0.dirty > >> > > > > Reviewed-by: Mike Snitzer <snitzer@redhat.com> > > > > Jens, this one slipped through the crack just over a year ago. > > It is available in patchwork here: > > https://patchwork.kernel.org/patch/10220727/ > > Should this be: > > if (!parent->bi_status && bio->bi_status) > parent->bi_status = bio->bi_status; > > perhaps? Yeap, even better. Not seeing any reason to have the last error win, the first in the chain is likely the most important. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-02-22 23:55 ` Mike Snitzer @ 2019-02-23 2:02 ` John Dorminy 2019-02-23 2:44 ` Mike Snitzer 0 siblings, 1 reply; 9+ messages in thread From: John Dorminy @ 2019-02-23 2:02 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List I am perhaps not understanding the intricacies here, or not seeing a barrier protecting it, so forgive me if I'm off base. I think reading parent->bi_status here is unsafe. Consider the following sequence of events on two threads. Thread 0 Thread 1 In __bio_chain_endio: In __bio_chain_endio: [A] Child 0 reads parent->bi_status, no error. Child bio 1 reads parent, no error seen It sets parent->bi_status to an error It calls bio_put. Child bio 0 calls bio_put [end __bio_chain_endio] [end __bio_chain_endio] In bio_chain_endio(), bio_endio(parent) is called, calling bio_remaining_done() which decrements __bi_remaining to 1 and returns false, so no further endio stuff is done. In bio_chain_endio(), bio_endio(parent) is called, calling bio_remaining_done(), decrementing parent->__bi_remaining to 0, and continuing to finish parent. Either for block tracing or for parent's bi_end_io(), this thread tries to read parent->bi_status again. The compiler or the CPU may cache the read from [A], and since there are no intervening barriers, parent->bi_status is still believed on thread 0 to be success. Thus the bio may still be falsely believed to have completed successfully, even though child 1 set an error in it. Am I missing a subtlety here? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-02-23 2:02 ` John Dorminy @ 2019-02-23 2:44 ` Mike Snitzer 2019-02-23 3:10 ` John Dorminy 0 siblings, 1 reply; 9+ messages in thread From: Mike Snitzer @ 2019-02-23 2:44 UTC (permalink / raw) To: John Dorminy Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List On Fri, Feb 22 2019 at 9:02pm -0500, John Dorminy <jdorminy@redhat.com> wrote: > I am perhaps not understanding the intricacies here, or not seeing a > barrier protecting it, so forgive me if I'm off base. I think reading > parent->bi_status here is unsafe. > Consider the following sequence of events on two threads. > > Thread 0 Thread 1 > In __bio_chain_endio: In __bio_chain_endio: > [A] Child 0 reads parent->bi_status, > no error. > Child bio 1 reads parent, no error seen > It sets parent->bi_status to an error > It calls bio_put. > Child bio 0 calls bio_put > [end __bio_chain_endio] [end __bio_chain_endio] > In bio_chain_endio(), bio_endio(parent) > is called, calling bio_remaining_done() > which decrements __bi_remaining to 1 > and returns false, so no further endio > stuff is done. > In bio_chain_endio(), bio_endio(parent) > is called, calling bio_remaining_done(), > decrementing parent->__bi_remaining to > 0, and continuing to finish parent. > Either for block tracing or for parent's > bi_end_io(), this thread tries to read > parent->bi_status again. > > The compiler or the CPU may cache the read from [A], and since there > are no intervening barriers, parent->bi_status is still believed on > thread 0 to be success. Thus the bio may still be falsely believed to > have completed successfully, even though child 1 set an error in it. > > Am I missing a subtlety here? Either neilb's original or even Jens' suggestion would be fine though. > if (!parent->bi_status && bio->bi_status) > parent->bi_status = bio->bi_status; Even if your scenario did play out (which I agree it looks possible) it'd just degenerate to neilb's version: > if (bio->bi_status) > parent->bi_status = bio->bi_status; Which also accomplishes fixing what Neil originally detailed in his patch header. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-02-23 2:44 ` Mike Snitzer @ 2019-02-23 3:10 ` John Dorminy 2019-06-12 2:56 ` John Dorminy 0 siblings, 1 reply; 9+ messages in thread From: John Dorminy @ 2019-02-23 3:10 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List I'm also worried about the other two versions, though: memory-barriers.txt#1724: 1724 (*) The compiler is within its rights to invent stores to a variable, i.e. the compiler is free to decide __bio_chain_endio looks like this: static struct bio *__bio_chain_endio(struct bio *bio) { struct bio *parent = bio->bi_private; blk_status_t tmp = parent->bi_status; parent->bi_status = bio->bi_status; if (!bio->bi_status) parent->bi_status = tmp; bio_put(bio); return parent; } In which case, the read and later store on the two different threads may overlap in such a way that bio_endio sometimes sees success, even if one child had an error. As a result, I believe the setting of parent->bi_status needs to be a WRITE_ONCE() and the later reading needs to be a READ_ONCE() [although, since the later reading happens in many different functions, perhaps some other barrier to make sure all readers get the correct value is in order.] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-02-23 3:10 ` John Dorminy @ 2019-06-12 2:56 ` John Dorminy 2019-06-12 7:01 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: John Dorminy @ 2019-06-12 2:56 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List Having studied the code in question more thoroughly, my first email's scenario can't occur, I believe. bio_put() contains a atomic_dec_and_test(), which (according to Documentation/atomic_t.txt), having a return value, are fully ordered and therefore impose a general memory barrier. A general memory barrier means that no value set before bio_put() may be observed afterwards, if I accurately understand Documentation/memory_barriers.txt. Thus, the scenario I imagined, with a load from inside __bio_chain_endio() being visible in bi_end_io(), cannot occur. However, the second email's scenario, of a compiler making two stores out of a conditional store, is still accurate according to my understanding. I continue to believe setting parent->bi_status needs to be a WRITE_ONCE() and any reading of parent->bi_status before bio_put() needs to be at least a READ_ONCE(). The last thing in that email is wrong for the same reason that the first email couldn't happen: the bio_put() general barrier means later accesses to the field from a single thread will freshly read the field and thereby not get an incorrectly cached value. As a concrete proposal, I believe either of the following work and fix the race NeilB described, as well as the compiler (or CPU) race I described: - if (!parent->bi_status) - parent->bi_status = bio->bi_status; + if (bio->bi_status) + WRITE_ONCE(parent->bi_status, bio->bi_status); --or-- - if (!parent->bi_status) - parent->bi_status = bio->bi_status; + if (!READ_ONCE(parent->bi_status) && bio->bi_status) + WRITE_ONCE(parent->bi_status, bio->bi_status); I believe the second of these might, but is not guaranteed to, preserve the first error observed in a child; I believe if you want to definitely save the first error you need an atomic. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-06-12 2:56 ` John Dorminy @ 2019-06-12 7:01 ` Christoph Hellwig 2019-06-17 7:32 ` Hannes Reinecke 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2019-06-12 7:01 UTC (permalink / raw) To: John Dorminy Cc: Mike Snitzer, Jens Axboe, NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List On Tue, Jun 11, 2019 at 10:56:42PM -0400, John Dorminy wrote: > I believe the second of these might, but is not guaranteed to, > preserve the first error observed in a child; I believe if you want to > definitely save the first error you need an atomic. Is there any reason not to simply use a cmpxchg? Yes, it is a relatively expensive operation, but once we are chaining bios we are out of the super hot path anyway. We do something similar in xfs and iomap already. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio 2019-06-12 7:01 ` Christoph Hellwig @ 2019-06-17 7:32 ` Hannes Reinecke 0 siblings, 0 replies; 9+ messages in thread From: Hannes Reinecke @ 2019-06-17 7:32 UTC (permalink / raw) To: Christoph Hellwig, John Dorminy Cc: Mike Snitzer, Jens Axboe, NeilBrown, linux-block, device-mapper development, Milan Broz, Linux Kernel Mailing List On 6/12/19 9:01 AM, Christoph Hellwig wrote: > On Tue, Jun 11, 2019 at 10:56:42PM -0400, John Dorminy wrote: >> I believe the second of these might, but is not guaranteed to, >> preserve the first error observed in a child; I believe if you want to >> definitely save the first error you need an atomic. > > Is there any reason not to simply use a cmpxchg? Yes, it is a > relatively expensive operation, but once we are chaining bios we are out > of the super hot path anyway. We do something similar in xfs and iomap > already. > Agree. Thing is, we need to check if the parent status is NULL, _and_ the parent status might be modified asynchronously. So even a READ_ONCE() wouldn't cut it, as it would tell us that the parent status _was_ NULL, not that the parent status _is_ NULL by the time we're setting it. So cmpxchg() is it. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-06-17 7:32 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <70cda2a3-f246-d45b-f600-1f9d15ba22ff@gmail.com> [not found] ` <87eflmpqkb.fsf@notabene.neil.brown.name> 2019-02-22 21:10 ` block: be more careful about status in __bio_chain_endio Mike Snitzer 2019-02-22 22:46 ` Jens Axboe 2019-02-22 23:55 ` Mike Snitzer 2019-02-23 2:02 ` John Dorminy 2019-02-23 2:44 ` Mike Snitzer 2019-02-23 3:10 ` John Dorminy 2019-06-12 2:56 ` John Dorminy 2019-06-12 7:01 ` Christoph Hellwig 2019-06-17 7:32 ` Hannes Reinecke
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).