* Re: block: be more careful about status in __bio_chain_endio
[not found] ` <87eflmpqkb.fsf@notabene.neil.brown.name>
@ 2019-02-22 21:10 ` Mike Snitzer
2019-02-22 22:46 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2019-02-22 21:10 UTC (permalink / raw)
To: NeilBrown
Cc: Jens Axboe, linux-block, device-mapper development, Milan Broz,
Linux Kernel Mailing List
On Thu, Feb 15 2018 at 4:09am -0500,
NeilBrown <neilb@suse.com> wrote:
>
> If two bios are chained under the one parent (with bio_chain())
> it is possible that one will succeed and the other will fail.
> __bio_chain_endio must ensure that the failure error status
> is reported for the whole, rather than the success.
>
> It currently tries to be careful, but this test is racy.
> If both children finish at the same time, they might both see that
> parent->bi_status as zero, and so will assign their own status.
> If the assignment to parent->bi_status by the successful bio happens
> last, the error status will be lost which can lead to silent data
> corruption.
>
> Instead, __bio_chain_endio should only assign a non-zero status
> to parent->bi_status. There is then no need to test the current
> value of parent->bi_status - a test that would be racy anyway.
>
> Note that this bug hasn't been seen in practice. It was only discovered
> by examination after a similar bug was found in dm.c
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> block/bio.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index e1708db48258..ad77140edc6f 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
> {
> struct bio *parent = bio->bi_private;
>
> - if (!parent->bi_status)
> + if (bio->bi_status)
> parent->bi_status = bio->bi_status;
> bio_put(bio);
> return parent;
> --
> 2.14.0.rc0.dirty
>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Jens, this one slipped through the crack just over a year ago.
It is available in patchwork here:
https://patchwork.kernel.org/patch/10220727/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-02-22 21:10 ` block: be more careful about status in __bio_chain_endio Mike Snitzer
@ 2019-02-22 22:46 ` Jens Axboe
2019-02-22 23:55 ` Mike Snitzer
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2019-02-22 22:46 UTC (permalink / raw)
To: Mike Snitzer, NeilBrown
Cc: linux-block, device-mapper development, Milan Broz,
Linux Kernel Mailing List
On 2/22/19 2:10 PM, Mike Snitzer wrote:
> On Thu, Feb 15 2018 at 4:09am -0500,
> NeilBrown <neilb@suse.com> wrote:
>
>>
>> If two bios are chained under the one parent (with bio_chain())
>> it is possible that one will succeed and the other will fail.
>> __bio_chain_endio must ensure that the failure error status
>> is reported for the whole, rather than the success.
>>
>> It currently tries to be careful, but this test is racy.
>> If both children finish at the same time, they might both see that
>> parent->bi_status as zero, and so will assign their own status.
>> If the assignment to parent->bi_status by the successful bio happens
>> last, the error status will be lost which can lead to silent data
>> corruption.
>>
>> Instead, __bio_chain_endio should only assign a non-zero status
>> to parent->bi_status. There is then no need to test the current
>> value of parent->bi_status - a test that would be racy anyway.
>>
>> Note that this bug hasn't been seen in practice. It was only discovered
>> by examination after a similar bug was found in dm.c
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> block/bio.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index e1708db48258..ad77140edc6f 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>> {
>> struct bio *parent = bio->bi_private;
>>
>> - if (!parent->bi_status)
>> + if (bio->bi_status)
>> parent->bi_status = bio->bi_status;
>> bio_put(bio);
>> return parent;
>> --
>> 2.14.0.rc0.dirty
>>
>
> Reviewed-by: Mike Snitzer <snitzer@redhat.com>
>
> Jens, this one slipped through the crack just over a year ago.
> It is available in patchwork here:
> https://patchwork.kernel.org/patch/10220727/
Should this be:
if (!parent->bi_status && bio->bi_status)
parent->bi_status = bio->bi_status;
perhaps?
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-02-22 22:46 ` Jens Axboe
@ 2019-02-22 23:55 ` Mike Snitzer
2019-02-23 2:02 ` John Dorminy
0 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2019-02-22 23:55 UTC (permalink / raw)
To: Jens Axboe
Cc: NeilBrown, linux-block, device-mapper development, Milan Broz,
Linux Kernel Mailing List
On Fri, Feb 22 2019 at 5:46pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:
> On 2/22/19 2:10 PM, Mike Snitzer wrote:
> > On Thu, Feb 15 2018 at 4:09am -0500,
> > NeilBrown <neilb@suse.com> wrote:
> >
> >>
> >> If two bios are chained under the one parent (with bio_chain())
> >> it is possible that one will succeed and the other will fail.
> >> __bio_chain_endio must ensure that the failure error status
> >> is reported for the whole, rather than the success.
> >>
> >> It currently tries to be careful, but this test is racy.
> >> If both children finish at the same time, they might both see that
> >> parent->bi_status as zero, and so will assign their own status.
> >> If the assignment to parent->bi_status by the successful bio happens
> >> last, the error status will be lost which can lead to silent data
> >> corruption.
> >>
> >> Instead, __bio_chain_endio should only assign a non-zero status
> >> to parent->bi_status. There is then no need to test the current
> >> value of parent->bi_status - a test that would be racy anyway.
> >>
> >> Note that this bug hasn't been seen in practice. It was only discovered
> >> by examination after a similar bug was found in dm.c
> >>
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >> block/bio.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/bio.c b/block/bio.c
> >> index e1708db48258..ad77140edc6f 100644
> >> --- a/block/bio.c
> >> +++ b/block/bio.c
> >> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
> >> {
> >> struct bio *parent = bio->bi_private;
> >>
> >> - if (!parent->bi_status)
> >> + if (bio->bi_status)
> >> parent->bi_status = bio->bi_status;
> >> bio_put(bio);
> >> return parent;
> >> --
> >> 2.14.0.rc0.dirty
> >>
> >
> > Reviewed-by: Mike Snitzer <snitzer@redhat.com>
> >
> > Jens, this one slipped through the crack just over a year ago.
> > It is available in patchwork here:
> > https://patchwork.kernel.org/patch/10220727/
>
> Should this be:
>
> if (!parent->bi_status && bio->bi_status)
> parent->bi_status = bio->bi_status;
>
> perhaps?
Yeap, even better. Not seeing any reason to have the last error win,
the first in the chain is likely the most important.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-02-22 23:55 ` Mike Snitzer
@ 2019-02-23 2:02 ` John Dorminy
2019-02-23 2:44 ` Mike Snitzer
0 siblings, 1 reply; 9+ messages in thread
From: John Dorminy @ 2019-02-23 2:02 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
Milan Broz, Linux Kernel Mailing List
I am perhaps not understanding the intricacies here, or not seeing a
barrier protecting it, so forgive me if I'm off base. I think reading
parent->bi_status here is unsafe.
Consider the following sequence of events on two threads.
Thread 0 Thread 1
In __bio_chain_endio: In __bio_chain_endio:
[A] Child 0 reads parent->bi_status,
no error.
Child bio 1 reads parent, no error seen
It sets parent->bi_status to an error
It calls bio_put.
Child bio 0 calls bio_put
[end __bio_chain_endio] [end __bio_chain_endio]
In bio_chain_endio(), bio_endio(parent)
is called, calling bio_remaining_done()
which decrements __bi_remaining to 1
and returns false, so no further endio
stuff is done.
In bio_chain_endio(), bio_endio(parent)
is called, calling bio_remaining_done(),
decrementing parent->__bi_remaining to
0, and continuing to finish parent.
Either for block tracing or for parent's
bi_end_io(), this thread tries to read
parent->bi_status again.
The compiler or the CPU may cache the read from [A], and since there
are no intervening barriers, parent->bi_status is still believed on
thread 0 to be success. Thus the bio may still be falsely believed to
have completed successfully, even though child 1 set an error in it.
Am I missing a subtlety here?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-02-23 2:02 ` John Dorminy
@ 2019-02-23 2:44 ` Mike Snitzer
2019-02-23 3:10 ` John Dorminy
0 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2019-02-23 2:44 UTC (permalink / raw)
To: John Dorminy
Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
Milan Broz, Linux Kernel Mailing List
On Fri, Feb 22 2019 at 9:02pm -0500,
John Dorminy <jdorminy@redhat.com> wrote:
> I am perhaps not understanding the intricacies here, or not seeing a
> barrier protecting it, so forgive me if I'm off base. I think reading
> parent->bi_status here is unsafe.
> Consider the following sequence of events on two threads.
>
> Thread 0 Thread 1
> In __bio_chain_endio: In __bio_chain_endio:
> [A] Child 0 reads parent->bi_status,
> no error.
> Child bio 1 reads parent, no error seen
> It sets parent->bi_status to an error
> It calls bio_put.
> Child bio 0 calls bio_put
> [end __bio_chain_endio] [end __bio_chain_endio]
> In bio_chain_endio(), bio_endio(parent)
> is called, calling bio_remaining_done()
> which decrements __bi_remaining to 1
> and returns false, so no further endio
> stuff is done.
> In bio_chain_endio(), bio_endio(parent)
> is called, calling bio_remaining_done(),
> decrementing parent->__bi_remaining to
> 0, and continuing to finish parent.
> Either for block tracing or for parent's
> bi_end_io(), this thread tries to read
> parent->bi_status again.
>
> The compiler or the CPU may cache the read from [A], and since there
> are no intervening barriers, parent->bi_status is still believed on
> thread 0 to be success. Thus the bio may still be falsely believed to
> have completed successfully, even though child 1 set an error in it.
>
> Am I missing a subtlety here?
Either neilb's original or even Jens' suggestion would be fine though.
> if (!parent->bi_status && bio->bi_status)
> parent->bi_status = bio->bi_status;
Even if your scenario did play out (which I agree it looks possible)
it'd just degenerate to neilb's version:
> if (bio->bi_status)
> parent->bi_status = bio->bi_status;
Which also accomplishes fixing what Neil originally detailed in his
patch header.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-02-23 2:44 ` Mike Snitzer
@ 2019-02-23 3:10 ` John Dorminy
2019-06-12 2:56 ` John Dorminy
0 siblings, 1 reply; 9+ messages in thread
From: John Dorminy @ 2019-02-23 3:10 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
Milan Broz, Linux Kernel Mailing List
I'm also worried about the other two versions, though:
memory-barriers.txt#1724:
1724 (*) The compiler is within its rights to invent stores to a variable,
i.e. the compiler is free to decide __bio_chain_endio looks like this:
static struct bio *__bio_chain_endio(struct bio *bio)
{
struct bio *parent = bio->bi_private;
blk_status_t tmp = parent->bi_status;
parent->bi_status = bio->bi_status;
if (!bio->bi_status)
parent->bi_status = tmp;
bio_put(bio);
return parent;
}
In which case, the read and later store on the two different threads
may overlap in such a way that bio_endio sometimes sees success, even
if one child had an error.
As a result, I believe the setting of parent->bi_status needs to be a
WRITE_ONCE() and the later reading needs to be a READ_ONCE()
[although, since the later reading happens in many different
functions, perhaps some other barrier to make sure all readers get the
correct value is in order.]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-02-23 3:10 ` John Dorminy
@ 2019-06-12 2:56 ` John Dorminy
2019-06-12 7:01 ` Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: John Dorminy @ 2019-06-12 2:56 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
Milan Broz, Linux Kernel Mailing List
Having studied the code in question more thoroughly, my first email's
scenario can't occur, I believe. bio_put() contains a
atomic_dec_and_test(), which (according to
Documentation/atomic_t.txt), having a return value, are fully ordered
and therefore impose a general memory barrier. A general memory
barrier means that no value set before bio_put() may be observed
afterwards, if I accurately understand
Documentation/memory_barriers.txt. Thus, the scenario I imagined, with
a load from inside __bio_chain_endio() being visible in bi_end_io(),
cannot occur.
However, the second email's scenario, of a compiler making two stores
out of a conditional store, is still accurate according to my
understanding. I continue to believe setting parent->bi_status needs
to be a WRITE_ONCE() and any reading of parent->bi_status before
bio_put() needs to be at least a READ_ONCE(). The last thing in that
email is wrong for the same reason that the first email couldn't
happen: the bio_put() general barrier means later accesses to the
field from a single thread will freshly read the field and thereby not
get an incorrectly cached value.
As a concrete proposal, I believe either of the following work and fix
the race NeilB described, as well as the compiler (or CPU) race I
described:
- if (!parent->bi_status)
- parent->bi_status = bio->bi_status;
+ if (bio->bi_status)
+ WRITE_ONCE(parent->bi_status, bio->bi_status);
--or--
- if (!parent->bi_status)
- parent->bi_status = bio->bi_status;
+ if (!READ_ONCE(parent->bi_status) && bio->bi_status)
+ WRITE_ONCE(parent->bi_status, bio->bi_status);
I believe the second of these might, but is not guaranteed to,
preserve the first error observed in a child; I believe if you want to
definitely save the first error you need an atomic.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-06-12 2:56 ` John Dorminy
@ 2019-06-12 7:01 ` Christoph Hellwig
2019-06-17 7:32 ` Hannes Reinecke
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2019-06-12 7:01 UTC (permalink / raw)
To: John Dorminy
Cc: Mike Snitzer, Jens Axboe, NeilBrown, linux-block,
device-mapper development, Milan Broz, Linux Kernel Mailing List
On Tue, Jun 11, 2019 at 10:56:42PM -0400, John Dorminy wrote:
> I believe the second of these might, but is not guaranteed to,
> preserve the first error observed in a child; I believe if you want to
> definitely save the first error you need an atomic.
Is there any reason not to simply use a cmpxchg? Yes, it is a
relatively expensive operation, but once we are chaining bios we are out
of the super hot path anyway. We do something similar in xfs and iomap
already.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: block: be more careful about status in __bio_chain_endio
2019-06-12 7:01 ` Christoph Hellwig
@ 2019-06-17 7:32 ` Hannes Reinecke
0 siblings, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2019-06-17 7:32 UTC (permalink / raw)
To: Christoph Hellwig, John Dorminy
Cc: Mike Snitzer, Jens Axboe, NeilBrown, linux-block,
device-mapper development, Milan Broz, Linux Kernel Mailing List
On 6/12/19 9:01 AM, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 10:56:42PM -0400, John Dorminy wrote:
>> I believe the second of these might, but is not guaranteed to,
>> preserve the first error observed in a child; I believe if you want to
>> definitely save the first error you need an atomic.
>
> Is there any reason not to simply use a cmpxchg? Yes, it is a
> relatively expensive operation, but once we are chaining bios we are out
> of the super hot path anyway. We do something similar in xfs and iomap
> already.
>
Agree.
Thing is, we need to check if the parent status is NULL, _and_ the
parent status might be modified asynchronously.
So even a READ_ONCE() wouldn't cut it, as it would tell us that the
parent status _was_ NULL, not that the parent status _is_ NULL by the
time we're setting it.
So cmpxchg() is it.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-06-17 7:32 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <70cda2a3-f246-d45b-f600-1f9d15ba22ff@gmail.com>
[not found] ` <87eflmpqkb.fsf@notabene.neil.brown.name>
2019-02-22 21:10 ` block: be more careful about status in __bio_chain_endio Mike Snitzer
2019-02-22 22:46 ` Jens Axboe
2019-02-22 23:55 ` Mike Snitzer
2019-02-23 2:02 ` John Dorminy
2019-02-23 2:44 ` Mike Snitzer
2019-02-23 3:10 ` John Dorminy
2019-06-12 2:56 ` John Dorminy
2019-06-12 7:01 ` Christoph Hellwig
2019-06-17 7:32 ` Hannes Reinecke
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).