linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: block: be more careful about status in __bio_chain_endio
       [not found] ` <87eflmpqkb.fsf@notabene.neil.brown.name>
@ 2019-02-22 21:10   ` Mike Snitzer
  2019-02-22 22:46     ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2019-02-22 21:10 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, linux-block, device-mapper development, Milan Broz,
	Linux Kernel Mailing List

On Thu, Feb 15 2018 at  4:09am -0500,
NeilBrown <neilb@suse.com> wrote:

> 
> If two bios are chained under the one parent (with bio_chain())
> it is possible that one will succeed and the other will fail.
> __bio_chain_endio must ensure that the failure error status
> is reported for the whole, rather than the success.
> 
> It currently tries to be careful, but this test is racy.
> If both children finish at the same time, they might both see that
> parent->bi_status as zero, and so will assign their own status.
> If the assignment to parent->bi_status by the successful bio happens
> last, the error status will be lost which can lead to silent data
> corruption.
> 
> Instead, __bio_chain_endio should only assign a non-zero status
> to parent->bi_status.  There is then no need to test the current
> value of parent->bi_status - a test that would be racy anyway.
> 
> Note that this bug hasn't been seen in practice.  It was only discovered
> by examination after a similar bug was found in dm.c
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  block/bio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index e1708db48258..ad77140edc6f 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>  {
>  	struct bio *parent = bio->bi_private;
>  
> -	if (!parent->bi_status)
> +	if (bio->bi_status)
>  		parent->bi_status = bio->bi_status;
>  	bio_put(bio);
>  	return parent;
> -- 
> 2.14.0.rc0.dirty
> 

Reviewed-by: Mike Snitzer <snitzer@redhat.com>

Jens, this one slipped through the crack just over a year ago.
It is available in patchwork here:
https://patchwork.kernel.org/patch/10220727/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-02-22 21:10   ` block: be more careful about status in __bio_chain_endio Mike Snitzer
@ 2019-02-22 22:46     ` Jens Axboe
  2019-02-22 23:55       ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2019-02-22 22:46 UTC (permalink / raw)
  To: Mike Snitzer, NeilBrown
  Cc: linux-block, device-mapper development, Milan Broz,
	Linux Kernel Mailing List

On 2/22/19 2:10 PM, Mike Snitzer wrote:
> On Thu, Feb 15 2018 at  4:09am -0500,
> NeilBrown <neilb@suse.com> wrote:
> 
>>
>> If two bios are chained under the one parent (with bio_chain())
>> it is possible that one will succeed and the other will fail.
>> __bio_chain_endio must ensure that the failure error status
>> is reported for the whole, rather than the success.
>>
>> It currently tries to be careful, but this test is racy.
>> If both children finish at the same time, they might both see that
>> parent->bi_status as zero, and so will assign their own status.
>> If the assignment to parent->bi_status by the successful bio happens
>> last, the error status will be lost which can lead to silent data
>> corruption.
>>
>> Instead, __bio_chain_endio should only assign a non-zero status
>> to parent->bi_status.  There is then no need to test the current
>> value of parent->bi_status - a test that would be racy anyway.
>>
>> Note that this bug hasn't been seen in practice.  It was only discovered
>> by examination after a similar bug was found in dm.c
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  block/bio.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index e1708db48258..ad77140edc6f 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>>  {
>>  	struct bio *parent = bio->bi_private;
>>  
>> -	if (!parent->bi_status)
>> +	if (bio->bi_status)
>>  		parent->bi_status = bio->bi_status;
>>  	bio_put(bio);
>>  	return parent;
>> -- 
>> 2.14.0.rc0.dirty
>>
> 
> Reviewed-by: Mike Snitzer <snitzer@redhat.com>
> 
> Jens, this one slipped through the crack just over a year ago.
> It is available in patchwork here:
> https://patchwork.kernel.org/patch/10220727/

Should this be:

	if (!parent->bi_status && bio->bi_status)
		parent->bi_status = bio->bi_status;

perhaps?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-02-22 22:46     ` Jens Axboe
@ 2019-02-22 23:55       ` Mike Snitzer
  2019-02-23  2:02         ` John Dorminy
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2019-02-22 23:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: NeilBrown, linux-block, device-mapper development, Milan Broz,
	Linux Kernel Mailing List

On Fri, Feb 22 2019 at  5:46pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 2/22/19 2:10 PM, Mike Snitzer wrote:
> > On Thu, Feb 15 2018 at  4:09am -0500,
> > NeilBrown <neilb@suse.com> wrote:
> > 
> >>
> >> If two bios are chained under the one parent (with bio_chain())
> >> it is possible that one will succeed and the other will fail.
> >> __bio_chain_endio must ensure that the failure error status
> >> is reported for the whole, rather than the success.
> >>
> >> It currently tries to be careful, but this test is racy.
> >> If both children finish at the same time, they might both see that
> >> parent->bi_status as zero, and so will assign their own status.
> >> If the assignment to parent->bi_status by the successful bio happens
> >> last, the error status will be lost which can lead to silent data
> >> corruption.
> >>
> >> Instead, __bio_chain_endio should only assign a non-zero status
> >> to parent->bi_status.  There is then no need to test the current
> >> value of parent->bi_status - a test that would be racy anyway.
> >>
> >> Note that this bug hasn't been seen in practice.  It was only discovered
> >> by examination after a similar bug was found in dm.c
> >>
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >>  block/bio.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/bio.c b/block/bio.c
> >> index e1708db48258..ad77140edc6f 100644
> >> --- a/block/bio.c
> >> +++ b/block/bio.c
> >> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
> >>  {
> >>  	struct bio *parent = bio->bi_private;
> >>  
> >> -	if (!parent->bi_status)
> >> +	if (bio->bi_status)
> >>  		parent->bi_status = bio->bi_status;
> >>  	bio_put(bio);
> >>  	return parent;
> >> -- 
> >> 2.14.0.rc0.dirty
> >>
> > 
> > Reviewed-by: Mike Snitzer <snitzer@redhat.com>
> > 
> > Jens, this one slipped through the crack just over a year ago.
> > It is available in patchwork here:
> > https://patchwork.kernel.org/patch/10220727/
> 
> Should this be:
> 
> 	if (!parent->bi_status && bio->bi_status)
> 		parent->bi_status = bio->bi_status;
> 
> perhaps?

Yeap, even better.  Not seeing any reason to have the last error win,
the first in the chain is likely the most important.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-02-22 23:55       ` Mike Snitzer
@ 2019-02-23  2:02         ` John Dorminy
  2019-02-23  2:44           ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: John Dorminy @ 2019-02-23  2:02 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
	Milan Broz, Linux Kernel Mailing List

I am perhaps not understanding the intricacies here, or not seeing a
barrier protecting it, so forgive me if I'm off base. I think reading
parent->bi_status here is unsafe.
Consider the following sequence of events on two threads.

Thread 0                                 Thread 1
In __bio_chain_endio:                    In __bio_chain_endio:
[A] Child 0 reads parent->bi_status,
    no error.
                                         Child bio 1 reads parent, no error seen
                                         It sets parent->bi_status to an error
                                         It calls bio_put.
Child bio 0 calls bio_put
[end __bio_chain_endio]                  [end __bio_chain_endio]
                                         In bio_chain_endio(), bio_endio(parent)
                                         is called, calling bio_remaining_done()
                                         which decrements __bi_remaining to 1
                                         and returns false, so no further endio
                                         stuff is done.
In bio_chain_endio(), bio_endio(parent)
is called, calling bio_remaining_done(),
decrementing parent->__bi_remaining to
 0, and continuing to finish parent.
Either for block tracing or for parent's
bi_end_io(), this thread tries to read
parent->bi_status again.

The compiler or the CPU may cache the read from [A], and since there
are no intervening barriers, parent->bi_status is still believed on
thread 0 to be success. Thus the bio may still be falsely believed to
have completed successfully, even though child 1 set an error in it.

Am I missing a subtlety here?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-02-23  2:02         ` John Dorminy
@ 2019-02-23  2:44           ` Mike Snitzer
  2019-02-23  3:10             ` John Dorminy
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2019-02-23  2:44 UTC (permalink / raw)
  To: John Dorminy
  Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
	Milan Broz, Linux Kernel Mailing List

On Fri, Feb 22 2019 at  9:02pm -0500,
John Dorminy <jdorminy@redhat.com> wrote:

> I am perhaps not understanding the intricacies here, or not seeing a
> barrier protecting it, so forgive me if I'm off base. I think reading
> parent->bi_status here is unsafe.
> Consider the following sequence of events on two threads.
> 
> Thread 0                                 Thread 1
> In __bio_chain_endio:                    In __bio_chain_endio:
> [A] Child 0 reads parent->bi_status,
>     no error.
>                                          Child bio 1 reads parent, no error seen
>                                          It sets parent->bi_status to an error
>                                          It calls bio_put.
> Child bio 0 calls bio_put
> [end __bio_chain_endio]                  [end __bio_chain_endio]
>                                          In bio_chain_endio(), bio_endio(parent)
>                                          is called, calling bio_remaining_done()
>                                          which decrements __bi_remaining to 1
>                                          and returns false, so no further endio
>                                          stuff is done.
> In bio_chain_endio(), bio_endio(parent)
> is called, calling bio_remaining_done(),
> decrementing parent->__bi_remaining to
>  0, and continuing to finish parent.
> Either for block tracing or for parent's
> bi_end_io(), this thread tries to read
> parent->bi_status again.
> 
> The compiler or the CPU may cache the read from [A], and since there
> are no intervening barriers, parent->bi_status is still believed on
> thread 0 to be success. Thus the bio may still be falsely believed to
> have completed successfully, even though child 1 set an error in it.
> 
> Am I missing a subtlety here?

Either neilb's original or even Jens' suggestion would be fine though.

>       if (!parent->bi_status && bio->bi_status)
>               parent->bi_status = bio->bi_status;

Even if your scenario did play out (which I agree it looks possible)
it'd just degenerate to neilb's version:

>       if (bio->bi_status)
>               parent->bi_status = bio->bi_status;

Which also accomplishes fixing what Neil originally detailed in his
patch header.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-02-23  2:44           ` Mike Snitzer
@ 2019-02-23  3:10             ` John Dorminy
  2019-06-12  2:56               ` John Dorminy
  0 siblings, 1 reply; 9+ messages in thread
From: John Dorminy @ 2019-02-23  3:10 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
	Milan Broz, Linux Kernel Mailing List

I'm also worried about the other two versions, though:

memory-barriers.txt#1724:

1724 (*) The compiler is within its rights to invent stores to a variable,

i.e. the compiler is free to decide __bio_chain_endio looks like this:

static struct bio *__bio_chain_endio(struct bio *bio)
{
  struct bio *parent = bio->bi_private;
  blk_status_t tmp = parent->bi_status;
  parent->bi_status = bio->bi_status;
  if (!bio->bi_status)
    parent->bi_status = tmp;
  bio_put(bio);
  return parent;
}

In which case, the read and later store on the two different threads
may overlap in such a way that bio_endio sometimes sees success, even
if one child had an error.

As a result, I believe the setting of parent->bi_status needs to be a
WRITE_ONCE() and the later reading needs to be a READ_ONCE()
[although, since the later reading happens in many different
functions, perhaps some other barrier to make sure all readers get the
correct value is in order.]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-02-23  3:10             ` John Dorminy
@ 2019-06-12  2:56               ` John Dorminy
  2019-06-12  7:01                 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: John Dorminy @ 2019-06-12  2:56 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, NeilBrown, linux-block, device-mapper development,
	Milan Broz, Linux Kernel Mailing List

Having studied the code in question more thoroughly, my first email's
scenario can't occur, I believe. bio_put() contains a
atomic_dec_and_test(), which (according to
Documentation/atomic_t.txt), having a return value, are fully ordered
and therefore impose a general memory barrier. A general memory
barrier means that no value set before bio_put() may be observed
afterwards, if I accurately understand
Documentation/memory_barriers.txt. Thus, the scenario I imagined, with
a load from inside __bio_chain_endio() being visible in bi_end_io(),
cannot occur.

However, the second email's scenario, of a compiler making two stores
out of a conditional store, is still accurate according to my
understanding. I continue to believe setting parent->bi_status needs
to be a WRITE_ONCE() and any reading of parent->bi_status before
bio_put() needs to be at least a READ_ONCE(). The last thing in that
email is wrong for the same reason that the first email couldn't
happen: the bio_put() general barrier means later accesses to the
field from a single thread will freshly read the field and thereby not
get an incorrectly cached value.

As a concrete proposal, I believe either of the following work and fix
the race NeilB described, as well as the compiler (or CPU) race I
described:

 -     if (!parent->bi_status)
 -               parent->bi_status = bio->bi_status;
 +     if (bio->bi_status)
 +               WRITE_ONCE(parent->bi_status, bio->bi_status);

--or--

 -     if (!parent->bi_status)
 -               parent->bi_status = bio->bi_status;
 +     if (!READ_ONCE(parent->bi_status) && bio->bi_status)
 +               WRITE_ONCE(parent->bi_status, bio->bi_status);

I believe the second of these might, but is not guaranteed to,
preserve the first error observed in a child; I believe if you want to
definitely save the first error you need an atomic.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-06-12  2:56               ` John Dorminy
@ 2019-06-12  7:01                 ` Christoph Hellwig
  2019-06-17  7:32                   ` Hannes Reinecke
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2019-06-12  7:01 UTC (permalink / raw)
  To: John Dorminy
  Cc: Mike Snitzer, Jens Axboe, NeilBrown, linux-block,
	device-mapper development, Milan Broz, Linux Kernel Mailing List

On Tue, Jun 11, 2019 at 10:56:42PM -0400, John Dorminy wrote:
> I believe the second of these might, but is not guaranteed to,
> preserve the first error observed in a child; I believe if you want to
> definitely save the first error you need an atomic.

Is there any reason not to simply use a cmpxchg?  Yes, it is a
relatively expensive operation, but once we are chaining bios we are out
of the super hot path anyway.  We do something similar in xfs and iomap
already.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: block: be more careful about status in __bio_chain_endio
  2019-06-12  7:01                 ` Christoph Hellwig
@ 2019-06-17  7:32                   ` Hannes Reinecke
  0 siblings, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2019-06-17  7:32 UTC (permalink / raw)
  To: Christoph Hellwig, John Dorminy
  Cc: Mike Snitzer, Jens Axboe, NeilBrown, linux-block,
	device-mapper development, Milan Broz, Linux Kernel Mailing List

On 6/12/19 9:01 AM, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 10:56:42PM -0400, John Dorminy wrote:
>> I believe the second of these might, but is not guaranteed to,
>> preserve the first error observed in a child; I believe if you want to
>> definitely save the first error you need an atomic.
> 
> Is there any reason not to simply use a cmpxchg?  Yes, it is a
> relatively expensive operation, but once we are chaining bios we are out
> of the super hot path anyway.  We do something similar in xfs and iomap
> already.
> 
Agree.
Thing is, we need to check if the parent status is NULL, _and_ the
parent status might be modified asynchronously.
So even a READ_ONCE() wouldn't cut it, as it would tell us that the
parent status _was_ NULL, not that the parent status _is_ NULL by the
time we're setting it.
So cmpxchg() is it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-17  7:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <70cda2a3-f246-d45b-f600-1f9d15ba22ff@gmail.com>
     [not found] ` <87eflmpqkb.fsf@notabene.neil.brown.name>
2019-02-22 21:10   ` block: be more careful about status in __bio_chain_endio Mike Snitzer
2019-02-22 22:46     ` Jens Axboe
2019-02-22 23:55       ` Mike Snitzer
2019-02-23  2:02         ` John Dorminy
2019-02-23  2:44           ` Mike Snitzer
2019-02-23  3:10             ` John Dorminy
2019-06-12  2:56               ` John Dorminy
2019-06-12  7:01                 ` Christoph Hellwig
2019-06-17  7:32                   ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).