All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: raid56: use atomic_dec_and_test() in end io functions
@ 2022-11-09 23:39 Qu Wenruo
  2022-11-11  8:07 ` Qu Wenruo
  2022-11-11 14:07 ` David Sterba
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2022-11-09 23:39 UTC (permalink / raw)
  To: linux-btrfs, dsterba

!!! DON'T MERGE AS IS !!!

The latest btrfs raid56 refactor is using atomic_dec() then
unconditionally call wake_up() to let the main thread to check if all
the IO is done.

But atomic_dec() itself is not fully ordered, thus it can have an
impact on the lifespan of the rbio, which causes use-after-free and
crash in the raid6 path.

Unfortunately I don't have a solid environment to reproduce the problem
(even with KASAN enabled).
My guess is, since I'm always using the latest hardware (days ago it's
R9 5900X, now it's i7 13700K) they may have something a little more
strict on the ordering.

So this patch is still just for David to verify the behavior, and if
this one really solved the problem, it's better to be folded into
"btrfs: raid56: switch recovery path to a single function" and
"btrfs: raid56: introduce the a main entrance for rmw path".

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 4a7932240d42..11be5d0a7eab 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1437,8 +1437,8 @@ static void raid_wait_read_end_io(struct bio *bio)
 		set_bio_pages_uptodate(rbio, bio);
 
 	bio_put(bio);
-	atomic_dec(&rbio->stripes_pending);
-	wake_up(&rbio->io_wait);
+	if (atomic_dec_and_test(&rbio->stripes_pending))
+		wake_up(&rbio->io_wait);
 }
 
 static void submit_read_bios(struct btrfs_raid_bio *rbio,
@@ -2078,8 +2078,8 @@ static void raid_wait_write_end_io(struct bio *bio)
 	if (err)
 		rbio_update_error_bitmap(rbio, bio);
 	bio_put(bio);
-	atomic_dec(&rbio->stripes_pending);
-	wake_up(&rbio->io_wait);
+	if (atomic_dec_and_test(&rbio->stripes_pending))
+		wake_up(&rbio->io_wait);
 }
 
 static void submit_write_bios(struct btrfs_raid_bio *rbio,
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: raid56: use atomic_dec_and_test() in end io functions
  2022-11-09 23:39 [PATCH] btrfs: raid56: use atomic_dec_and_test() in end io functions Qu Wenruo
@ 2022-11-11  8:07 ` Qu Wenruo
  2022-11-11 14:07 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2022-11-11  8:07 UTC (permalink / raw)
  To: linux-btrfs, dsterba



On 2022/11/10 07:39, Qu Wenruo wrote:
> !!! DON'T MERGE AS IS !!!
> 
> The latest btrfs raid56 refactor is using atomic_dec() then
> unconditionally call wake_up() to let the main thread to check if all
> the IO is done.
> 
> But atomic_dec() itself is not fully ordered, thus it can have an
> impact on the lifespan of the rbio, which causes use-after-free and
> crash in the raid6 path.
> 
> Unfortunately I don't have a solid environment to reproduce the problem
> (even with KASAN enabled).
> My guess is, since I'm always using the latest hardware (days ago it's
> R9 5900X, now it's i7 13700K) they may have something a little more
> strict on the ordering.
> 
> So this patch is still just for David to verify the behavior, and if
> this one really solved the problem, it's better to be folded into
> "btrfs: raid56: switch recovery path to a single function" and
> "btrfs: raid56: introduce the a main entrance for rmw path".
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Some extra explanation on the possible problem of the old code.

For example, we a 3 disks raid5, and is doing an RMW for partial write.
Here we have submitted 3 bios for the final writes to all disks.

    IO thread 1  |   IO thread 2   |   IO thread 3   |  Main thread
----------------+-----------------+-----------------+--------------
atomic_dec()    |   atomic_dec()  |  atomic_dec()   |
wake_up()       |                 |                 |
                 |                 |                 | wait_event()
                 |                 |                 | returned.
                 |                 |                 | rbio_orig_end_io()
                 |                 |                 | |- free_raid_bio()
                 |                 |  wake_up()      |
                 |   wake_up()

In above case, the atomic_dec() from all IO threads have decreased the
rbio->stripes_pending to zero.

Then the IO thread1 wake up the main thread, and wait_event() returned 
before the other 2 IO threads even called wake_up().

And since the main thread is already near its end of lifespan, it calls
rbio_orig_end_io() and will free the rbio itself.
Such freeing can even happen before IO threads 2 and 3 calling wake_up().

Then in above case, wake_up() call will happen on some memory already 
freed, thus use-after-free.

And the new code will solve it by ensuring there is one and only one IO 
thread to call wake_up():

    IO thread 1  |   IO thread 2   |   IO thread 3   |  Main thread
----------------+-----------------+-----------------+--------------
dec_and_test()  |                 |                 |
                 | dec_and_test()  |                 |
                 |                 |  dec_and_test() |
                 |                 |  wake_up()      |
                 |                 |                 | wait_event()
                 |                 |                 | returned.
                 |                 |                 | rbio_orig_end_io()
                 |                 |                 | |- free_raid_bio()

By this, the first two threads calling atomic_dec_and_test() will only 
decrease the value of stripes_pending atomic, but not call wake_up() as 
the value is not yet 0.

Only the last one calling atomic_dec_and_test() will return true, and 
call wake_up() to wake up the main thread.

By this, we avoided the use-after-free.

Although I can not yet explain the stack dump reported by David yet, as 
according to my analyse, the KASAN should be triggered during wake_up() 
call.
At least this patch should close one race window, and I hope that's the 
only problem.

Thanks,
Qu

> ---
>   fs/btrfs/raid56.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 4a7932240d42..11be5d0a7eab 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1437,8 +1437,8 @@ static void raid_wait_read_end_io(struct bio *bio)
>   		set_bio_pages_uptodate(rbio, bio);
>   
>   	bio_put(bio);
> -	atomic_dec(&rbio->stripes_pending);
> -	wake_up(&rbio->io_wait);
> +	if (atomic_dec_and_test(&rbio->stripes_pending))
> +		wake_up(&rbio->io_wait);
>   }
>   
>   static void submit_read_bios(struct btrfs_raid_bio *rbio,
> @@ -2078,8 +2078,8 @@ static void raid_wait_write_end_io(struct bio *bio)
>   	if (err)
>   		rbio_update_error_bitmap(rbio, bio);
>   	bio_put(bio);
> -	atomic_dec(&rbio->stripes_pending);
> -	wake_up(&rbio->io_wait);
> +	if (atomic_dec_and_test(&rbio->stripes_pending))
> +		wake_up(&rbio->io_wait);
>   }
>   
>   static void submit_write_bios(struct btrfs_raid_bio *rbio,

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: raid56: use atomic_dec_and_test() in end io functions
  2022-11-09 23:39 [PATCH] btrfs: raid56: use atomic_dec_and_test() in end io functions Qu Wenruo
  2022-11-11  8:07 ` Qu Wenruo
@ 2022-11-11 14:07 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2022-11-11 14:07 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, dsterba

On Thu, Nov 10, 2022 at 07:39:38AM +0800, Qu Wenruo wrote:
> !!! DON'T MERGE AS IS !!!
> 
> The latest btrfs raid56 refactor is using atomic_dec() then
> unconditionally call wake_up() to let the main thread to check if all
> the IO is done.
> 
> But atomic_dec() itself is not fully ordered, thus it can have an
> impact on the lifespan of the rbio, which causes use-after-free and
> crash in the raid6 path.
> 
> Unfortunately I don't have a solid environment to reproduce the problem
> (even with KASAN enabled).
> My guess is, since I'm always using the latest hardware (days ago it's
> R9 5900X, now it's i7 13700K) they may have something a little more
> strict on the ordering.
> 
> So this patch is still just for David to verify the behavior, and if
> this one really solved the problem, it's better to be folded into
> "btrfs: raid56: switch recovery path to a single function" and
> "btrfs: raid56: introduce the a main entrance for rmw path".

Fixups folded, thanks. So far several runs have passed so I'll push it
to misc-next and we'll see if it reproduces on other machines.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-11-11 14:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-09 23:39 [PATCH] btrfs: raid56: use atomic_dec_and_test() in end io functions Qu Wenruo
2022-11-11  8:07 ` Qu Wenruo
2022-11-11 14:07 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.