Re: [PATCH] md/bitmap: wait for bitmap writes to complete during the tear down sequence

From: "heming.zhao@suse.com" <heming.zhao@suse.com>
To: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>,
	linux-raid@vger.kernel.org, song@kernel.org
Cc: lidong.zhong@suse.com, xni@redhat.com, colyli@suse.com,
	martin.petersen@oracle.com
Subject: Re: [PATCH] md/bitmap: wait for bitmap writes to complete during the tear down sequence
Date: Sat, 10 Apr 2021 23:27:55 +0800
Message-ID: <ba0f4827-83ae-b7e2-2230-5f4afca2538a@suse.com>
In-Reply-To: <20210408213917.GA3986@oracle.com>

On 4/9/21 5:39 AM, Sudhakar Panneerselvam wrote:
> NULL pointer dereference was observed in super_written() when it tries
> to access the mddev structure.
> 
> [The below stack trace is from an older kernel, but the problem described in
> this patch applies to the mainline kernel.]
> 
> ... ...
> 
> bio in the above stack is a bitmap write whose completion is invoked after the
> tear down sequence sets the mddev structure to NULL in rdev.
> 
> During tear down, there is an attempt to flush the bitmap writes, but it
> doesn't fully wait for all the bitmap writes to complete. For instance,
> md_bitmap_flush() is called to flush the bitmap writes, but the last call to
> md_bitmap_daemon_work() in md_bitmap_flush() could generate new bitmap writes
> for which there is no explicit wait to complete those writes. This results in a kernel
> panic when the completion routine, super_written() is called which tries to
> reference mddev in the rdev that has been set to
> NULL(in unbind_rdev_from_array() by tear down sequence).
> 
> The solution is to call md_bitmap_wait_writes() after the last call to
> md_bitmap_daemon_work() in md_bitmap_flush() to ensure there are no pending
> bitmap writes before proceeding with the tear down.
> 
> Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
> ---
>   drivers/md/md-bitmap.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index 200c5d0f08bf..e0fdc3a090c5 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -1722,6 +1722,7 @@ void md_bitmap_flush(struct mddev *mddev)
>   	md_bitmap_daemon_work(mddev);
>   	bitmap->daemon_lastrun -= sleep;
>   	md_bitmap_daemon_work(mddev);
> +	md_bitmap_wait_writes(mddev->bitmap);
>   	md_bitmap_update_sb(bitmap);
>   }
>   
> 

Hello Sudhakar,

First, let's base the discussion on the master branch kernel.

Which command or action does "tear down" refer to?
From your description, it looks very much like the STOP_ARRAY ioctl.
Your crash is related to super_written(), which is the completion callback
for updating the array sb, not the bitmap sb. In md_update_sb() there is a
sync point, md_super_wait(), which guarantees that all sb bios have finished.

As for your patch, did you check md_bitmap_free()? It already does the job of your patch.

The call flow:
```
do_md_stop //by STOP_ARRAY
  + __md_stop_writes()
  |  md_bitmap_flush
  |  md_update_sb
  |   + md_super_write
  |   |  bio->bi_end_io = super_written
  |   + md_super_wait(mddev) //wait for all bios done
  + __md_stop(mddev)
  |  md_bitmap_destroy(mddev);
  |   + md_bitmap_free //see below
  + ...

md_bitmap_free
{
    ...
    // this already does the job of your patch
    /* Shouldn't be needed - but just in case.... */
    wait_event(bitmap->write_wait,
               atomic_read(&bitmap->pending_writes) == 0);
    ...
}
```
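
To make the ordering under discussion concrete, here is a toy userspace C
model. It is only a sketch, not kernel code: every toy_* name is hypothetical,
completions are run from a plain array instead of from bio end_io context, and
toy_wait_writes() merely stands in for a wait_event() on pending_writes == 0.
It counts how many completions find rdev->mddev already cleared, depending on
whether the wait happens before or after the pointer is set to NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct toy_mddev { int writes_done; };
struct toy_rdev  { struct toy_mddev *mddev; };

#define TOY_MAX_PENDING 8

struct toy_bitmap {
	struct toy_rdev *queue[TOY_MAX_PENDING]; /* submitted, not yet completed */
	int pending_writes;  /* plays the role of bitmap->pending_writes */
	int null_hits;       /* completions that found rdev->mddev == NULL */
};

/* Submission: account for the write now; its completion runs later. */
static int toy_submit_write(struct toy_bitmap *b, struct toy_rdev *rdev)
{
	if (b->pending_writes >= TOY_MAX_PENDING)
		return -1;
	b->queue[b->pending_writes++] = rdev;
	return 0;
}

/* Completion callback in the spirit of super_written(): needs rdev->mddev. */
static void toy_write_done(struct toy_bitmap *b, struct toy_rdev *rdev)
{
	if (rdev->mddev == NULL)
		b->null_hits++;          /* a real kernel would oops here */
	else
		rdev->mddev->writes_done++;
}

/* Stand-in for waiting until pending_writes == 0: run all completions. */
static void toy_wait_writes(struct toy_bitmap *b)
{
	while (b->pending_writes > 0) {
		struct toy_rdev *rdev = b->queue[--b->pending_writes];

		toy_write_done(b, rdev);
	}
}

/*
 * Submit nr_writes, optionally wait, then clear rdev->mddev (what the patch
 * description attributes to unbind_rdev_from_array()). Any completion still
 * outstanding is delivered afterwards. Returns the number of completions
 * that saw a NULL mddev.
 */
static int toy_teardown(int wait_first, int nr_writes)
{
	struct toy_mddev mddev = { 0 };
	struct toy_rdev rdev = { &mddev };
	struct toy_bitmap b = { { NULL }, 0, 0 };
	int i;

	for (i = 0; i < nr_writes; i++)
		toy_submit_write(&b, &rdev);

	if (wait_first)
		toy_wait_writes(&b);

	rdev.mddev = NULL;
	toy_wait_writes(&b);             /* late completions arrive here */
	assert(b.pending_writes == 0);
	return b.null_hits;
}
```

In this model, toy_teardown(1, 3) reports no completion seeing a NULL mddev,
while toy_teardown(0, 3) reports all three. It says nothing about which of
the real code paths above actually leaves writes outstanding.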

Would you share more analysis or test results for your patch?

Thanks,
Heming