linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
@ 2016-08-27  7:07 Vegard Nossum
  2016-08-27  9:03 ` Rabin Vincent
  0 siblings, 1 reply; 8+ messages in thread
From: Vegard Nossum @ 2016-08-27  7:07 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Jan Kara, Al Viro, linux-block, linux-kernel,
	Vegard Nossum, stable

I got this with syzkaller:

    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    Dumping ftrace buffer:
       (ftrace buffer empty)
    CPU: 0 PID: 11941 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    task: ffff880110762cc0 task.stack: ffff880102290000
    RIP: 0010:[<ffffffff81f04b7a>]  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
    RSP: 0018:ffff880102297cd0  EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90000bb4000
    RDX: 0000000000000097 RSI: 0000000000000000 RDI: 00000000000004b8
    RBP: ffff880102297cd8 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88011a010a90
    R13: ffff88011a594568 R14: ffff88011a010890 R15: 7fffffffffffffff
    FS:  00007f2445174700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000200047c8 CR3: 0000000107eb5000 CR4: 00000000000006f0
    DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Stack:
     1ffff10020452f9e ffff880102297db8 ffffffff81508daa 0000000000000000
     0000000041b58ab3 ffffffff844e89e1 ffffffff81508b30 ffffed0020452001
     7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff
    Call Trace:
     [<ffffffff81508daa>] __filemap_fdatawrite_range+0x27a/0x2e0
     [<ffffffff81508b30>] ? filemap_check_errors+0xe0/0xe0
     [<ffffffff83c24b47>] ? preempt_schedule+0x27/0x30
     [<ffffffff810020ae>] ? ___preempt_schedule+0x16/0x18
     [<ffffffff81508e36>] filemap_fdatawrite+0x26/0x30
     [<ffffffff817191b0>] fdatawrite_one_bdev+0x50/0x70
     [<ffffffff817341b4>] iterate_bdevs+0x194/0x210
     [<ffffffff81719160>] ? fdatawait_one_bdev+0x70/0x70
     [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
     [<ffffffff817196be>] sys_sync+0xce/0x160
     [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
     [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
     [<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
     [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
     [<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 89 fa 48 c1 ea 03 80 3c 02 00 75 35 48 8b 9b e0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bb b8 04 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 17 48 8b 83 b8 04 00 00 5b 5d 48 05 10 02 00 00
    RIP  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
     RSP <ffff880102297cd0>

The problem is that sync() calls down to blk_get_backing_dev_info()
without necessarily having the block device opened (if it races with
another process doing close()):

    /**
     * blk_get_backing_dev_info - get the address of a queue's backing_dev_info
     * @bdev:       device
     *
     * Locates the passed device's request queue and returns the address of its
     * backing_dev_info.  This function can only be called if @bdev is opened      <----
     * and the return value is never NULL.
     */
    struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
    {
            struct request_queue *q = bdev_get_queue(bdev);

            return &q->backing_dev_info;
    }

bdev_get_queue() crashes on dereferencing bdev->bd_disk->queue because
->bd_disk was set to NULL when close() reached __blkdev_put().

Taking bdev->bd_mutex and checking bdev->bd_disk actually seems to be a
reliable test of whether it's safe to call filemap_fdatawrite() for the
block device inode and completely fixes the crash for me.

What I don't like about this patch is that it simply skips block devices
which we don't have any open file descriptors for. That seems wrong to
me because sync() should do writeback on (and wait for) _all_ devices,
not just the ones that we happen to have an open file descriptor for.
Imagine if we opened a device, wrote a lot of data to it, closed it,
called sync(), and sync() returns. Now we should be guaranteed the data
was written, but I'm not sure we are in this case. But maybe I've
misunderstood how the writeback mechanism works.

Another ugly thing is that we're now holding a new mutex over a
potentially big chunk of code (everything that happens inside
filemap_fdatawrite()). The only thing I can say is that it seems to
work here.

I have a reproducer that reliably triggers the problem in ~2 seconds so
I can easily test other proposed fixes if there are any. I would also be
happy to submit a new patch with some guidance on the Right Way to fix
this.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
 fs/sync.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/sync.c b/fs/sync.c
index 2a54c1f..9189eeb 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -81,7 +81,10 @@ static void sync_fs_one_sb(struct super_block *sb, void *arg)
 
 static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
 {
-	filemap_fdatawrite(bdev->bd_inode->i_mapping);
+	mutex_lock(&bdev->bd_mutex);
+	if (bdev->bd_disk)
+		filemap_fdatawrite(bdev->bd_inode->i_mapping);
+	mutex_unlock(&bdev->bd_mutex);
 }
 
 static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
-- 
2.10.0.rc0.1.g07c9292

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-27  7:07 [PATCH] bdev: fix NULL pointer dereference in sync()/close() race Vegard Nossum
@ 2016-08-27  9:03 ` Rabin Vincent
  2016-08-27  9:30   ` Vegard Nossum
  0 siblings, 1 reply; 8+ messages in thread
From: Rabin Vincent @ 2016-08-27  9:03 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Jens Axboe, Tejun Heo, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

On Sat, Aug 27, 2016 at 09:07:28AM +0200, Vegard Nossum wrote:
> I got this with syzkaller:
> 
>     general protection fault: 0000 [#1] PREEMPT SMP KASAN
>     Dumping ftrace buffer:
>        (ftrace buffer empty)
>     CPU: 0 PID: 11941 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
>     task: ffff880110762cc0 task.stack: ffff880102290000
>     RIP: 0010:[<ffffffff81f04b7a>]  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
>     RSP: 0018:ffff880102297cd0  EFLAGS: 00010202
>     RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90000bb4000
>     RDX: 0000000000000097 RSI: 0000000000000000 RDI: 00000000000004b8
>     RBP: ffff880102297cd8 R08: 0000000000000000 R09: 0000000000000001
>     R10: 0000000000000000 R11: 0000000000000001 R12: ffff88011a010a90
>     R13: ffff88011a594568 R14: ffff88011a010890 R15: 7fffffffffffffff
>     FS:  00007f2445174700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 00000000200047c8 CR3: 0000000107eb5000 CR4: 00000000000006f0
>     DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
>     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>     Stack:
>      1ffff10020452f9e ffff880102297db8 ffffffff81508daa 0000000000000000
>      0000000041b58ab3 ffffffff844e89e1 ffffffff81508b30 ffffed0020452001
>      7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff
>     Call Trace:
>      [<ffffffff81508daa>] __filemap_fdatawrite_range+0x27a/0x2e0
>      [<ffffffff81508b30>] ? filemap_check_errors+0xe0/0xe0
>      [<ffffffff83c24b47>] ? preempt_schedule+0x27/0x30
>      [<ffffffff810020ae>] ? ___preempt_schedule+0x16/0x18
>      [<ffffffff81508e36>] filemap_fdatawrite+0x26/0x30
>      [<ffffffff817191b0>] fdatawrite_one_bdev+0x50/0x70
>      [<ffffffff817341b4>] iterate_bdevs+0x194/0x210
>      [<ffffffff81719160>] ? fdatawait_one_bdev+0x70/0x70
>      [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
>      [<ffffffff817196be>] sys_sync+0xce/0x160
>      [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
>      [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
>      [<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
>      [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
>      [<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
>     Code: 89 fa 48 c1 ea 03 80 3c 02 00 75 35 48 8b 9b e0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bb b8 04 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 17 48 8b 83 b8 04 00 00 5b 5d 48 05 10 02 00 00
>     RIP  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
>      RSP <ffff880102297cd0>
> 
> The problem is that sync() calls down to blk_get_backing_dev_info()
> without necessarily having the block device opened (if it races with
> another process doing close()):
> 
>     /**
>      * blk_get_backing_dev_info - get the address of a queue's backing_dev_info
>      * @bdev:       device
>      *
>      * Locates the passed device's request queue and returns the address of its
>      * backing_dev_info.  This function can only be called if @bdev is opened      <----
>      * and the return value is never NULL.
>      */
>     struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
>     {
>             struct request_queue *q = bdev_get_queue(bdev);
> 
>             return &q->backing_dev_info;
>     }
> 
> bdev_get_queue() crashes on dereferencing bdev->bd_disk->queue because
> ->bd_disk was set to NULL when close() reached __blkdev_put().
> 
> Taking bdev->bd_mutex and checking bdev->bd_disk actually seems to be a
> reliable test of whether it's safe to call filemap_fdatawrite() for the
> block device inode and completely fixes the crash for me.
> 
> What I don't like about this patch is that it simply skips block devices
> which we don't have any open file descriptors for. That seems wrong to
> me because sync() should do writeback on (and wait for) _all_ devices,
> not just the ones that we happen to have an open file descriptor for.
> Imagine if we opened a device, wrote a lot of data to it, closed it,
> called sync(), and sync() returns. Now we should be guaranteed the data
> was written, but I'm not sure we are in this case. But maybe I've
> misunderstood how the writeback mechanism works.
> 
> Another ugly thing is that we're now holding a new mutex over a
> potentially big chunk of code (everything that happens inside
> filemap_fdatawrite()). The only thing I can say is that it seems to
> work here.
> 
> I have a reproducer that reliably triggers the problem in ~2 seconds so
> I can easily test other proposed fixes if there are any. I would also be
> happy to submit a new patch with some guidance on the Right Way to fix
> this.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>

Don't know what's the right fix, but I posted a slightly different one
for the same crash some months ago:
https://patchwork.kernel.org/patch/8556941/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-27  9:03 ` Rabin Vincent
@ 2016-08-27  9:30   ` Vegard Nossum
  2016-08-29 19:55     ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Vegard Nossum @ 2016-08-27  9:30 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: Jens Axboe, Tejun Heo, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

On 08/27/2016 11:03 AM, Rabin Vincent wrote:
> On Sat, Aug 27, 2016 at 09:07:28AM +0200, Vegard Nossum wrote:
>> I got this with syzkaller:
>>
>>     general protection fault: 0000 [#1] PREEMPT SMP KASAN
>>     Dumping ftrace buffer:
>>        (ftrace buffer empty)
>>     CPU: 0 PID: 11941 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
>>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
>>     task: ffff880110762cc0 task.stack: ffff880102290000
>>     RIP: 0010:[<ffffffff81f04b7a>]  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
>>     RSP: 0018:ffff880102297cd0  EFLAGS: 00010202
>>     RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90000bb4000
>>     RDX: 0000000000000097 RSI: 0000000000000000 RDI: 00000000000004b8
>>     RBP: ffff880102297cd8 R08: 0000000000000000 R09: 0000000000000001
>>     R10: 0000000000000000 R11: 0000000000000001 R12: ffff88011a010a90
>>     R13: ffff88011a594568 R14: ffff88011a010890 R15: 7fffffffffffffff
>>     FS:  00007f2445174700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
>>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>     CR2: 00000000200047c8 CR3: 0000000107eb5000 CR4: 00000000000006f0
>>     DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
>>     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>     Stack:
>>      1ffff10020452f9e ffff880102297db8 ffffffff81508daa 0000000000000000
>>      0000000041b58ab3 ffffffff844e89e1 ffffffff81508b30 ffffed0020452001
>>      7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff
>>     Call Trace:
>>      [<ffffffff81508daa>] __filemap_fdatawrite_range+0x27a/0x2e0
>>      [<ffffffff81508b30>] ? filemap_check_errors+0xe0/0xe0
>>      [<ffffffff83c24b47>] ? preempt_schedule+0x27/0x30
>>      [<ffffffff810020ae>] ? ___preempt_schedule+0x16/0x18
>>      [<ffffffff81508e36>] filemap_fdatawrite+0x26/0x30
>>      [<ffffffff817191b0>] fdatawrite_one_bdev+0x50/0x70
>>      [<ffffffff817341b4>] iterate_bdevs+0x194/0x210
>>      [<ffffffff81719160>] ? fdatawait_one_bdev+0x70/0x70
>>      [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
>>      [<ffffffff817196be>] sys_sync+0xce/0x160
>>      [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
>>      [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
>>      [<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
>>      [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
>>      [<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
>>     Code: 89 fa 48 c1 ea 03 80 3c 02 00 75 35 48 8b 9b e0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bb b8 04 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 17 48 8b 83 b8 04 00 00 5b 5d 48 05 10 02 00 00
>>     RIP  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
>>      RSP <ffff880102297cd0>
>>
>> The problem is that sync() calls down to blk_get_backing_dev_info()
>> without necessarily having the block device opened (if it races with
>> another process doing close()):
>>
>>     /**
>>      * blk_get_backing_dev_info - get the address of a queue's backing_dev_info
>>      * @bdev:       device
>>      *
>>      * Locates the passed device's request queue and returns the address of its
>>      * backing_dev_info.  This function can only be called if @bdev is opened      <----
>>      * and the return value is never NULL.
>>      */
>>     struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
>>     {
>>             struct request_queue *q = bdev_get_queue(bdev);
>>
>>             return &q->backing_dev_info;
>>     }
>>
>> bdev_get_queue() crashes on dereferencing bdev->bd_disk->queue because
>> ->bd_disk was set to NULL when close() reached __blkdev_put().
>>
>> Taking bdev->bd_mutex and checking bdev->bd_disk actually seems to be a
>> reliable test of whether it's safe to call filemap_fdatawrite() for the
>> block device inode and completely fixes the crash for me.
>>
>> What I don't like about this patch is that it simply skips block devices
>> which we don't have any open file descriptors for. That seems wrong to
>> me because sync() should do writeback on (and wait for) _all_ devices,
>> not just the ones that we happen to have an open file descriptor for.
>> Imagine if we opened a device, wrote a lot of data to it, closed it,
>> called sync(), and sync() returns. Now we should be guaranteed the data
>> was written, but I'm not sure we are in this case. But maybe I've
>> misunderstood how the writeback mechanism works.
>>
>> Another ugly thing is that we're now holding a new mutex over a
>> potentially big chunk of code (everything that happens inside
>> filemap_fdatawrite()). The only thing I can say is that it seems to
>> work here.
>>
>> I have a reproducer that reliably triggers the problem in ~2 seconds so
>> I can easily test other proposed fixes if there are any. I would also be
>> happy to submit a new patch with some guidance on the Right Way to fix
>> this.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
>
> Don't know what's the right fix, but I posted a slightly different one
> for the same crash some months ago:
> https://patchwork.kernel.org/patch/8556941/
>

Ah, I'm sorry, I didn't see that.

Your patch is 100% identical to my first attempt at a fix and I can
confirm that it also fixes the problem for me.

If people who are more savvy in block/fs code could ack the locking bits
I think we should apply the patch ASAP because it's an easy local DOS if
you have (open/read) access to any block device.


Vegard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-27  9:30   ` Vegard Nossum
@ 2016-08-29 19:55     ` Tejun Heo
  2016-08-29 20:49       ` Vegard Nossum
  2016-08-29 21:33       ` Vegard Nossum
  0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2016-08-29 19:55 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rabin Vincent, Jens Axboe, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

Hello,

On Sat, Aug 27, 2016 at 11:30:22AM +0200, Vegard Nossum wrote:
> > Don't know what's the right fix, but I posted a slightly different one
> > for the same crash some months ago:
> > https://patchwork.kernel.org/patch/8556941/
> > 
> 
> Ah, I'm sorry, I didn't see that.
> 
> Your patch is 100% identical to my first attempt at a fix and I can
> confirm that it also fixes the problem for me.
> 
> If people who are more savvy in block/fs code could ack the locking bits
> I think we should apply the patch ASAP because it's an easy local DOS if
> you have (open/read) access to any block device.

I think the right thing to do there is doing blkdev_get() /
blkdev_put() around func() invocation in iterate_bdevs() rather than
holding bd_mutex across the callback.  Can you please verify whether
that works?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-29 19:55     ` Tejun Heo
@ 2016-08-29 20:49       ` Vegard Nossum
  2016-08-29 20:54         ` Tejun Heo
  2016-08-29 21:33       ` Vegard Nossum
  1 sibling, 1 reply; 8+ messages in thread
From: Vegard Nossum @ 2016-08-29 20:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rabin Vincent, Jens Axboe, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

On 08/29/2016 09:55 PM, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 27, 2016 at 11:30:22AM +0200, Vegard Nossum wrote:
>>> Don't know what's the right fix, but I posted a slightly different one
>>> for the same crash some months ago:
>>> https://patchwork.kernel.org/patch/8556941/
>>>
>>
>> Ah, I'm sorry, I didn't see that.
>>
>> Your patch is 100% identical to my first attempt at a fix and I can
>> confirm that it also fixes the problem for me.
>>
>> If people who are more savvy in block/fs code could ack the locking bits
>> I think we should apply the patch ASAP because it's an easy local DOS if
>> you have (open/read) access to any block device.
>
> I think the right thing to do there is doing blkdev_get() /
> blkdev_put() around func() invocation in iterate_bdevs() rather than
> holding bd_mutex across the callback.  Can you please verify whether
> that works?

Hrmph, I tried this patch first:

     diff --git a/fs/block_dev.c b/fs/block_dev.c
     index e17bdbd..489473d 100644
     --- a/fs/block_dev.c
     +++ b/fs/block_dev.c
     @@ -1884,6 +1884,7 @@ void iterate_bdevs(void (*func)(struct 
block_device *, void *), void *arg)
             spin_lock(&blockdev_superblock->s_inode_list_lock);
             list_for_each_entry(inode, &blockdev_superblock->s_inodes, 
i_sb_list) {
                     struct address_space *mapping = inode->i_mapping;
     +               struct block_device *bdev;

                     spin_lock(&inode->i_lock);
                     if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
     @@ -1905,7 +1906,11 @@ void iterate_bdevs(void (*func)(struct 
block_device *, void *), void *arg)
                     iput(old_inode);
                     old_inode = inode;

     -               func(I_BDEV(inode), arg);
     +               bdev = bd_acquire(inode);
     +               if (bdev) {
     +                       func(bdev, arg);
     +                       bdput(bdev);
     +               }

                     spin_lock(&blockdev_superblock->s_inode_list_lock);
             }

That didn't work at all. I guess bd_acquire() would just do a bdgrab()
and not touch ->bd_holders, whereas blkdev_get() would increment
bd_holders and therefore prevent __blkdev_put() from freeing the block
device? Too confusing...

I'll give your suggestion a try.


Vegard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-29 20:49       ` Vegard Nossum
@ 2016-08-29 20:54         ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2016-08-29 20:54 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rabin Vincent, Jens Axboe, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

Hello,

On Mon, Aug 29, 2016 at 10:49:57PM +0200, Vegard Nossum wrote:
> That didn't work at all. I guess bd_acquire() would just do a bdgrab()
> and not touch ->bd_holders, whereas blkdev_get() would increment

Yeah, bdev has two different refs - one for bdev struct itself and the
other for the actual accessors.

> bd_holders and therefore prevent __blkdev_put() from freeing the block
> device? Too confusing...

I think you'll need to do the actual blkdev_get or one of its
variants.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-29 19:55     ` Tejun Heo
  2016-08-29 20:49       ` Vegard Nossum
@ 2016-08-29 21:33       ` Vegard Nossum
  2016-08-29 23:49         ` Tejun Heo
  1 sibling, 1 reply; 8+ messages in thread
From: Vegard Nossum @ 2016-08-29 21:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rabin Vincent, Jens Axboe, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

On 08/29/2016 09:55 PM, Tejun Heo wrote:
> On Sat, Aug 27, 2016 at 11:30:22AM +0200, Vegard Nossum wrote:
>> If people who are more savvy in block/fs code could ack the locking bits
>> I think we should apply the patch ASAP because it's an easy local DOS if
>> you have (open/read) access to any block device.
>
> I think the right thing to do there is doing blkdev_get() /
> blkdev_put() around func() invocation in iterate_bdevs() rather than
> holding bd_mutex across the callback.  Can you please verify whether
> that works?

Didn't work for me, I kept getting use-after-free in __blkdev_get() on
bdev->bd_invalidated after it calls bdev->bd_disk->fops->open(). I tried
a few related things without much luck.

The only thing that worked for me without holding the mutex across the
call was this:

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 08ae993..586d745 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1885,6 +1885,7 @@ void iterate_bdevs(void (*func)(struct 
block_device *, void *), void *arg)
  	spin_lock(&blockdev_superblock->s_inode_list_lock);
  	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
  		struct address_space *mapping = inode->i_mapping;
+		struct block_device *bdev;

  		spin_lock(&inode->i_lock);
  		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
@@ -1906,7 +1907,19 @@ void iterate_bdevs(void (*func)(struct 
block_device *, void *), void *arg)
  		iput(old_inode);
  		old_inode = inode;

-		func(I_BDEV(inode), arg);
+		bdev = I_BDEV(inode);
+
+		mutex_lock(&bdev->bd_mutex);
+		bdev->bd_openers++;
+		bdev->bd_holders++;
+		mutex_unlock(&bdev->bd_mutex);
+
+		func(bdev, arg);
+
+		mutex_lock(&bdev->bd_mutex);
+		bdev->bd_openers--;
+		bdev->bd_holders--;
+		mutex_unlock(&bdev->bd_mutex);

  		spin_lock(&blockdev_superblock->s_inode_list_lock);
  	}

I'm guessing that's too simple to work in general (especially when you
bring in partitions and stuff; I'm just opening /dev/sr0 in my reproducer).

It's been a long day, I'll have a look tomorrow and see if I didn't just
do something stupid.


Vegard

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
  2016-08-29 21:33       ` Vegard Nossum
@ 2016-08-29 23:49         ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2016-08-29 23:49 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rabin Vincent, Jens Axboe, Jan Kara, Al Viro, linux-block,
	linux-kernel, stable

Hello,

On Mon, Aug 29, 2016 at 11:33:41PM +0200, Vegard Nossum wrote:
> On 08/29/2016 09:55 PM, Tejun Heo wrote:
> > I think the right thing to do there is doing blkdev_get() /
> > blkdev_put() around func() invocation in iterate_bdevs() rather than
> > holding bd_mutex across the callback.  Can you please verify whether
> > that works?
> 
> Didn't work for me, I kept getting use-after-free in __blkdev_get() on
> bdev->bd_invalidated after it calls bdev->bd_disk->fops->open(). I tried
> a few related things without much luck.

I see.  It could be that it's doing blkdev_get() on a dying device.

> The only thing that worked for me without holding the mutex across the
> call was this:
...
> +		mutex_lock(&bdev->bd_mutex);
> +		bdev->bd_openers++;
> +		bdev->bd_holders++;
> +		mutex_unlock(&bdev->bd_mutex);
> +
> +		func(bdev, arg);
> +
> +		mutex_lock(&bdev->bd_mutex);
> +		bdev->bd_openers--;
> +		bdev->bd_holders--;
> +		mutex_unlock(&bdev->bd_mutex);

And this might not be too far fetched.  I think what we want is

* Bump bd_openers if there are other users already; otherwise, skip.

* blkdev_put() after the callback is finished.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-08-29 23:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-27  7:07 [PATCH] bdev: fix NULL pointer dereference in sync()/close() race Vegard Nossum
2016-08-27  9:03 ` Rabin Vincent
2016-08-27  9:30   ` Vegard Nossum
2016-08-29 19:55     ` Tejun Heo
2016-08-29 20:49       ` Vegard Nossum
2016-08-29 20:54         ` Tejun Heo
2016-08-29 21:33       ` Vegard Nossum
2016-08-29 23:49         ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).