linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll
@ 2021-02-26  9:33 syzbot
       [not found] ` <e939af11-7ce8-46af-8c76-651add0ae56bn@googlegroups.com>
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2021-02-26  9:33 UTC (permalink / raw)
  To: asml.silence, axboe, io-uring, linux-kernel, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    d01f2f7e Add linux-next specific files for 20210226
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=108dc5a8d00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a1746d2802a82a05
dashboard link: https://syzkaller.appspot.com/bug?extid=be51ca5a4d97f017cd50

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: null-ptr-deref in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x2c7/0x450 fs/io_uring.c:8871
Write of size 4 at addr 0000000000000110 by task iou-sqp-19439/19447

CPU: 0 PID: 19447 Comm: iou-sqp-19439 Not tainted 5.11.0-next-20210226-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xfa/0x151 lib/dump_stack.c:120
 __kasan_report mm/kasan/report.c:403 [inline]
 kasan_report.cold+0x5f/0xd8 mm/kasan/report.c:416
 check_region_inline mm/kasan/generic.c:180 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 io_uring_cancel_sqpoll+0x2c7/0x450 fs/io_uring.c:8871
 io_sq_thread+0x1109/0x1ae0 fs/io_uring.c:6782
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 19447 Comm: iou-sqp-19439 Tainted: G    B             5.11.0-next-20210226-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xfa/0x151 lib/dump_stack.c:120
 panic+0x306/0x73d kernel/panic.c:231
 end_report mm/kasan/report.c:102 [inline]
 end_report.cold+0x5a/0x5a mm/kasan/report.c:88
 __kasan_report mm/kasan/report.c:406 [inline]
 kasan_report.cold+0x6a/0xd8 mm/kasan/report.c:416
 check_region_inline mm/kasan/generic.c:180 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 io_uring_cancel_sqpoll+0x2c7/0x450 fs/io_uring.c:8871
 io_sq_thread+0x1109/0x1ae0 fs/io_uring.c:6782
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll
       [not found] ` <e939af11-7ce8-46af-8c76-651add0ae56bn@googlegroups.com>
@ 2021-04-27  6:29   ` Dmitry Vyukov
  2021-04-27  7:05     ` Palash Oswal
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2021-04-27  6:29 UTC (permalink / raw)
  To: Palash Oswal, Jens Axboe, io-uring, LKML
  Cc: syzkaller-bugs, syzbot+be51ca5a4d97f017cd50

On Mon, Apr 26, 2021 at 5:58 PM Palash Oswal <oswalpalash@gmail.com> wrote:
> On Friday, February 26, 2021 at 3:03:16 PM UTC+5:30 syzbot wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: d01f2f7e Add linux-next specific files for 20210226
>> git tree: linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=108dc5a8d00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=a1746d2802a82a05
>> dashboard link: https://syzkaller.appspot.com/bug?extid=be51ca5a4d97f017cd50
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+be51ca...@syzkaller.appspotmail.com
>>
>> ==================================================================
>> BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>> BUG: KASAN: null-ptr-deref in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
>> BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x2c7/0x450 fs/io_uring.c:8871
>> Write of size 4 at addr 0000000000000110 by task iou-sqp-19439/19447
>>
>> CPU: 0 PID: 19447 Comm: iou-sqp-19439 Not tainted 5.11.0-next-20210226-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:79 [inline]
>> dump_stack+0xfa/0x151 lib/dump_stack.c:120
>> __kasan_report mm/kasan/report.c:403 [inline]
>> kasan_report.cold+0x5f/0xd8 mm/kasan/report.c:416
>> check_region_inline mm/kasan/generic.c:180 [inline]
>> kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
>> instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>> atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
>> io_uring_cancel_sqpoll+0x2c7/0x450 fs/io_uring.c:8871
>> io_sq_thread+0x1109/0x1ae0 fs/io_uring.c:6782
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>> ==================================================================
>> Kernel panic - not syncing: panic_on_warn set ...
>> CPU: 0 PID: 19447 Comm: iou-sqp-19439 Tainted: G B 5.11.0-next-20210226-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:79 [inline]
>> dump_stack+0xfa/0x151 lib/dump_stack.c:120
>> panic+0x306/0x73d kernel/panic.c:231
>> end_report mm/kasan/report.c:102 [inline]
>> end_report.cold+0x5a/0x5a mm/kasan/report.c:88
>> __kasan_report mm/kasan/report.c:406 [inline]
>> kasan_report.cold+0x6a/0xd8 mm/kasan/report.c:416
>> check_region_inline mm/kasan/generic.c:180 [inline]
>> kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
>> instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>> atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
>> io_uring_cancel_sqpoll+0x2c7/0x450 fs/io_uring.c:8871
>> io_sq_thread+0x1109/0x1ae0 fs/io_uring.c:6782
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>> Kernel Offset: disabled
>> Rebooting in 86400 seconds..
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzk...@googlegroups.com.
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
>
> My syzkaller instance reported a syz-repro for this bug:
> Syzkaller reproducer: # {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:2 Slowdown:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:true NetDevices:true NetReset:true Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:true UseTmpDir:true HandleSegv:true Repro:false Trace:false}
> r0 = fsmount(0xffffffffffffffff, 0x1, 0xc)
> syz_io_uring_setup(0x329b, &(0x7f0000000080)={0x0, 0x850e, 0x2, 0x2, 0x1b4}, &(0x7f0000ffc000/0x4000)=nil, &(0x7f0000ffa000/0x4000)=nil, 0x0, 0x0)
> syz_io_uring_setup(0x3de2, &(0x7f0000001480)={0x0, 0x4f62, 0x4, 0x2, 0x75}, &(0x7f0000ffb000/0x3000)=nil, &(0x7f0000ffd000/0x3000)=nil, 0x0, 0x0)
> fsetxattr$trusted_overlay_nlink(r0, &(0x7f0000000140), 0x0, 0x0, 0x0)
>
> I'm working to get a c reproducer for it that is consistent. This syz-repro does not produce a working reproducer for me just yet.
> Initial suspicion is that io_sq_thread_stop sets set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
> And subsequently after a return from fork, where the process receives a SIGKILL and io_uring_cancel_sqpoll(ctx) is called with a NULL ctx in io_sq_thread(). I haven't connected all of the dots yet, working on it.

+kernel lists and syzbot email
(almost nobody is reading syzkaller-bugs@ itself)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll
  2021-04-27  6:29   ` Dmitry Vyukov
@ 2021-04-27  7:05     ` Palash Oswal
  2021-04-27  8:37       ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: Palash Oswal @ 2021-04-27  7:05 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jens Axboe, io-uring, LKML, syzkaller-bugs, syzbot+be51ca5a4d97f017cd50

> +kernel lists and syzbot email
> (almost nobody is reading syzkaller-bugs@ itself)
Thanks Dmitry. I used "reply-all" in the google groups UI, and I
didn't check the cc list before hitting send :/

I have made progress on this bug. I applied a diff (to print some
debug values) on the v5.12 fs/io_uring.c code and got a fairly
consistent reproducer on a non-kvm based qemu VM (had to slow down the
rate of syscalls processed for this to trigger early).

My initial speculation was very wrong. And the real issue seems to be
that current->io_uring is unset when io_uring_cancel_sqpoll is called.

Adding the c reproducer here:
#define _GNU_SOURCE

#include <endian.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>

#define sys_io_uring_setup 425
#define WAIT_FLAGS __WALL

static unsigned long long procid;

static void sleep_ms(uint64_t ms)
{
    usleep(ms * 1000);
}


static void kill_and_wait(int pid, int* status)
{
    kill(-pid, SIGKILL);
    kill(pid, SIGKILL);
    for (int i = 0; i < 100; i++) {
        if (waitpid(-1, status, WNOHANG | __WALL) == pid)
            return;
        usleep(1000);
    }
    while (waitpid(-1, status, __WALL) != pid) {
    }
}

static void execute_one(void) {
    *(uint32_t*)0x20000084 = 0x850e;
    *(uint32_t*)0x20000088 = 2;
    *(uint32_t*)0x2000008c = 2;
    *(uint32_t*)0x20000090 = 0x1b4;
    *(uint32_t*)0x20000098 = -1;
    memset((void*)0x2000009c, 0, 12);
    syscall(sys_io_uring_setup, 0x329b, 0x20000080);
}

static uint64_t current_time_ms(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts))
        exit(1);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}


static void loop(void)
{
    int iter = 0;
    for (;; iter++) {
        int pid = fork();
        if (pid < 0)
            exit(1);
        if (pid == 0) {
            execute_one();
            exit(0);
        }
        int status = 0;
        uint64_t start = current_time_ms();
        for (;;) {
            if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
                break;
            sleep_ms(1);
            kill_and_wait(pid, &status);
            break;
        }
    }
}




int main(void)
{
    syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
    for (procid = 0; procid < 8; procid++) {
        if (fork() == 0) {
            loop();
        }
    }
    sleep(1000000);
    return 0;
}


Changes to the kernel on v5.12:
diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff34975d86b..beff12baaebf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6758,6 +6758,7 @@ static int io_sq_thread(void *data)
        io_run_task_work_head(&sqd->park_task_work);

        while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
+               pr_info("In the loop, need to stop");
                int ret;
                bool cap_entries, sqt_spin, needs_sched;

@@ -6831,7 +6832,7 @@ static int io_sq_thread(void *data)
                io_run_task_work_head(&sqd->park_task_work);
                timeout = jiffies + sqd->sq_thread_idle;
        }
-
+       printk("ctx is %d",&ctx);
        list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
                io_uring_cancel_sqpoll(ctx);
        sqd->thread = NULL;
@@ -7171,6 +7172,7 @@ static void io_sq_thread_stop(struct io_sq_data *sqd)
        WARN_ON_ONCE(sqd->thread == current);

        mutex_lock(&sqd->lock);
+       pr_info("Setting Doom bit");
        set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
        if (sqd->thread)
                wake_up_process(sqd->thread);
@@ -8994,7 +8996,11 @@ void __io_uring_files_cancel(struct files_struct *files)
 static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 {
        struct io_sq_data *sqd = ctx->sq_data;
+       printk("io_uring_cancel_sqpoll called with ctx :%d\n",&ctx);
+       printk("ctx->sq_data is %p \n",ctx->sq_data);
        struct io_uring_task *tctx = current->io_uring;
+       printk("current is %p\n", current);
+       printk("current->io_uring is :%p\n",current->io_uring);
        s64 inflight;
        DEFINE_WAIT(wait);

@@ -9548,7 +9554,6 @@ static int io_uring_create(unsigned entries,
struct io_uring_params *p,
        ret = io_allocate_scq_urings(ctx, p);
        if (ret)
                goto err;
-
        ret = io_sq_offload_create(ctx, p);
        if (ret)
                goto err;


Some debug console logs:
[   58.455071] ctx is 39386608
[   58.455415] io_uring_cancel_sqpoll called with ctx :39386080
[   58.455913] ctx->sq_data is 00000000c9f086d5
[   58.456146] current is 000000005e7bf8a0
[   58.456346] current->io_uring is :0000000000000000
[   58.457244] ==================================================================
[   58.457663] BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x341/0x4b0
[   58.458214] Write of size 4 at addr 0000000000000060 by task
iou-sqp-1857/1860
[   58.458699]
[   58.459061] CPU: 1 PID: 1860 Comm: iou-sqp-1857 Not tainted 5.12.0+ #83
[   58.459522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-1 04/01/2014
[   58.460065] Call Trace:
[   58.460402]  dump_stack+0xe9/0x168
[   58.460749]  ? io_uring_cancel_sqpoll+0x341/0x4b0
[   58.460846]  __kasan_report+0x166/0x1c0
[   58.460846]  ? io_uring_cancel_sqpoll+0x341/0x4b0
[   58.460846]  kasan_report+0x4f/0x70
[   58.460846]  kasan_check_range+0x2f3/0x340
[   58.460846]  __kasan_check_write+0x14/0x20
[   58.460846]  io_uring_cancel_sqpoll+0x341/0x4b0
[   58.460846]  ? io_sq_thread_unpark+0xd0/0xd0
[   58.460846]  ? init_wait_entry+0xe0/0xe0
[   58.460846]  io_sq_thread+0x1a0d/0x1c50
[   58.460846]  ? io_rsrc_put_work+0x380/0x380
[   58.460846]  ? init_wait_entry+0xe0/0xe0
[   58.460846]  ? _raw_spin_lock_irq+0xa5/0x180
[   58.460846]  ? _raw_spin_lock_irqsave+0x190/0x190
[   58.460846]  ? calculate_sigpending+0x6b/0xa0
[   58.460846]  ? io_rsrc_put_work+0x380/0x380
[   58.460846]  ret_from_fork+0x22/0x30

I'm going to look further into why this is happening.

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll
  2021-04-27  7:05     ` Palash Oswal
@ 2021-04-27  8:37       ` Pavel Begunkov
  2021-04-27 10:39         ` Palash Oswal
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-04-27  8:37 UTC (permalink / raw)
  To: Palash Oswal, Dmitry Vyukov
  Cc: Jens Axboe, io-uring, LKML, syzkaller-bugs, syzbot+be51ca5a4d97f017cd50



On 4/27/21 8:05 AM, Palash Oswal wrote:
>> +kernel lists and syzbot email
>> (almost nobody is reading syzkaller-bugs@ itself)
> Thanks Dmitry. I used "reply-all" in the google groups UI, and I
> didn't check the cc list before hitting send :/
> 
> I have made progress on this bug. I applied a diff (to print some
> debug values) on the v5.12 fs/io_uring.c code and got a fairly
> consistent reproducer on a non-kvm based qemu VM (had to slow down the
> rate of syscalls processed for this to trigger early).
> 
> My initial speculation was very wrong. And the real issue seems to be
> that current->io_uring is unset when io_uring_cancel_sqpoll is called.
> 
> Adding the c reproducer here:
> #define _GNU_SOURCE
> 
> #include <endian.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <unistd.h>
> #include <time.h>
> #include <sys/wait.h>
> 
> #define sys_io_uring_setup 425
> #define WAIT_FLAGS __WALL
> 
> static unsigned long long procid;
> 
> static void sleep_ms(uint64_t ms)
> {
>     usleep(ms * 1000);
> }
> 
> 
> static void kill_and_wait(int pid, int* status)
> {
>     kill(-pid, SIGKILL);
>     kill(pid, SIGKILL);
>     for (int i = 0; i < 100; i++) {
>         if (waitpid(-1, status, WNOHANG | __WALL) == pid)
>             return;
>         usleep(1000);
>     }
>     while (waitpid(-1, status, __WALL) != pid) {
>     }
> }
> 
> static void execute_one(void) {
>     *(uint32_t*)0x20000084 = 0x850e;
>     *(uint32_t*)0x20000088 = 2;
>     *(uint32_t*)0x2000008c = 2;
>     *(uint32_t*)0x20000090 = 0x1b4;
>     *(uint32_t*)0x20000098 = -1;
>     memset((void*)0x2000009c, 0, 12);
>     syscall(sys_io_uring_setup, 0x329b, 0x20000080);
> }
> 
> static uint64_t current_time_ms(void)
> {
>     struct timespec ts;
>     if (clock_gettime(CLOCK_MONOTONIC, &ts))
>         exit(1);
>     return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
> 
> 
> static void loop(void)
> {
>     int iter = 0;
>     for (;; iter++) {
>         int pid = fork();
>         if (pid < 0)
>             exit(1);
>         if (pid == 0) {
>             execute_one();
>             exit(0);
>         }
>         int status = 0;
>         uint64_t start = current_time_ms();
>         for (;;) {
>             if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
>                 break;
>             sleep_ms(1);
>             kill_and_wait(pid, &status);
>             break;
>         }
>     }
> }
> 
> 
> 
> 
> int main(void)
> {
>     syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
>     for (procid = 0; procid < 8; procid++) {
>         if (fork() == 0) {
>             loop();
>         }
>     }
>     sleep(1000000);
>     return 0;
> }
> Some debug console logs:
> [   58.455071] ctx is 39386608
> [   58.455415] io_uring_cancel_sqpoll called with ctx :39386080
> [   58.455913] ctx->sq_data is 00000000c9f086d5
> [   58.456146] current is 000000005e7bf8a0
> [   58.456346] current->io_uring is :0000000000000000
> [   58.457244] ==================================================================
> [   58.457663] BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x341/0x4b0
> [   58.458214] Write of size 4 at addr 0000000000000060 by task
> iou-sqp-1857/1860
> [   58.458699]
> [   58.459061] CPU: 1 PID: 1860 Comm: iou-sqp-1857 Not tainted 5.12.0+ #83
> [   58.459522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.14.0-1 04/01/2014
> [   58.460065] Call Trace:
> [   58.460402]  dump_stack+0xe9/0x168
> [   58.460749]  ? io_uring_cancel_sqpoll+0x341/0x4b0
> [   58.460846]  __kasan_report+0x166/0x1c0
> [   58.460846]  ? io_uring_cancel_sqpoll+0x341/0x4b0
> [   58.460846]  kasan_report+0x4f/0x70
> [   58.460846]  kasan_check_range+0x2f3/0x340
> [   58.460846]  __kasan_check_write+0x14/0x20
> [   58.460846]  io_uring_cancel_sqpoll+0x341/0x4b0
> [   58.460846]  ? io_sq_thread_unpark+0xd0/0xd0
> [   58.460846]  ? init_wait_entry+0xe0/0xe0
> [   58.460846]  io_sq_thread+0x1a0d/0x1c50
> [   58.460846]  ? io_rsrc_put_work+0x380/0x380
> [   58.460846]  ? init_wait_entry+0xe0/0xe0
> [   58.460846]  ? _raw_spin_lock_irq+0xa5/0x180
> [   58.460846]  ? _raw_spin_lock_irqsave+0x190/0x190
> [   58.460846]  ? calculate_sigpending+0x6b/0xa0
> [   58.460846]  ? io_rsrc_put_work+0x380/0x380
> [   58.460846]  ret_from_fork+0x22/0x30
> 
> I'm going to look further into why this is happening.

io_sq_offload_create() {
    ...
    ret = io_uring_alloc_task_context(tsk, ctx);
    wake_up_new_task(tsk);
    if (ret)
        goto err;
}

Shouldn't happen unless offload create has failed. Just add
a return in *cancel_sqpoll() for this case. It's failing
so no requests has been submitted and no cancellation is needed.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll
  2021-04-27  8:37       ` Pavel Begunkov
@ 2021-04-27 10:39         ` Palash Oswal
  2021-04-27 11:20           ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: Palash Oswal @ 2021-04-27 10:39 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Dmitry Vyukov, Jens Axboe, io-uring, LKML, syzkaller-bugs,
	syzbot+be51ca5a4d97f017cd50

On Tue, Apr 27, 2021 at 2:07 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> io_sq_offload_create() {
>     ...
>     ret = io_uring_alloc_task_context(tsk, ctx);
>     wake_up_new_task(tsk);
>     if (ret)
>         goto err;
> }
>
> Shouldn't happen unless offload create has failed. Just add
> a return in *cancel_sqpoll() for this case. It's failing
> so no requests has been submitted and no cancellation is needed.

io_uring_cancel_sqpoll can be called by two flows:
1. io_uring_task_cancel() -> io_sqpoll_cancel_sync() ->
io_uring_cancel_sqpoll ;  which properly sanitises current->io_uring
to be non NULL. (
https://elixir.bootlin.com/linux/v5.12/source/include/linux/io_uring.h#L21
)
2. io_sq_offload_create -> io_sq_thread -> io_uring_cancel_sqpoll ;
which does not check the value of current->io_uring

In the second flow,
https://elixir.bootlin.com/linux/v5.12/source/fs/io_uring.c#L7970
The initialization of current->io_uring (i.e
io_uring_alloc_task_context() ) happens after calling io_sq_thread.
And, therefore io_uring_cancel_sqpoll receives a NULL value for
current->io_uring.

The backtrace from the crash confirms the second scenario:
[   70.661551] ==================================================================
[   70.662764] BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x203/0x350
[   70.663834] Write of size 4 at addr 0000000000000060 by task iou-sqp-750/755
[   70.664025]
[   70.664025] CPU: 1 PID: 755 Comm: iou-sqp-750 Not tainted 5.12.0 #101
[   70.664025] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-1 04/01/2014
[   70.664025] Call Trace:
[   70.664025]  dump_stack+0xe9/0x168
[   70.664025]  ? io_uring_cancel_sqpoll+0x203/0x350
[   70.664025]  __kasan_report+0x166/0x1c0
[   70.664025]  ? io_uring_cancel_sqpoll+0x203/0x350
[   70.664025]  kasan_report+0x4f/0x70
[   70.664025]  kasan_check_range+0x2f3/0x340
[   70.664025]  __kasan_check_write+0x14/0x20
[   70.664025]  io_uring_cancel_sqpoll+0x203/0x350
[   70.664025]  ? io_sq_thread_unpark+0xd0/0xd0
[   70.664025]  ? mutex_lock+0xbb/0x130
[   70.664025]  ? init_wait_entry+0xe0/0xe0
[   70.664025]  ? wait_for_completion_killable_timeout+0x20/0x20
[   70.664025]  io_sq_thread+0x174c/0x18c0
[   70.664025]  ? io_rsrc_put_work+0x380/0x380
[   70.664025]  ? init_wait_entry+0xe0/0xe0
[   70.664025]  ? _raw_spin_lock_irq+0xa5/0x180
[   70.664025]  ? _raw_spin_lock_irqsave+0x190/0x190
[   70.664025]  ? calculate_sigpending+0x6b/0xa0
[   70.664025]  ? io_rsrc_put_work+0x380/0x380
[   70.664025]  ret_from_fork+0x22/0x30

We might want to add additional validation before calling
io_uring_cancel_sqpoll. I did verify that the reproducer stopped
producing the bug after the following change.
---
diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff34975d86b..36fc9abe8022 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6832,8 +6832,10 @@ static int io_sq_thread(void *data)
                timeout = jiffies + sqd->sq_thread_idle;
        }

-       list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
-               io_uring_cancel_sqpoll(ctx);
+       list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
+               if (current->io_uring)
+                       io_uring_cancel_sqpoll(ctx);
+       }
        sqd->thread = NULL;
        list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
                io_ring_set_wakeup_flag(ctx);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll
  2021-04-27 10:39         ` Palash Oswal
@ 2021-04-27 11:20           ` Pavel Begunkov
  2021-04-27 12:51             ` [PATCH 5.13] io_uring: Check current->io_uring " Palash Oswal
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-04-27 11:20 UTC (permalink / raw)
  To: Palash Oswal
  Cc: Dmitry Vyukov, Jens Axboe, io-uring, LKML, syzkaller-bugs,
	syzbot+be51ca5a4d97f017cd50

On 4/27/21 11:39 AM, Palash Oswal wrote:
> On Tue, Apr 27, 2021 at 2:07 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> io_sq_offload_create() {
>>     ...
>>     ret = io_uring_alloc_task_context(tsk, ctx);
>>     wake_up_new_task(tsk);
>>     if (ret)
>>         goto err;
>> }
>>
>> Shouldn't happen unless offload create has failed. Just add
>> a return in *cancel_sqpoll() for this case. It's failing
>> so no requests has been submitted and no cancellation is needed.
> 
> io_uring_cancel_sqpoll can be called by two flows:
> 1. io_uring_task_cancel() -> io_sqpoll_cancel_sync() ->
> io_uring_cancel_sqpoll ;  which properly sanitises current->io_uring
> to be non NULL. (
> https://elixir.bootlin.com/linux/v5.12/source/include/linux/io_uring.h#L21
> )
> 2. io_sq_offload_create -> io_sq_thread -> io_uring_cancel_sqpoll ;
> which does not check the value of current->io_uring
> 
> In the second flow,
> https://elixir.bootlin.com/linux/v5.12/source/fs/io_uring.c#L7970
> The initialization of current->io_uring (i.e
> io_uring_alloc_task_context() ) happens after calling io_sq_thread.
> And, therefore io_uring_cancel_sqpoll receives a NULL value for
> current->io_uring.

Right, exactly as I wrote in the previous message. And still placing
the check in io_uring_cancel_sqpoll() is a better (safer) option.

Just send a patch for 5.13 and mark it stable

> 
> The backtrace from the crash confirms the second scenario:
> [   70.661551] ==================================================================
> [   70.662764] BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x203/0x350
> [   70.663834] Write of size 4 at addr 0000000000000060 by task iou-sqp-750/755
> [   70.664025]
> [   70.664025] CPU: 1 PID: 755 Comm: iou-sqp-750 Not tainted 5.12.0 #101
> [   70.664025] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.14.0-1 04/01/2014
> [   70.664025] Call Trace:
> [   70.664025]  dump_stack+0xe9/0x168
> [   70.664025]  ? io_uring_cancel_sqpoll+0x203/0x350
> [   70.664025]  __kasan_report+0x166/0x1c0
> [   70.664025]  ? io_uring_cancel_sqpoll+0x203/0x350
> [   70.664025]  kasan_report+0x4f/0x70
> [   70.664025]  kasan_check_range+0x2f3/0x340
> [   70.664025]  __kasan_check_write+0x14/0x20
> [   70.664025]  io_uring_cancel_sqpoll+0x203/0x350
> [   70.664025]  ? io_sq_thread_unpark+0xd0/0xd0
> [   70.664025]  ? mutex_lock+0xbb/0x130
> [   70.664025]  ? init_wait_entry+0xe0/0xe0
> [   70.664025]  ? wait_for_completion_killable_timeout+0x20/0x20
> [   70.664025]  io_sq_thread+0x174c/0x18c0
> [   70.664025]  ? io_rsrc_put_work+0x380/0x380
> [   70.664025]  ? init_wait_entry+0xe0/0xe0
> [   70.664025]  ? _raw_spin_lock_irq+0xa5/0x180
> [   70.664025]  ? _raw_spin_lock_irqsave+0x190/0x190
> [   70.664025]  ? calculate_sigpending+0x6b/0xa0
> [   70.664025]  ? io_rsrc_put_work+0x380/0x380
> [   70.664025]  ret_from_fork+0x22/0x30
> 
> We might want to add additional validation before calling
> io_uring_cancel_sqpoll. I did verify that the reproducer stopped
> producing the bug after the following change.
> ---
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index dff34975d86b..36fc9abe8022 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6832,8 +6832,10 @@ static int io_sq_thread(void *data)
>                 timeout = jiffies + sqd->sq_thread_idle;
>         }
> 
> -       list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
> -               io_uring_cancel_sqpoll(ctx);
> +       list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> +               if (current->io_uring)
> +                       io_uring_cancel_sqpoll(ctx);
> +       }
>         sqd->thread = NULL;
>         list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
>                 io_ring_set_wakeup_flag(ctx);
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5.13] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
  2021-04-27 11:20           ` Pavel Begunkov
@ 2021-04-27 12:51             ` Palash Oswal
  2021-04-27 13:08               ` Pavel Begunkov
  2021-04-27 13:37               ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Palash Oswal @ 2021-04-27 12:51 UTC (permalink / raw)
  To: asml.silence
  Cc: axboe, dvyukov, io-uring, linux-kernel, oswalpalash,
	syzbot+be51ca5a4d97f017cd50, syzkaller-bugs, Palash Oswal,
	stable

syzkaller identified KASAN: null-ptr-deref Write in
io_uring_cancel_sqpoll on v5.12

io_uring_cancel_sqpoll is called by io_sq_thread before calling
io_uring_alloc_task_context. This leads to current->io_uring being
NULL. io_uring_cancel_sqpoll should not have to deal with threads
where current->io_uring is NULL.

In order to cast a wider safety net, perform input sanitisation
directly in io_uring_cancel_sqpoll and return for NULL value of
current->io_uring.

Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palash Oswal <hello@oswalpalash.com>
---
 fs/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff34975d86b..eccad51b7954 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8998,6 +8998,8 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 	s64 inflight;
 	DEFINE_WAIT(wait);
 
+	if (!current->io_uring)
+		return;
 	WARN_ON_ONCE(!sqd || ctx->sq_data->thread != current);
 
 	atomic_inc(&tctx->in_idle);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 5.13] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
  2021-04-27 12:51             ` [PATCH 5.13] io_uring: Check current->io_uring " Palash Oswal
@ 2021-04-27 13:08               ` Pavel Begunkov
  2021-04-27 13:37               ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-04-27 13:08 UTC (permalink / raw)
  To: Palash Oswal
  Cc: axboe, dvyukov, io-uring, linux-kernel, oswalpalash,
	syzbot+be51ca5a4d97f017cd50, syzkaller-bugs, stable

On 4/27/21 1:51 PM, Palash Oswal wrote:
> syzkaller identified KASAN: null-ptr-deref Write in
> io_uring_cancel_sqpoll on v5.12
> 
> io_uring_cancel_sqpoll is called by io_sq_thread before calling
> io_uring_alloc_task_context. This leads to current->io_uring being
> NULL. io_uring_cancel_sqpoll should not have to deal with threads
> where current->io_uring is NULL.
> 
> In order to cast a wider safety net, perform input sanitisation
> directly in io_uring_cancel_sqpoll and return for NULL value of
> current->io_uring.

Looks good to me, but better to add a comment why it can be ignored,
e.g. "can skip it as it couldn't have submitted requests without tctx"

Also a nit: s/current->io_uring/tctx/

> 
> Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Palash Oswal <hello@oswalpalash.com>
> ---
>  fs/io_uring.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index dff34975d86b..eccad51b7954 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8998,6 +8998,8 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
>  	s64 inflight;
>  	DEFINE_WAIT(wait);
>  
> +	if (!current->io_uring)
> +		return;
>  	WARN_ON_ONCE(!sqd || ctx->sq_data->thread != current);
>  
>  	atomic_inc(&tctx->in_idle);
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5.13] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
  2021-04-27 12:51             ` [PATCH 5.13] io_uring: Check current->io_uring " Palash Oswal
  2021-04-27 13:08               ` Pavel Begunkov
@ 2021-04-27 13:37               ` Jens Axboe
  2021-04-27 17:00                 ` Pavel Begunkov
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-04-27 13:37 UTC (permalink / raw)
  To: Palash Oswal, asml.silence
  Cc: dvyukov, io-uring, linux-kernel, oswalpalash,
	syzbot+be51ca5a4d97f017cd50, syzkaller-bugs, stable

On 4/27/21 6:51 AM, Palash Oswal wrote:
> syzkaller identified KASAN: null-ptr-deref Write in
> io_uring_cancel_sqpoll on v5.12
> 
> io_uring_cancel_sqpoll is called by io_sq_thread before calling
> io_uring_alloc_task_context. This leads to current->io_uring being
> NULL. io_uring_cancel_sqpoll should not have to deal with threads
> where current->io_uring is NULL.
> 
> In order to cast a wider safety net, perform input sanitisation
> directly in io_uring_cancel_sqpoll and return for NULL value of
> current->io_uring.

Thanks applied - I augmented the commit message a bit.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5.13] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
  2021-04-27 13:37               ` Jens Axboe
@ 2021-04-27 17:00                 ` Pavel Begunkov
  2021-04-27 17:00                   ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-04-27 17:00 UTC (permalink / raw)
  To: Jens Axboe, Palash Oswal
  Cc: dvyukov, io-uring, linux-kernel, oswalpalash,
	syzbot+be51ca5a4d97f017cd50, syzkaller-bugs, stable

On 4/27/21 2:37 PM, Jens Axboe wrote:
> On 4/27/21 6:51 AM, Palash Oswal wrote:
>> syzkaller identified KASAN: null-ptr-deref Write in
>> io_uring_cancel_sqpoll on v5.12
>>
>> io_uring_cancel_sqpoll is called by io_sq_thread before calling
>> io_uring_alloc_task_context. This leads to current->io_uring being
>> NULL. io_uring_cancel_sqpoll should not have to deal with threads
>> where current->io_uring is NULL.
>>
>> In order to cast a wider safety net, perform input sanitisation
>> directly in io_uring_cancel_sqpoll and return for NULL value of
>> current->io_uring.
> 
> Thanks applied - I augmented the commit message a bit.

btw, does it fixes the replied before syz report? Should 
syz fix or tag it if so.
Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5.13] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
  2021-04-27 17:00                 ` Pavel Begunkov
@ 2021-04-27 17:00                   ` Jens Axboe
  2021-04-27 17:04                     ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-04-27 17:00 UTC (permalink / raw)
  To: Pavel Begunkov, Palash Oswal
  Cc: dvyukov, io-uring, linux-kernel, oswalpalash,
	syzbot+be51ca5a4d97f017cd50, syzkaller-bugs, stable

On 4/27/21 11:00 AM, Pavel Begunkov wrote:
> On 4/27/21 2:37 PM, Jens Axboe wrote:
>> On 4/27/21 6:51 AM, Palash Oswal wrote:
>>> syzkaller identified KASAN: null-ptr-deref Write in
>>> io_uring_cancel_sqpoll on v5.12
>>>
>>> io_uring_cancel_sqpoll is called by io_sq_thread before calling
>>> io_uring_alloc_task_context. This leads to current->io_uring being
>>> NULL. io_uring_cancel_sqpoll should not have to deal with threads
>>> where current->io_uring is NULL.
>>>
>>> In order to cast a wider safety net, perform input sanitisation
>>> directly in io_uring_cancel_sqpoll and return for NULL value of
>>> current->io_uring.
>>
>> Thanks applied - I augmented the commit message a bit.
> 
> btw, does it fixes the replied before syz report? Should 
> syz fix or tag it if so.
> Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com

That tag was already there.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5.13] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
  2021-04-27 17:00                   ` Jens Axboe
@ 2021-04-27 17:04                     ` Pavel Begunkov
  0 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-04-27 17:04 UTC (permalink / raw)
  To: Jens Axboe, Palash Oswal
  Cc: dvyukov, io-uring, linux-kernel, oswalpalash,
	syzbot+be51ca5a4d97f017cd50, syzkaller-bugs, stable

On 4/27/21 6:00 PM, Jens Axboe wrote:
> On 4/27/21 11:00 AM, Pavel Begunkov wrote:
>> On 4/27/21 2:37 PM, Jens Axboe wrote:
>>> On 4/27/21 6:51 AM, Palash Oswal wrote:
>>>> syzkaller identified KASAN: null-ptr-deref Write in
>>>> io_uring_cancel_sqpoll on v5.12
>>>>
>>>> io_uring_cancel_sqpoll is called by io_sq_thread before calling
>>>> io_uring_alloc_task_context. This leads to current->io_uring being
>>>> NULL. io_uring_cancel_sqpoll should not have to deal with threads
>>>> where current->io_uring is NULL.
>>>>
>>>> In order to cast a wider safety net, perform input sanitisation
>>>> directly in io_uring_cancel_sqpoll and return for NULL value of
>>>> current->io_uring.
>>>
>>> Thanks applied - I augmented the commit message a bit.
>>
>> btw, does it fixes the replied before syz report? Should 
>> syz fix or tag it if so.
>> Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com
> 
> That tag was already there.

Oh, right, missed it

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-04-27 17:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-26  9:33 KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll syzbot
     [not found] ` <e939af11-7ce8-46af-8c76-651add0ae56bn@googlegroups.com>
2021-04-27  6:29   ` Dmitry Vyukov
2021-04-27  7:05     ` Palash Oswal
2021-04-27  8:37       ` Pavel Begunkov
2021-04-27 10:39         ` Palash Oswal
2021-04-27 11:20           ` Pavel Begunkov
2021-04-27 12:51             ` [PATCH 5.13] io_uring: Check current->io_uring " Palash Oswal
2021-04-27 13:08               ` Pavel Begunkov
2021-04-27 13:37               ` Jens Axboe
2021-04-27 17:00                 ` Pavel Begunkov
2021-04-27 17:00                   ` Jens Axboe
2021-04-27 17:04                     ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).