All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] cxl can not create region
@ 2022-08-05  2:20 Bobo WL
  2022-08-08 12:37   ` Jonathan Cameron via
  2022-08-08 15:58 ` Dan Williams
  0 siblings, 2 replies; 38+ messages in thread
From: Bobo WL @ 2022-08-05  2:20 UTC (permalink / raw)
  To: linux-cxl; +Cc: qemu-devel, qemu-arm

Hi list

I want to test cxl functions in arm64, and found some problems I can't
figure out.

My test environment:

1. build latest bios from https://github.com/tianocore/edk2.git master
branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
support patch: https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
3. build Linux kernel from
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git preview
branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
4. build latest ndctl tools from https://github.com/pmem/ndctl
create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)

And my qemu test commands:
sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
        -cpu max -smp 8 -nographic -no-reboot \
        -kernel $KERNEL -bios $BIOS_BIN \
        -drive if=none,file=$ROOTFS,format=qcow2,id=hd \
        -device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
nokaslr dyndbg="module cxl* +p"' \
        -object memory-backend-ram,size=4G,id=mem0 \
        -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
        -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
        -object
memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
\
        -object
memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
\
        -object
memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
\
        -object
memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
\
        -object
memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
\
        -object
memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
\
        -object
memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
\
        -object
memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
\
        -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
        -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
        -device cxl-upstream,bus=root_port0,id=us0 \
        -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
        -device
cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
        -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
        -device
cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
        -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
        -device
cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
        -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
        -device
cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
        -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k

And I have got two problems.
1. When I want to create x1 region with command: "cxl create-region -d
decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
reference. Crash log:

[  534.697324] cxl_region region0: config state: 0
[  534.697346] cxl_region region0: probe: -6
[  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
[  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
[  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
[  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
[  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
[  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
for mem0:decoder3.0 @ 0
[  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
[  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
0000:0e:00.0 for mem0:decoder3.0 @ 0
[  534.699405] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
[  534.701474] Mem abort info:
[  534.701994]   ESR = 0x0000000086000004
[  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
[  534.703616]   SET = 0, FnV = 0
[  534.704174]   EA = 0, S1PTW = 0
[  534.704803]   FSC = 0x04: level 0 translation fault
[  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
[  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
[  534.710301] Modules linked in:
[  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
[  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  534.719190] pc : 0x0
[  534.719928] lr : commit_store+0x118/0x2cc
[  534.721007] sp : ffff80000aec3c30
[  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30
[  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400
[  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800
[  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000
[  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838
[  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80
[  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0
[  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000
[  534.738878] Call trace:
[  534.739368]  0x0
[  534.739713]  dev_attr_store+0x1c/0x30
[  534.740186]  sysfs_kf_write+0x48/0x58
[  534.740961]  kernfs_fop_write_iter+0x128/0x184
[  534.741872]  new_sync_write+0xdc/0x158
[  534.742706]  vfs_write+0x1ac/0x2a8
[  534.743440]  ksys_write+0x68/0xf0
[  534.744328]  __arm64_sys_write+0x1c/0x28
[  534.745180]  invoke_syscall+0x44/0xf0
[  534.745989]  el0_svc_common+0x4c/0xfc
[  534.746661]  do_el0_svc+0x60/0xa8
[  534.747378]  el0_svc+0x2c/0x78
[  534.748066]  el0t_64_sync_handler+0xb8/0x12c
[  534.748919]  el0t_64_sync+0x18c/0x190
[  534.749629] Code: bad PC value
[  534.750169] ---[ end trace 0000000000000000 ]---

2. When I want to create x4 region with command: "cxl create-region -d
decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors:

cxl region: create_region: region0: failed to set target3 to mem3
cxl region: cmd_create_region: created 0 regions

And kernel log as below:
[   60.536663] cxl_region region0: config state: 0
[   60.536675] cxl_region region0: probe: -6
[   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
[   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
[   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
[   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
[   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
[   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
[   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
[   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
[   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
[   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
[   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
[   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
[   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
[   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
[   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
for mem0:decoder3.0 @ 0
[   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512
[   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
0000:0e:00.0 for mem0:decoder3.0 @ 0
[   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1

I have tried to write sysfs node manually, got same errors.

Hope I can get some helps here.

Bob

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-05  2:20 [BUG] cxl can not create region Bobo WL
@ 2022-08-08 12:37   ` Jonathan Cameron via
  2022-08-08 15:58 ` Dan Williams
  1 sibling, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-08 12:37 UTC (permalink / raw)
  To: Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

On Fri, 5 Aug 2022 10:20:23 +0800
Bobo WL <lmw.bobo@gmail.com> wrote:

> Hi list
> 
> I want to test cxl functions in arm64, and found some problems I can't
> figure out.
Hi Bob,

Glad to see people testing this code.

> 
> My test environment:
> 
> 1. build latest bios from https://github.com/tianocore/edk2.git master
> branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
> 2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
> master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
> support patch: https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
> 3. build Linux kernel from
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git preview
> branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
> 4. build latest ndctl tools from https://github.com/pmem/ndctl
> create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)
> 
> And my qemu test commands:
> sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
>         -cpu max -smp 8 -nographic -no-reboot \
>         -kernel $KERNEL -bios $BIOS_BIN \
>         -drive if=none,file=$ROOTFS,format=qcow2,id=hd \
>         -device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
> nokaslr dyndbg="module cxl* +p"' \
>         -object memory-backend-ram,size=4G,id=mem0 \
>         -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
>         -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
>         -object
> memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
> \
>         -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>         -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \

Probably not related to your problem, but there is a disconnect in QEMU /
kernel assumptionsaround the presence of an HDM decoder when a HB only
has a single root port. Spec allows it to be provided or not as an implementation choice.
Kernel assumes it isn't provide. Qemu assumes it is.

The temporary solution is to throw in a second root port on the HB and not
connect anything to it.  Longer term I may special case this so that the particular
decoder defaults to pass through settings in QEMU if there is only one root port.

>         -device cxl-upstream,bus=root_port0,id=us0 \
>         -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
>         -device
> cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
>         -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
>         -device
> cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
>         -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
>         -device
> cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
>         -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
>         -device
> cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
>         -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
> 
> And I have got two problems.
> 1. When I want to create x1 region with command: "cxl create-region -d
> decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
> reference. Crash log:
> 
> [  534.697324] cxl_region region0: config state: 0
> [  534.697346] cxl_region region0: probe: -6

Seems odd this is up here.  But maybe fine.

> [  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
> [  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
> mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
> [  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
> [  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
> for mem0:decoder3.0 @ 0
> [  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
> [  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
> 0000:0e:00.0 for mem0:decoder3.0 @ 0
> [  534.699405] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [  534.701474] Mem abort info:
> [  534.701994]   ESR = 0x0000000086000004
> [  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
> [  534.703616]   SET = 0, FnV = 0
> [  534.704174]   EA = 0, S1PTW = 0
> [  534.704803]   FSC = 0x04: level 0 translation fault
> [  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
> [  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
> [  534.710301] Modules linked in:
> [  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
> 5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
> [  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  534.719190] pc : 0x0
> [  534.719928] lr : commit_store+0x118/0x2cc
> [  534.721007] sp : ffff80000aec3c30
> [  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30
> [  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400
> [  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800
> [  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000
> [  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838
> [  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> [  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80
> [  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0
> [  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000
> [  534.738878] Call trace:
> [  534.739368]  0x0
> [  534.739713]  dev_attr_store+0x1c/0x30
> [  534.740186]  sysfs_kf_write+0x48/0x58
> [  534.740961]  kernfs_fop_write_iter+0x128/0x184
> [  534.741872]  new_sync_write+0xdc/0x158
> [  534.742706]  vfs_write+0x1ac/0x2a8
> [  534.743440]  ksys_write+0x68/0xf0
> [  534.744328]  __arm64_sys_write+0x1c/0x28
> [  534.745180]  invoke_syscall+0x44/0xf0
> [  534.745989]  el0_svc_common+0x4c/0xfc
> [  534.746661]  do_el0_svc+0x60/0xa8
> [  534.747378]  el0_svc+0x2c/0x78
> [  534.748066]  el0t_64_sync_handler+0xb8/0x12c
> [  534.748919]  el0t_64_sync+0x18c/0x190
> [  534.749629] Code: bad PC value
> [  534.750169] ---[ end trace 0000000000000000 ]---
> 
> 2. When I want to create x4 region with command: "cxl create-region -d
> decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors:
> 
> cxl region: create_region: region0: failed to set target3 to mem3
> cxl region: cmd_create_region: created 0 regions
> 
> And kernel log as below:
> [   60.536663] cxl_region region0: config state: 0
> [   60.536675] cxl_region region0: probe: -6
> [   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
> [   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
> mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
> [   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
> mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
> [   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
> [   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
> [   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
> mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
> [   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
> [   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
> [   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
> mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
> [   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
> [   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
> [   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
> [   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
> for mem0:decoder3.0 @ 0
> [   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512

This looks like off by 1 that should be fixed in the below mentioned
cxl/pending branch.  That ig should be 256.  Note the fix was
for a test case with a fat HB and no switch, but certainly looks
like this is the same issue.

> [   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
> 0000:0e:00.0 for mem0:decoder3.0 @ 0
> [   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1
> 
> I have tried to write sysfs node manually, got same errors.
When stepping through by hand, which sysfs write triggers the crash above?

Not sure it's related, but I've just sent out a fix to the
target register handling in QEMU. 

https://lore.kernel.org/linux-cxl/20220808122051.14822-1-Jonathan.Cameron@huawei.com/T/#m47ff985412ce44559e6b04d677c302f8cd371330

I did have one instance last week of triggering what looked to be a race condition but
the stack trace doesn't looks related to what you've hit.

It will probably be a few days before I have time to take a look at replicating
what you have seen.

If you have time, try using the kernel.org cxl/pending branch as there are
a few additional fixes on there since you sent this email.  Optimistic to hope
this is covered by one of those, but at least it will mean we are trying to replicate
on same branch.

Jonathan


> 
> Hope I can get some helps here.
> 
> Bob


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-08 12:37   ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-08 12:37 UTC (permalink / raw)
  To: Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

On Fri, 5 Aug 2022 10:20:23 +0800
Bobo WL <lmw.bobo@gmail.com> wrote:

> Hi list
> 
> I want to test cxl functions in arm64, and found some problems I can't
> figure out.
Hi Bob,

Glad to see people testing this code.

> 
> My test environment:
> 
> 1. build latest bios from https://github.com/tianocore/edk2.git master
> branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
> 2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
> master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
> support patch: https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
> 3. build Linux kernel from
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git preview
> branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
> 4. build latest ndctl tools from https://github.com/pmem/ndctl
> create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)
> 
> And my qemu test commands:
> sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
>         -cpu max -smp 8 -nographic -no-reboot \
>         -kernel $KERNEL -bios $BIOS_BIN \
>         -drive if=none,file=$ROOTFS,format=qcow2,id=hd \
>         -device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
> nokaslr dyndbg="module cxl* +p"' \
>         -object memory-backend-ram,size=4G,id=mem0 \
>         -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
>         -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
>         -object
> memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
> \
>         -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>         -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \

Probably not related to your problem, but there is a disconnect in QEMU /
kernel assumptionsaround the presence of an HDM decoder when a HB only
has a single root port. Spec allows it to be provided or not as an implementation choice.
Kernel assumes it isn't provide. Qemu assumes it is.

The temporary solution is to throw in a second root port on the HB and not
connect anything to it.  Longer term I may special case this so that the particular
decoder defaults to pass through settings in QEMU if there is only one root port.

>         -device cxl-upstream,bus=root_port0,id=us0 \
>         -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
>         -device
> cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
>         -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
>         -device
> cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
>         -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
>         -device
> cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
>         -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
>         -device
> cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
>         -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
> 
> And I have got two problems.
> 1. When I want to create x1 region with command: "cxl create-region -d
> decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
> reference. Crash log:
> 
> [  534.697324] cxl_region region0: config state: 0
> [  534.697346] cxl_region region0: probe: -6

Seems odd this is up here.  But maybe fine.

> [  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
> [  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
> mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
> [  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
> [  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
> for mem0:decoder3.0 @ 0
> [  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
> [  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
> 0000:0e:00.0 for mem0:decoder3.0 @ 0
> [  534.699405] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [  534.701474] Mem abort info:
> [  534.701994]   ESR = 0x0000000086000004
> [  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
> [  534.703616]   SET = 0, FnV = 0
> [  534.704174]   EA = 0, S1PTW = 0
> [  534.704803]   FSC = 0x04: level 0 translation fault
> [  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
> [  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
> [  534.710301] Modules linked in:
> [  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
> 5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
> [  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  534.719190] pc : 0x0
> [  534.719928] lr : commit_store+0x118/0x2cc
> [  534.721007] sp : ffff80000aec3c30
> [  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30
> [  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400
> [  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800
> [  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000
> [  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838
> [  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> [  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80
> [  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0
> [  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000
> [  534.738878] Call trace:
> [  534.739368]  0x0
> [  534.739713]  dev_attr_store+0x1c/0x30
> [  534.740186]  sysfs_kf_write+0x48/0x58
> [  534.740961]  kernfs_fop_write_iter+0x128/0x184
> [  534.741872]  new_sync_write+0xdc/0x158
> [  534.742706]  vfs_write+0x1ac/0x2a8
> [  534.743440]  ksys_write+0x68/0xf0
> [  534.744328]  __arm64_sys_write+0x1c/0x28
> [  534.745180]  invoke_syscall+0x44/0xf0
> [  534.745989]  el0_svc_common+0x4c/0xfc
> [  534.746661]  do_el0_svc+0x60/0xa8
> [  534.747378]  el0_svc+0x2c/0x78
> [  534.748066]  el0t_64_sync_handler+0xb8/0x12c
> [  534.748919]  el0t_64_sync+0x18c/0x190
> [  534.749629] Code: bad PC value
> [  534.750169] ---[ end trace 0000000000000000 ]---
> 
> 2. When I want to create x4 region with command: "cxl create-region -d
> decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors:
> 
> cxl region: create_region: region0: failed to set target3 to mem3
> cxl region: cmd_create_region: created 0 regions
> 
> And kernel log as below:
> [   60.536663] cxl_region region0: config state: 0
> [   60.536675] cxl_region region0: probe: -6
> [   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
> [   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
> mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
> [   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
> mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
> [   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
> [   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
> [   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
> mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
> [   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
> [   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
> [   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
> mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
> [   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
> [   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
> [   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
> [   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
> for mem0:decoder3.0 @ 0
> [   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512

This looks like off by 1 that should be fixed in the below mentioned
cxl/pending branch.  That ig should be 256.  Note the fix was
for a test case with a fat HB and no switch, but certainly looks
like this is the same issue.

> [   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
> 0000:0e:00.0 for mem0:decoder3.0 @ 0
> [   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1
> 
> I have tried to write sysfs node manually, got same errors.
When stepping through by hand, which sysfs write triggers the crash above?

Not sure it's related, but I've just sent out a fix to the
target register handling in QEMU. 

https://lore.kernel.org/linux-cxl/20220808122051.14822-1-Jonathan.Cameron@huawei.com/T/#m47ff985412ce44559e6b04d677c302f8cd371330

I did have one instance last week of triggering what looked to be a race condition but
the stack trace doesn't looks related to what you've hit.

It will probably be a few days before I have time to take a look at replicating
what you have seen.

If you have time, try using the kernel.org cxl/pending branch as there are
a few additional fixes on there since you sent this email.  Optimistic to hope
this is covered by one of those, but at least it will mean we are trying to replicate
on same branch.

Jonathan


> 
> Hope I can get some helps here.
> 
> Bob



^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [BUG] cxl can not create region
  2022-08-05  2:20 [BUG] cxl can not create region Bobo WL
  2022-08-08 12:37   ` Jonathan Cameron via
@ 2022-08-08 15:58 ` Dan Williams
  2022-08-09 13:12   ` Bobo WL
  1 sibling, 1 reply; 38+ messages in thread
From: Dan Williams @ 2022-08-08 15:58 UTC (permalink / raw)
  To: Bobo WL, linux-cxl; +Cc: qemu-devel, qemu-arm

Bobo WL wrote:
> Hi list
> 
> I want to test cxl functions in arm64, and found some problems I can't
> figure out.
> 
> My test environment:
> 
> 1. build latest bios from https://github.com/tianocore/edk2.git master
> branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
> 2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
> master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
> support patch: https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
> 3. build Linux kernel from
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git preview
> branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
> 4. build latest ndctl tools from https://github.com/pmem/ndctl
> create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)
> 
> And my qemu test commands:
> sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
>         -cpu max -smp 8 -nographic -no-reboot \
>         -kernel $KERNEL -bios $BIOS_BIN \
>         -drive if=none,file=$ROOTFS,format=qcow2,id=hd \
>         -device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
> nokaslr dyndbg="module cxl* +p"' \
>         -object memory-backend-ram,size=4G,id=mem0 \
>         -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
>         -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
>         -object
> memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
> \
>         -object
> memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
> \
>         -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>         -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
>         -device cxl-upstream,bus=root_port0,id=us0 \
>         -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
>         -device
> cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
>         -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
>         -device
> cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
>         -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
>         -device
> cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
>         -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
>         -device
> cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
>         -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
> 
> And I have got two problems.
> 1. When I want to create x1 region with command: "cxl create-region -d
> decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
> reference. Crash log:
> 
> [  534.697324] cxl_region region0: config state: 0
> [  534.697346] cxl_region region0: probe: -6
> [  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
> [  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
> mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
> [  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
> [  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
> for mem0:decoder3.0 @ 0
> [  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
> [  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
> 0000:0e:00.0 for mem0:decoder3.0 @ 0
> [  534.699405] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [  534.701474] Mem abort info:
> [  534.701994]   ESR = 0x0000000086000004
> [  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
> [  534.703616]   SET = 0, FnV = 0
> [  534.704174]   EA = 0, S1PTW = 0
> [  534.704803]   FSC = 0x04: level 0 translation fault
> [  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
> [  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
> [  534.710301] Modules linked in:
> [  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
> 5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
> [  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  534.719190] pc : 0x0
> [  534.719928] lr : commit_store+0x118/0x2cc
> [  534.721007] sp : ffff80000aec3c30
> [  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30
> [  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400
> [  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800
> [  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000
> [  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838
> [  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> [  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80
> [  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0
> [  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000
> [  534.738878] Call trace:
> [  534.739368]  0x0
> [  534.739713]  dev_attr_store+0x1c/0x30
> [  534.740186]  sysfs_kf_write+0x48/0x58
> [  534.740961]  kernfs_fop_write_iter+0x128/0x184
> [  534.741872]  new_sync_write+0xdc/0x158
> [  534.742706]  vfs_write+0x1ac/0x2a8
> [  534.743440]  ksys_write+0x68/0xf0
> [  534.744328]  __arm64_sys_write+0x1c/0x28
> [  534.745180]  invoke_syscall+0x44/0xf0
> [  534.745989]  el0_svc_common+0x4c/0xfc
> [  534.746661]  do_el0_svc+0x60/0xa8
> [  534.747378]  el0_svc+0x2c/0x78
> [  534.748066]  el0t_64_sync_handler+0xb8/0x12c
> [  534.748919]  el0t_64_sync+0x18c/0x190
> [  534.749629] Code: bad PC value
> [  534.750169] ---[ end trace 0000000000000000 ]---

What was the top kernel commit when you ran this test? What is the line
number of "commit_store+0x118"?

> 2. When I want to create x4 region with command: "cxl create-region -d
> decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors:
> 
> cxl region: create_region: region0: failed to set target3 to mem3
> cxl region: cmd_create_region: created 0 regions
> 
> And kernel log as below:
> [   60.536663] cxl_region region0: config state: 0
> [   60.536675] cxl_region region0: probe: -6
> [   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
> [   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
> mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
> [   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
> mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
> [   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
> [   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
> [   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
> mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
> [   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
> [   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
> [   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
> mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
> [   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
> mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
> [   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
> mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
> [   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
> [   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
> for mem0:decoder3.0 @ 0
> [   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512
> [   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
> 0000:0e:00.0 for mem0:decoder3.0 @ 0
> [   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1
> 
> I have tried to write sysfs node manually, got same errors.
> 
> Hope I can get some helps here.

What is the output of:

    cxl list -MDTu -d decoder0.0

...? It might be the case that mem1 cannot be mapped by decoder0.0, or
at least not in the specified order, or that validation check is broken.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-08 12:37   ` Jonathan Cameron via
  (?)
@ 2022-08-09 13:07   ` Bobo WL
  2022-08-09 16:08       ` Jonathan Cameron via
  -1 siblings, 1 reply; 38+ messages in thread
From: Bobo WL @ 2022-08-09 13:07 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-cxl, qemu-devel, qemu-arm

Hi Jonathan

Thanks for your reply!

On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> Probably not related to your problem, but there is a disconnect in QEMU /
> kernel assumptionsaround the presence of an HDM decoder when a HB only
> has a single root port. Spec allows it to be provided or not as an implementation choice.
> Kernel assumes it isn't provide. Qemu assumes it is.
>
> The temporary solution is to throw in a second root port on the HB and not
> connect anything to it.  Longer term I may special case this so that the particular
> decoder defaults to pass through settings in QEMU if there is only one root port.
>

You are right! After adding an extra HB in qemu, I can create a x1
region successfully.
But have some errors in Nvdimm:

[   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
[   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
[   74.927470] nd_region region0: nmem0: is disabled, failing probe

And x4 region still failed with same errors, using latest cxl/preview
branch don't work.
I have picked "Two CXL emulation fixes" patches in qemu, still not working.

Bob

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-08 15:58 ` Dan Williams
@ 2022-08-09 13:12   ` Bobo WL
  2022-08-09 15:17     ` Dan Williams
  0 siblings, 1 reply; 38+ messages in thread
From: Bobo WL @ 2022-08-09 13:12 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, qemu-devel, qemu-arm

Hi Dan,

Thanks for your reply!

On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> What is the output of:
>
>     cxl list -MDTu -d decoder0.0
>
> ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> at least not in the specified order, or that validation check is broken.

Command "cxl list -MDTu -d decoder0.0" output:

[
  {
    "memdevs":[
      {
        "memdev":"mem2",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":0,
        "serial":"0",
        "host":"0000:11:00.0"
      },
      {
        "memdev":"mem1",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":0,
        "serial":"0",
        "host":"0000:10:00.0"
      },
      {
        "memdev":"mem0",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":0,
        "serial":"0",
        "host":"0000:0f:00.0"
      },
      {
        "memdev":"mem3",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":0,
        "serial":"0",
        "host":"0000:12:00.0"
      }
    ]
  },
  {
    "root decoders":[
      {
        "decoder":"decoder0.0",
        "resource":"0x10000000000",
        "size":"4.00 GiB (4.29 GB)",
        "pmem_capable":true,
        "volatile_capable":true,
        "accelmem_capable":true,
        "nr_targets":1,
        "targets":[
          {
            "target":"ACPI0016:01",
            "alias":"pci0000:0c",
            "position":0,
            "id":"0xc"
          }
        ]
      }
    ]
  }
]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-09 13:12   ` Bobo WL
@ 2022-08-09 15:17     ` Dan Williams
  2022-08-11  3:10       ` Bobo WL
  2022-08-12  0:46       ` Dan Williams
  0 siblings, 2 replies; 38+ messages in thread
From: Dan Williams @ 2022-08-09 15:17 UTC (permalink / raw)
  To: Bobo WL, Dan Williams; +Cc: linux-cxl, qemu-devel, qemu-arm

Bobo WL wrote:
> Hi Dan,
> 
> Thanks for your reply!
> 
> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > What is the output of:
> >
> >     cxl list -MDTu -d decoder0.0
> >
> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > at least not in the specified order, or that validation check is broken.
> 
> Command "cxl list -MDTu -d decoder0.0" output:

Thanks for this, I think I know the problem, but will try some
experiments with cxl_test first.

Did the commit_store() crash stop reproducing with latest cxl/preview
branch?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-09 13:07   ` Bobo WL
@ 2022-08-09 16:08       ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-09 16:08 UTC (permalink / raw)
  To: Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

On Tue, 9 Aug 2022 21:07:06 +0800
Bobo WL <lmw.bobo@gmail.com> wrote:

> Hi Jonathan
> 
> Thanks for your reply!
> 
> On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > Probably not related to your problem, but there is a disconnect in QEMU /
> > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > Kernel assumes it isn't provide. Qemu assumes it is.
> >
> > The temporary solution is to throw in a second root port on the HB and not
> > connect anything to it.  Longer term I may special case this so that the particular
> > decoder defaults to pass through settings in QEMU if there is only one root port.
> >  
> 
> You are right! After adding an extra HB in qemu, I can create a x1
> region successfully.
> But have some errors in Nvdimm:
> 
> [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> [   74.927470] nd_region region0: nmem0: is disabled, failing probe

Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
down. Once I reach this state I can verify the HDM Decode is correct which is what
I've been using to test (Which wasn't true until earlier this week). 
I'm currently testing via devmem, more for historical reasons than because it makes
that much sense anymore.  

> 
> And x4 region still failed with same errors, using latest cxl/preview
> branch don't work.
> I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> 
> Bob

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-09 16:08       ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-09 16:08 UTC (permalink / raw)
  To: Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

On Tue, 9 Aug 2022 21:07:06 +0800
Bobo WL <lmw.bobo@gmail.com> wrote:

> Hi Jonathan
> 
> Thanks for your reply!
> 
> On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > Probably not related to your problem, but there is a disconnect in QEMU /
> > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > Kernel assumes it isn't provide. Qemu assumes it is.
> >
> > The temporary solution is to throw in a second root port on the HB and not
> > connect anything to it.  Longer term I may special case this so that the particular
> > decoder defaults to pass through settings in QEMU if there is only one root port.
> >  
> 
> You are right! After adding an extra HB in qemu, I can create a x1
> region successfully.
> But have some errors in Nvdimm:
> 
> [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> [   74.927470] nd_region region0: nmem0: is disabled, failing probe

Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
down. Once I reach this state I can verify the HDM Decode is correct which is what
I've been using to test (Which wasn't true until earlier this week). 
I'm currently testing via devmem, more for historical reasons than because it makes
that much sense anymore.  

> 
> And x4 region still failed with same errors, using latest cxl/preview
> branch don't work.
> I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> 
> Bob


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-09 15:17     ` Dan Williams
@ 2022-08-11  3:10       ` Bobo WL
  2022-08-12  0:46       ` Dan Williams
  1 sibling, 0 replies; 38+ messages in thread
From: Bobo WL @ 2022-08-11  3:10 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, qemu-devel, qemu-arm

On Tue, Aug 9, 2022 at 11:17 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Bobo WL wrote:
> > Hi Dan,
> >
> > Thanks for your reply!
> >
> > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > What is the output of:
> > >
> > >     cxl list -MDTu -d decoder0.0
> > >
> > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > at least not in the specified order, or that validation check is broken.
> >
> > Command "cxl list -MDTu -d decoder0.0" output:
>
> Thanks for this, I think I know the problem, but will try some
> experiments with cxl_test first.
>
> Did the commit_store() crash stop reproducing with latest cxl/preview
> branch?

No, still hitting this bug if don't add extra HB device in qemu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-09 16:08       ` Jonathan Cameron via
@ 2022-08-11 17:08         ` Jonathan Cameron
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-11 17:08 UTC (permalink / raw)
  To: Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

On Tue, 9 Aug 2022 17:08:25 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Tue, 9 Aug 2022 21:07:06 +0800
> Bobo WL <lmw.bobo@gmail.com> wrote:
> 
> > Hi Jonathan
> > 
> > Thanks for your reply!
> > 
> > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> > >
> > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > Kernel assumes it isn't provide. Qemu assumes it is.
> > >
> > > The temporary solution is to throw in a second root port on the HB and not
> > > connect anything to it.  Longer term I may special case this so that the particular
> > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > >    
> > 
> > You are right! After adding an extra HB in qemu, I can create a x1
> > region successfully.
> > But have some errors in Nvdimm:
> > 
> > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > [   74.927470] nd_region region0: nmem0: is disabled, failing probe  
> 
> Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> down. Once I reach this state I can verify the HDM Decode is correct which is what
> I've been using to test (Which wasn't true until earlier this week). 
> I'm currently testing via devmem, more for historical reasons than because it makes
> that much sense anymore.  

*embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
I'd forgotten that was still on the todo list. I don't think it will
be particularly hard to do and will take a look in next few days.

Very very indirectly this error is causing a driver probe fail that means that
we hit a code path that has a rather odd looking check on NDD_LABELING.
Should not have gotten near that path though - hence the problem is actually
when we call cxl_pmem_get_config_data() and it returns an error because
we haven't fully connected up the command in QEMU.

Jonathan


> 
> > 
> > And x4 region still failed with same errors, using latest cxl/preview
> > branch don't work.
> > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > 
> > Bob  



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-11 17:08         ` Jonathan Cameron
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-11 17:08 UTC (permalink / raw)
  To: Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

On Tue, 9 Aug 2022 17:08:25 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Tue, 9 Aug 2022 21:07:06 +0800
> Bobo WL <lmw.bobo@gmail.com> wrote:
> 
> > Hi Jonathan
> > 
> > Thanks for your reply!
> > 
> > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> > >
> > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > Kernel assumes it isn't provide. Qemu assumes it is.
> > >
> > > The temporary solution is to throw in a second root port on the HB and not
> > > connect anything to it.  Longer term I may special case this so that the particular
> > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > >    
> > 
> > You are right! After adding an extra HB in qemu, I can create a x1
> > region successfully.
> > But have some errors in Nvdimm:
> > 
> > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > [   74.927470] nd_region region0: nmem0: is disabled, failing probe  
> 
> Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> down. Once I reach this state I can verify the HDM Decode is correct which is what
> I've been using to test (Which wasn't true until earlier this week). 
> I'm currently testing via devmem, more for historical reasons than because it makes
> that much sense anymore.  

*embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
I'd forgotten that was still on the todo list. I don't think it will
be particularly hard to do and will take a look in next few days.

Very very indirectly this error is causing a driver probe fail that means that
we hit a code path that has a rather odd looking check on NDD_LABELING.
Should not have gotten near that path though - hence the problem is actually
when we call cxl_pmem_get_config_data() and it returns an error because
we haven't fully connected up the command in QEMU.

Jonathan


> 
> > 
> > And x4 region still failed with same errors, using latest cxl/preview
> > branch don't work.
> > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > 
> > Bob  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-09 15:17     ` Dan Williams
  2022-08-11  3:10       ` Bobo WL
@ 2022-08-12  0:46       ` Dan Williams
  2022-08-17 16:16           ` Jonathan Cameron via
  1 sibling, 1 reply; 38+ messages in thread
From: Dan Williams @ 2022-08-12  0:46 UTC (permalink / raw)
  To: Dan Williams, Bobo WL; +Cc: linux-cxl, qemu-devel, qemu-arm

Dan Williams wrote:
> Bobo WL wrote:
> > Hi Dan,
> > 
> > Thanks for your reply!
> > 
> > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > What is the output of:
> > >
> > >     cxl list -MDTu -d decoder0.0
> > >
> > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > at least not in the specified order, or that validation check is broken.
> > 
> > Command "cxl list -MDTu -d decoder0.0" output:
> 
> Thanks for this, I think I know the problem, but will try some
> experiments with cxl_test first.

Hmm, so my cxl_test experiment unfortunately passed so I'm not
reproducing the failure mode. This is the result of creating x4 region
with devices directly attached to a single host-bridge:

# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
{
  "region":"region8",
  "resource":"0xf1f0000000",
  "size":"1024.00 MiB (1073.74 MB)",
  "interleave_ways":4,
  "interleave_granularity":256,
  "decode_state":"commit",
  "mappings":[
    {
      "position":3,
      "memdev":"mem11",
      "decoder":"decoder21.0"
    },
    {
      "position":2,
      "memdev":"mem9",
      "decoder":"decoder19.0"
    },
    {
      "position":1,
      "memdev":"mem10",
      "decoder":"decoder20.0"
    },
    {
      "position":0,
      "memdev":"mem12",
      "decoder":"decoder22.0"
    }
  ]
}
cxl region: cmd_create_region: created 1 region

> Did the commit_store() crash stop reproducing with latest cxl/preview
> branch?

I missed the answer to this question.

All of these changes are now in Linus' tree perhaps give that a try and
post the debug log again?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-11 17:08         ` Jonathan Cameron
@ 2022-08-12 15:44           ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-12 15:44 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Jonathan Cameron, Bobo WL, linux-cxl, qemu-arm

On Thu, 11 Aug 2022 18:08:57 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Tue, 9 Aug 2022 17:08:25 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Tue, 9 Aug 2022 21:07:06 +0800
> > Bobo WL <lmw.bobo@gmail.com> wrote:
> >   
> > > Hi Jonathan
> > > 
> > > Thanks for your reply!
> > > 
> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:    
> > > >
> > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > >
> > > > The temporary solution is to throw in a second root port on the HB and not
> > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > >      
> > > 
> > > You are right! After adding an extra HB in qemu, I can create a x1
> > > region successfully.
> > > But have some errors in Nvdimm:
> > > 
> > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe    
> > 
> > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > I've been using to test (Which wasn't true until earlier this week). 
> > I'm currently testing via devmem, more for historical reasons than because it makes
> > that much sense anymore.    
> 
> *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> I'd forgotten that was still on the todo list. I don't think it will
> be particularly hard to do and will take a look in next few days.
> 
> Very very indirectly this error is causing a driver probe fail that means that
> we hit a code path that has a rather odd looking check on NDD_LABELING.
> Should not have gotten near that path though - hence the problem is actually
> when we call cxl_pmem_get_config_data() and it returns an error because
> we haven't fully connected up the command in QEMU.

So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
We just need to relax condition on the supplied length.

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c352a935c4..fdda9529fe 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
     cxl_cmd = &cxl_cmd_set[set][cmd];
     h = cxl_cmd->handler;
     if (h) {
-        if (len == cxl_cmd->in) {
+        if (len == cxl_cmd->in || !cxl_cmd->in) {
             cxl_cmd->payload = cxl_dstate->mbox_reg_state +
                 A_CXL_DEV_CMD_PAYLOAD;
             ret = (*h)(cxl_cmd, cxl_dstate, &len);


This lets the nvdimm/region probe fine, but I'm getting some issues with
namespace capacity so I'll look at what is causing that next.
Unfortunately I'm not that familiar with the driver/nvdimm side of things
so it's take a while to figure out what kicks off what!

Jonathan

> 
> Jonathan
> 
> 
> >   
> > > 
> > > And x4 region still failed with same errors, using latest cxl/preview
> > > branch don't work.
> > > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > > 
> > > Bob    
> 
> 


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-12 15:44           ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-12 15:44 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Jonathan Cameron, Bobo WL, linux-cxl, qemu-arm

On Thu, 11 Aug 2022 18:08:57 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Tue, 9 Aug 2022 17:08:25 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Tue, 9 Aug 2022 21:07:06 +0800
> > Bobo WL <lmw.bobo@gmail.com> wrote:
> >   
> > > Hi Jonathan
> > > 
> > > Thanks for your reply!
> > > 
> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:    
> > > >
> > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > >
> > > > The temporary solution is to throw in a second root port on the HB and not
> > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > >      
> > > 
> > > You are right! After adding an extra HB in qemu, I can create a x1
> > > region successfully.
> > > But have some errors in Nvdimm:
> > > 
> > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe    
> > 
> > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > I've been using to test (Which wasn't true until earlier this week). 
> > I'm currently testing via devmem, more for historical reasons than because it makes
> > that much sense anymore.    
> 
> *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> I'd forgotten that was still on the todo list. I don't think it will
> be particularly hard to do and will take a look in next few days.
> 
> Very very indirectly this error is causing a driver probe fail that means that
> we hit a code path that has a rather odd looking check on NDD_LABELING.
> Should not have gotten near that path though - hence the problem is actually
> when we call cxl_pmem_get_config_data() and it returns an error because
> we haven't fully connected up the command in QEMU.

So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
We just need to relax condition on the supplied length.

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c352a935c4..fdda9529fe 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
     cxl_cmd = &cxl_cmd_set[set][cmd];
     h = cxl_cmd->handler;
     if (h) {
-        if (len == cxl_cmd->in) {
+        if (len == cxl_cmd->in || !cxl_cmd->in) {
             cxl_cmd->payload = cxl_dstate->mbox_reg_state +
                 A_CXL_DEV_CMD_PAYLOAD;
             ret = (*h)(cxl_cmd, cxl_dstate, &len);


This lets the nvdimm/region probe fine, but I'm getting some issues with
namespace capacity so I'll look at what is causing that next.
Unfortunately I'm not that familiar with the driver/nvdimm side of things
so it's take a while to figure out what kicks off what!

Jonathan

> 
> Jonathan
> 
> 
> >   
> > > 
> > > And x4 region still failed with same errors, using latest cxl/preview
> > > branch don't work.
> > > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > > 
> > > Bob    
> 
> 



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-12 15:44           ` Jonathan Cameron via
  (?)
@ 2022-08-12 16:03           ` Dan Williams
  2022-08-12 16:15               ` Jonathan Cameron via
  -1 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2022-08-12 16:03 UTC (permalink / raw)
  To: Jonathan Cameron, Jonathan Cameron via
  Cc: Jonathan Cameron, Bobo WL, linux-cxl, qemu-arm

Jonathan Cameron wrote:
> On Thu, 11 Aug 2022 18:08:57 +0100
> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> 
> > On Tue, 9 Aug 2022 17:08:25 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > 
> > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > >   
> > > > Hi Jonathan
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:    
> > > > >
> > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > >
> > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > >      
> > > > 
> > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > region successfully.
> > > > But have some errors in Nvdimm:
> > > > 
> > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe    
> > > 
> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > I've been using to test (Which wasn't true until earlier this week). 
> > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > that much sense anymore.    
> > 
> > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > I'd forgotten that was still on the todo list. I don't think it will
> > be particularly hard to do and will take a look in next few days.
> > 
> > Very very indirectly this error is causing a driver probe fail that means that
> > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > Should not have gotten near that path though - hence the problem is actually
> > when we call cxl_pmem_get_config_data() and it returns an error because
> > we haven't fully connected up the command in QEMU.
> 
> So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> We just need to relax condition on the supplied length.
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index c352a935c4..fdda9529fe 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
>      cxl_cmd = &cxl_cmd_set[set][cmd];
>      h = cxl_cmd->handler;
>      if (h) {
> -        if (len == cxl_cmd->in) {
> +        if (len == cxl_cmd->in || !cxl_cmd->in) {
>              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
>                  A_CXL_DEV_CMD_PAYLOAD;
>              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> 
> 
> This lets the nvdimm/region probe fine, but I'm getting some issues with
> namespace capacity so I'll look at what is causing that next.
> Unfortunately I'm not that familiar with the driver/nvdimm side of things
> so it's take a while to figure out what kicks off what!

The whirlwind tour is that 'struct nd_region' instances that represent a
persitent memory address range are composed of one more mappings of
'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
the dimm (if locked) and interrogating the label area to look for
namespace labels.

The label command calls are routed to the '->ndctl()' callback that was
registered when the CXL nvdimm_bus_descriptor was created. That callback
handles both 'bus' scope calls, currently none for CXL, and per nvdimm
calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
to CXL commands.

The 'struct nvdimm' objects that the CXL side registers have the
NDD_LABELING flag set which means that namespaces need to be explicitly
created / provisioned from region capacity. Otherwise, if
drivers/nvdimm/dimm.c does not find a namespace-label-index block then
the region reverts to label-less mode and a default namespace equal to
the size of the region is instantiated.

If you are seeing small mismatches in namespace capacity then it may
just be the fact that by default 'ndctl create-namespace' results in an
'fsdax' mode namespace which just means that it is a block device where
1.5% of the capacity is reserved for 'struct page' metadata. You should
be able to see namespace capacity == region capacity by doing "ndctl
create-namespace -m raw", and disable DAX operation.

Hope that helps.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-12 16:03           ` Dan Williams
@ 2022-08-12 16:15               ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-12 16:15 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jonathan Cameron via, Bobo WL, linux-cxl, qemu-arm

On Fri, 12 Aug 2022 09:03:02 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> >   
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >   
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > >     
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > <Jonathan.Cameron@huawei.com> wrote:      
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > >        
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe      
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > that much sense anymore.      
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.  
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >      cxl_cmd = &cxl_cmd_set[set][cmd];
> >      h = cxl_cmd->handler;
> >      if (h) {
> > -        if (len == cxl_cmd->in) {
> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> >                  A_CXL_DEV_CMD_PAYLOAD;
> >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > 
> > 
> > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > namespace capacity so I'll look at what is causing that next.
> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > so it's take a while to figure out what kicks off what!  
> 
> The whirlwind tour is that 'struct nd_region' instances that represent a
> persitent memory address range are composed of one more mappings of
> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> the dimm (if locked) and interrogating the label area to look for
> namespace labels.
> 
> The label command calls are routed to the '->ndctl()' callback that was
> registered when the CXL nvdimm_bus_descriptor was created. That callback
> handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> to CXL commands.
> 
> The 'struct nvdimm' objects that the CXL side registers have the
> NDD_LABELING flag set which means that namespaces need to be explicitly
> created / provisioned from region capacity. Otherwise, if
> drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> the region reverts to label-less mode and a default namespace equal to
> the size of the region is instantiated.
> 
> If you are seeing small mismatches in namespace capacity then it may
> just be the fact that by default 'ndctl create-namespace' results in an
> 'fsdax' mode namespace which just means that it is a block device where
> 1.5% of the capacity is reserved for 'struct page' metadata. You should
> be able to see namespace capacity == region capacity by doing "ndctl
> create-namespace -m raw", and disable DAX operation.

Currently ndctl create-namespace crashes qemu ;)
Which isn't ideal!

> 
> Hope that helps.
Got me looking at the right code. Thanks!

Jonathan



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-12 16:15               ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-12 16:15 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jonathan Cameron via, Bobo WL, linux-cxl, qemu-arm

On Fri, 12 Aug 2022 09:03:02 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> >   
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >   
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > >     
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > <Jonathan.Cameron@huawei.com> wrote:      
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > >        
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe      
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > that much sense anymore.      
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.  
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >      cxl_cmd = &cxl_cmd_set[set][cmd];
> >      h = cxl_cmd->handler;
> >      if (h) {
> > -        if (len == cxl_cmd->in) {
> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> >                  A_CXL_DEV_CMD_PAYLOAD;
> >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > 
> > 
> > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > namespace capacity so I'll look at what is causing that next.
> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > so it's take a while to figure out what kicks off what!  
> 
> The whirlwind tour is that 'struct nd_region' instances that represent a
> persitent memory address range are composed of one more mappings of
> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> the dimm (if locked) and interrogating the label area to look for
> namespace labels.
> 
> The label command calls are routed to the '->ndctl()' callback that was
> registered when the CXL nvdimm_bus_descriptor was created. That callback
> handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> to CXL commands.
> 
> The 'struct nvdimm' objects that the CXL side registers have the
> NDD_LABELING flag set which means that namespaces need to be explicitly
> created / provisioned from region capacity. Otherwise, if
> drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> the region reverts to label-less mode and a default namespace equal to
> the size of the region is instantiated.
> 
> If you are seeing small mismatches in namespace capacity then it may
> just be the fact that by default 'ndctl create-namespace' results in an
> 'fsdax' mode namespace which just means that it is a block device where
> 1.5% of the capacity is reserved for 'struct page' metadata. You should
> be able to see namespace capacity == region capacity by doing "ndctl
> create-namespace -m raw", and disable DAX operation.

Currently ndctl create-namespace crashes qemu ;)
Which isn't ideal!

> 
> Hope that helps.
Got me looking at the right code. Thanks!

Jonathan




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-12 16:15               ` Jonathan Cameron via
@ 2022-08-15 14:18                 ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-15 14:18 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jonathan Cameron via, Bobo WL, linux-cxl, qemu-arm

On Fri, 12 Aug 2022 17:15:09 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Fri, 12 Aug 2022 09:03:02 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Jonathan Cameron wrote:  
> > > On Thu, 11 Aug 2022 18:08:57 +0100
> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > >     
> > > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > >     
> > > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > > >       
> > > > > > Hi Jonathan
> > > > > > 
> > > > > > Thanks for your reply!
> > > > > > 
> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > > <Jonathan.Cameron@huawei.com> wrote:        
> > > > > > >
> > > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > > >
> > > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > > >          
> > > > > > 
> > > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > > region successfully.
> > > > > > But have some errors in Nvdimm:
> > > > > > 
> > > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe        
> > > > > 
> > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > > that much sense anymore.        
> > > > 
> > > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > > I'd forgotten that was still on the todo list. I don't think it will
> > > > be particularly hard to do and will take a look in next few days.
> > > > 
> > > > Very very indirectly this error is causing a driver probe fail that means that
> > > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > > Should not have gotten near that path though - hence the problem is actually
> > > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > > we haven't fully connected up the command in QEMU.    
> > > 
> > > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > > We just need to relax condition on the supplied length.
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index c352a935c4..fdda9529fe 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > >      cxl_cmd = &cxl_cmd_set[set][cmd];
> > >      h = cxl_cmd->handler;
> > >      if (h) {
> > > -        if (len == cxl_cmd->in) {
> > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> > >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> > >                  A_CXL_DEV_CMD_PAYLOAD;
> > >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > > 
> > > 
> > > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > > namespace capacity so I'll look at what is causing that next.
> > > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > > so it's take a while to figure out what kicks off what!    
> > 
> > The whirlwind tour is that 'struct nd_region' instances that represent a
> > persitent memory address range are composed of one more mappings of
> > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> > the dimm (if locked) and interrogating the label area to look for
> > namespace labels.
> > 
> > The label command calls are routed to the '->ndctl()' callback that was
> > registered when the CXL nvdimm_bus_descriptor was created. That callback
> > handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> > to CXL commands.
> > 
> > The 'struct nvdimm' objects that the CXL side registers have the
> > NDD_LABELING flag set which means that namespaces need to be explicitly
> > created / provisioned from region capacity. Otherwise, if
> > drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> > the region reverts to label-less mode and a default namespace equal to
> > the size of the region is instantiated.
> > 
> > If you are seeing small mismatches in namespace capacity then it may
> > just be the fact that by default 'ndctl create-namespace' results in an
> > 'fsdax' mode namespace which just means that it is a block device where
> > 1.5% of the capacity is reserved for 'struct page' metadata. You should
> > be able to see namespace capacity == region capacity by doing "ndctl
> > create-namespace -m raw", and disable DAX operation.  
> 
> Currently ndctl create-namespace crashes qemu ;)
> Which isn't ideal!
> 

Found a cause for this one.  Mailbox payload may be as small as 256 bytes.
We have code in kernel sanity checking that output payload fits in the
mailbox, but nothing on the input payload.  Symptom is that we write just
off the end whatever size the payload is.  Note doing this shouldn't crash
qemu - so I need to fix a range check somewhere.

I think this is because cxl_pmem_get_config_size() returns the mailbox
payload size as being the available LSA size, forgetting to remove the
size of the headers on the set_lsa side of things.
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110

I've hacked the max_payload to be -8

Now we still don't succeed in creating the namespace, but bonus is it doesn't crash any more.


Jonathan



> > 
> > Hope that helps.  
> Got me looking at the right code. Thanks!
> 
> Jonathan
> 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-15 14:18                 ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-15 14:18 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jonathan Cameron via, Bobo WL, linux-cxl, qemu-arm

On Fri, 12 Aug 2022 17:15:09 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Fri, 12 Aug 2022 09:03:02 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Jonathan Cameron wrote:  
> > > On Thu, 11 Aug 2022 18:08:57 +0100
> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > >     
> > > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > >     
> > > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > > >       
> > > > > > Hi Jonathan
> > > > > > 
> > > > > > Thanks for your reply!
> > > > > > 
> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > > <Jonathan.Cameron@huawei.com> wrote:        
> > > > > > >
> > > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > > >
> > > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > > >          
> > > > > > 
> > > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > > region successfully.
> > > > > > But have some errors in Nvdimm:
> > > > > > 
> > > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe        
> > > > > 
> > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > > that much sense anymore.        
> > > > 
> > > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > > I'd forgotten that was still on the todo list. I don't think it will
> > > > be particularly hard to do and will take a look in next few days.
> > > > 
> > > > Very very indirectly this error is causing a driver probe fail that means that
> > > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > > Should not have gotten near that path though - hence the problem is actually
> > > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > > we haven't fully connected up the command in QEMU.    
> > > 
> > > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > > We just need to relax condition on the supplied length.
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index c352a935c4..fdda9529fe 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > >      cxl_cmd = &cxl_cmd_set[set][cmd];
> > >      h = cxl_cmd->handler;
> > >      if (h) {
> > > -        if (len == cxl_cmd->in) {
> > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> > >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> > >                  A_CXL_DEV_CMD_PAYLOAD;
> > >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > > 
> > > 
> > > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > > namespace capacity so I'll look at what is causing that next.
> > > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > > so it's take a while to figure out what kicks off what!    
> > 
> > The whirlwind tour is that 'struct nd_region' instances that represent a
> > persitent memory address range are composed of one more mappings of
> > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> > the dimm (if locked) and interrogating the label area to look for
> > namespace labels.
> > 
> > The label command calls are routed to the '->ndctl()' callback that was
> > registered when the CXL nvdimm_bus_descriptor was created. That callback
> > handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> > to CXL commands.
> > 
> > The 'struct nvdimm' objects that the CXL side registers have the
> > NDD_LABELING flag set which means that namespaces need to be explicitly
> > created / provisioned from region capacity. Otherwise, if
> > drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> > the region reverts to label-less mode and a default namespace equal to
> > the size of the region is instantiated.
> > 
> > If you are seeing small mismatches in namespace capacity then it may
> > just be the fact that by default 'ndctl create-namespace' results in an
> > 'fsdax' mode namespace which just means that it is a block device where
> > 1.5% of the capacity is reserved for 'struct page' metadata. You should
> > be able to see namespace capacity == region capacity by doing "ndctl
> > create-namespace -m raw", and disable DAX operation.  
> 
> Currently ndctl create-namespace crashes qemu ;)
> Which isn't ideal!
> 

Found a cause for this one.  Mailbox payload may be as small as 256 bytes.
We have code in kernel sanity checking that output payload fits in the
mailbox, but nothing on the input payload.  Symptom is that we write just
off the end whatever size the payload is.  Note doing this shouldn't crash
qemu - so I need to fix a range check somewhere.

I think this is because cxl_pmem_get_config_size() returns the mailbox
payload size as being the available LSA size, forgetting to remove the
size of the headers on the set_lsa side of things.
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110

I've hacked the max_payload to be -8

Now we still don't succeed in creating the namespace, but bonus is it doesn't crash any more.


Jonathan



> > 
> > Hope that helps.  
> Got me looking at the right code. Thanks!
> 
> Jonathan
> 
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-15 14:18                 ` Jonathan Cameron via
@ 2022-08-15 14:55                   ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-15 14:55 UTC (permalink / raw)
  To: Jonathan Cameron via
  Cc: Jonathan Cameron, Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 15:18:09 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Fri, 12 Aug 2022 17:15:09 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Fri, 12 Aug 2022 09:03:02 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > Jonathan Cameron wrote:    
> > > > On Thu, 11 Aug 2022 18:08:57 +0100
> > > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > > >       
> > > > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > > >       
> > > > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > > > >         
> > > > > > > Hi Jonathan
> > > > > > > 
> > > > > > > Thanks for your reply!
> > > > > > > 
> > > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > > > <Jonathan.Cameron@huawei.com> wrote:          
> > > > > > > >
> > > > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > > > >
> > > > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > > > >            
> > > > > > > 
> > > > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > > > region successfully.
> > > > > > > But have some errors in Nvdimm:
> > > > > > > 
> > > > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe          
> > > > > > 
> > > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > > > that much sense anymore.          
> > > > > 
> > > > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > > > I'd forgotten that was still on the todo list. I don't think it will
> > > > > be particularly hard to do and will take a look in next few days.
> > > > > 
> > > > > Very very indirectly this error is causing a driver probe fail that means that
> > > > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > > > Should not have gotten near that path though - hence the problem is actually
> > > > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > > > we haven't fully connected up the command in QEMU.      
> > > > 
> > > > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > > > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > > > We just need to relax condition on the supplied length.
> > > > 
> > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > > index c352a935c4..fdda9529fe 100644
> > > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > > >      cxl_cmd = &cxl_cmd_set[set][cmd];
> > > >      h = cxl_cmd->handler;
> > > >      if (h) {
> > > > -        if (len == cxl_cmd->in) {
> > > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> > > >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> > > >                  A_CXL_DEV_CMD_PAYLOAD;
> > > >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > > > 
> > > > 
> > > > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > > > namespace capacity so I'll look at what is causing that next.
> > > > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > > > so it's take a while to figure out what kicks off what!      
> > > 
> > > The whirlwind tour is that 'struct nd_region' instances that represent a
> > > persitent memory address range are composed of one more mappings of
> > > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> > > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> > > the dimm (if locked) and interrogating the label area to look for
> > > namespace labels.
> > > 
> > > The label command calls are routed to the '->ndctl()' callback that was
> > > registered when the CXL nvdimm_bus_descriptor was created. That callback
> > > handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> > > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> > > to CXL commands.
> > > 
> > > The 'struct nvdimm' objects that the CXL side registers have the
> > > NDD_LABELING flag set which means that namespaces need to be explicitly
> > > created / provisioned from region capacity. Otherwise, if
> > > drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> > > the region reverts to label-less mode and a default namespace equal to
> > > the size of the region is instantiated.
> > > 
> > > If you are seeing small mismatches in namespace capacity then it may
> > > just be the fact that by default 'ndctl create-namespace' results in an
> > > 'fsdax' mode namespace which just means that it is a block device where
> > > 1.5% of the capacity is reserved for 'struct page' metadata. You should
> > > be able to see namespace capacity == region capacity by doing "ndctl
> > > create-namespace -m raw", and disable DAX operation.    
> > 
> > Currently ndctl create-namespace crashes qemu ;)
> > Which isn't ideal!
> >   
> 
> Found a cause for this one.  Mailbox payload may be as small as 256 bytes.
> We have code in kernel sanity checking that output payload fits in the
> mailbox, but nothing on the input payload.  Symptom is that we write just
> off the end whatever size the payload is.  Note doing this shouldn't crash
> qemu - so I need to fix a range check somewhere.
> 
> I think this is because cxl_pmem_get_config_size() returns the mailbox
> payload size as being the available LSA size, forgetting to remove the
> size of the headers on the set_lsa side of things.
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110
> 
> I've hacked the max_payload to be -8
> 
> Now we still don't succeed in creating the namespace, but bonus is it doesn't crash any more.

In the interests of defensive / correct handling from QEMU I took a
look into why it was crashing.  Turns out that providing a NULL write callback for
the memory device region (that the above overlarge write was spilling into) isn't
a safe thing to do.  Needs a stub. Oops.

On plus side we might never have noticed this was going wrong without the crash
*silver lining in every cloud*

Fix to follow...

Jonathan


> 
> 
> Jonathan
> 
> 
> 
> > > 
> > > Hope that helps.    
> > Got me looking at the right code. Thanks!
> > 
> > Jonathan
> > 
> >   
> 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-15 14:55                   ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-15 14:55 UTC (permalink / raw)
  To: Jonathan Cameron via
  Cc: Jonathan Cameron, Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 15:18:09 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Fri, 12 Aug 2022 17:15:09 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Fri, 12 Aug 2022 09:03:02 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > Jonathan Cameron wrote:    
> > > > On Thu, 11 Aug 2022 18:08:57 +0100
> > > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > > >       
> > > > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > > >       
> > > > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > > > >         
> > > > > > > Hi Jonathan
> > > > > > > 
> > > > > > > Thanks for your reply!
> > > > > > > 
> > > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > > > <Jonathan.Cameron@huawei.com> wrote:          
> > > > > > > >
> > > > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > > > >
> > > > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > > > >            
> > > > > > > 
> > > > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > > > region successfully.
> > > > > > > But have some errors in Nvdimm:
> > > > > > > 
> > > > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe          
> > > > > > 
> > > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > > > that much sense anymore.          
> > > > > 
> > > > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > > > I'd forgotten that was still on the todo list. I don't think it will
> > > > > be particularly hard to do and will take a look in next few days.
> > > > > 
> > > > > Very very indirectly this error is causing a driver probe fail that means that
> > > > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > > > Should not have gotten near that path though - hence the problem is actually
> > > > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > > > we haven't fully connected up the command in QEMU.      
> > > > 
> > > > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > > > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > > > We just need to relax condition on the supplied length.
> > > > 
> > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > > index c352a935c4..fdda9529fe 100644
> > > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > > >      cxl_cmd = &cxl_cmd_set[set][cmd];
> > > >      h = cxl_cmd->handler;
> > > >      if (h) {
> > > > -        if (len == cxl_cmd->in) {
> > > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> > > >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> > > >                  A_CXL_DEV_CMD_PAYLOAD;
> > > >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > > > 
> > > > 
> > > > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > > > namespace capacity so I'll look at what is causing that next.
> > > > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > > > so it's take a while to figure out what kicks off what!      
> > > 
> > > The whirlwind tour is that 'struct nd_region' instances that represent a
> > > persitent memory address range are composed of one more mappings of
> > > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
> > > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
> > > the dimm (if locked) and interrogating the label area to look for
> > > namespace labels.
> > > 
> > > The label command calls are routed to the '->ndctl()' callback that was
> > > registered when the CXL nvdimm_bus_descriptor was created. That callback
> > > handles both 'bus' scope calls, currently none for CXL, and per nvdimm
> > > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
> > > to CXL commands.
> > > 
> > > The 'struct nvdimm' objects that the CXL side registers have the
> > > NDD_LABELING flag set which means that namespaces need to be explicitly
> > > created / provisioned from region capacity. Otherwise, if
> > > drivers/nvdimm/dimm.c does not find a namespace-label-index block then
> > > the region reverts to label-less mode and a default namespace equal to
> > > the size of the region is instantiated.
> > > 
> > > If you are seeing small mismatches in namespace capacity then it may
> > > just be the fact that by default 'ndctl create-namespace' results in an
> > > 'fsdax' mode namespace which just means that it is a block device where
> > > 1.5% of the capacity is reserved for 'struct page' metadata. You should
> > > be able to see namespace capacity == region capacity by doing "ndctl
> > > create-namespace -m raw", and disable DAX operation.    
> > 
> > Currently ndctl create-namespace crashes qemu ;)
> > Which isn't ideal!
> >   
> 
> Found a cause for this one.  Mailbox payload may be as small as 256 bytes.
> We have code in kernel sanity checking that output payload fits in the
> mailbox, but nothing on the input payload.  Symptom is that we write just
> off the end whatever size the payload is.  Note doing this shouldn't crash
> qemu - so I need to fix a range check somewhere.
> 
> I think this is because cxl_pmem_get_config_size() returns the mailbox
> payload size as being the available LSA size, forgetting to remove the
> size of the headers on the set_lsa side of things.
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110
> 
> I've hacked the max_payload to be -8
> 
> Now we still don't succeed in creating the namespace, but bonus is it doesn't crash any more.

In the interests of defensive / correct handling from QEMU I took a
look into why it was crashing.  Turns out that providing a NULL write callback for
the memory device region (that the above overlarge write was spilling into) isn't
a safe thing to do.  Needs a stub. Oops.

On plus side we might never have noticed this was going wrong without the crash
*silver lining in every cloud*

Fix to follow...

Jonathan


> 
> 
> Jonathan
> 
> 
> 
> > > 
> > > Hope that helps.    
> > Got me looking at the right code. Thanks!
> > 
> > Jonathan
> > 
> >   
> 
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-15 14:55                   ` Jonathan Cameron via
  (?)
@ 2022-08-15 15:07                   ` Peter Maydell
  -1 siblings, 0 replies; 38+ messages in thread
From: Peter Maydell @ 2022-08-15 15:07 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Jonathan Cameron via, Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 at 15:55, Jonathan Cameron via <qemu-arm@nongnu.org> wrote:
> In the interests of defensive / correct handling from QEMU I took a
> look into why it was crashing.  Turns out that providing a NULL write callback for
> the memory device region (that the above overlarge write was spilling into) isn't
> a safe thing to do.  Needs a stub. Oops.

Yeah. We've talked before about adding an assert so that that kind of
"missing function" bug is caught at device creation rather than only
if the guest tries to access the device, but we never quite got around
to it...

-- PMM

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-12 15:44           ` Jonathan Cameron via
@ 2022-08-15 17:04             ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-15 17:04 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Bobo WL, linux-cxl, qemu-arm

On Fri, 12 Aug 2022 16:44:03 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 11 Aug 2022 18:08:57 +0100
> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> 
> > On Tue, 9 Aug 2022 17:08:25 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > >     
> > > > Hi Jonathan
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:      
> > > > >
> > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > >
> > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > >        
> > > > 
> > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > region successfully.
> > > > But have some errors in Nvdimm:
> > > > 
> > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe      
> > > 
> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > I've been using to test (Which wasn't true until earlier this week). 
> > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > that much sense anymore.      
> > 
> > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > I'd forgotten that was still on the todo list. I don't think it will
> > be particularly hard to do and will take a look in next few days.
> > 
> > Very very indirectly this error is causing a driver probe fail that means that
> > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > Should not have gotten near that path though - hence the problem is actually
> > when we call cxl_pmem_get_config_data() and it returns an error because
> > we haven't fully connected up the command in QEMU.  
> 
> So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> We just need to relax condition on the supplied length.
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index c352a935c4..fdda9529fe 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
>      cxl_cmd = &cxl_cmd_set[set][cmd];
>      h = cxl_cmd->handler;
>      if (h) {
> -        if (len == cxl_cmd->in) {
> +        if (len == cxl_cmd->in || !cxl_cmd->in) {
Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.

With that fixed we hit new fun paths - after some errors we get the
worrying - not totally sure but looks like a failure on an error cleanup.
I'll chase down the error source, but even then this is probably triggerable by
hardware problem or similar.  Some bonus prints in here from me chasing
error paths, but it's otherwise just cxl/next + the fix I posted earlier today.

[   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
[   69.920108] nd_region_probe
[   69.920623] ------------[ cut here ]------------
[   69.920675] refcount_t: addition on 0; use-after-free.
[   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
[   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
[   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
[   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[   69.931482] Workqueue: events_unbound async_run_entry_fn
[   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   69.934023] pc : refcount_warn_saturate+0xa0/0x144
[   69.935161] lr : refcount_warn_saturate+0xa0/0x144
[   69.936541] sp : ffff80000890b960
[   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
[   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
[   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
[   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
[   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
[   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
[   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
[   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
[   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
[   69.957098] Call trace:
[   69.957959]  refcount_warn_saturate+0xa0/0x144
[   69.958773]  get_ndd+0x5c/0x80
[   69.959294]  nd_region_register_namespaces+0xe4/0xe90
[   69.960253]  nd_region_probe+0x100/0x290
[   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
[   69.962087]  really_probe+0x19c/0x3f0
[   69.962620]  __driver_probe_device+0x11c/0x190
[   69.963258]  driver_probe_device+0x44/0xf4
[   69.963773]  __device_attach_driver+0xa4/0x140
[   69.964471]  bus_for_each_drv+0x84/0xe0
[   69.965068]  __device_attach+0xb0/0x1f0
[   69.966101]  device_initial_probe+0x20/0x30
[   69.967142]  bus_probe_device+0xa4/0xb0
[   69.968104]  device_add+0x3e8/0x910
[   69.969111]  nd_async_device_register+0x24/0x74
[   69.969928]  async_run_entry_fn+0x40/0x150
[   69.970725]  process_one_work+0x1dc/0x450
[   69.971796]  worker_thread+0x154/0x450
[   69.972700]  kthread+0x118/0x120
[   69.974141]  ret_from_fork+0x10/0x20
[   69.975141] ---[ end trace 0000000000000000 ]---
[   70.117887] Into nd_namespace_pmem_set_resource()

>              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
>                  A_CXL_DEV_CMD_PAYLOAD;
>              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> 
> 
> This lets the nvdimm/region probe fine, but I'm getting some issues with
> namespace capacity so I'll look at what is causing that next.
> Unfortunately I'm not that familiar with the driver/nvdimm side of things
> so it's take a while to figure out what kicks off what!
> 
> Jonathan
> 
> > 
> > Jonathan
> > 
> >   
> > >     
> > > > 
> > > > And x4 region still failed with same errors, using latest cxl/preview
> > > > branch don't work.
> > > > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > > > 
> > > > Bob      
> > 
> >   
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-15 17:04             ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-15 17:04 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Bobo WL, linux-cxl, qemu-arm

On Fri, 12 Aug 2022 16:44:03 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 11 Aug 2022 18:08:57 +0100
> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> 
> > On Tue, 9 Aug 2022 17:08:25 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > >     
> > > > Hi Jonathan
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:      
> > > > >
> > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > >
> > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > >        
> > > > 
> > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > region successfully.
> > > > But have some errors in Nvdimm:
> > > > 
> > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe      
> > > 
> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > I've been using to test (Which wasn't true until earlier this week). 
> > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > that much sense anymore.      
> > 
> > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > I'd forgotten that was still on the todo list. I don't think it will
> > be particularly hard to do and will take a look in next few days.
> > 
> > Very very indirectly this error is causing a driver probe fail that means that
> > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > Should not have gotten near that path though - hence the problem is actually
> > when we call cxl_pmem_get_config_data() and it returns an error because
> > we haven't fully connected up the command in QEMU.  
> 
> So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> We just need to relax condition on the supplied length.
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index c352a935c4..fdda9529fe 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
>      cxl_cmd = &cxl_cmd_set[set][cmd];
>      h = cxl_cmd->handler;
>      if (h) {
> -        if (len == cxl_cmd->in) {
> +        if (len == cxl_cmd->in || !cxl_cmd->in) {
Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.

With that fixed we hit new fun paths - after some errors we get the
worrying - not totally sure but looks like a failure on an error cleanup.
I'll chase down the error source, but even then this is probably triggerable by
hardware problem or similar.  Some bonus prints in here from me chasing
error paths, but it's otherwise just cxl/next + the fix I posted earlier today.

[   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
[   69.920108] nd_region_probe
[   69.920623] ------------[ cut here ]------------
[   69.920675] refcount_t: addition on 0; use-after-free.
[   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
[   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
[   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
[   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[   69.931482] Workqueue: events_unbound async_run_entry_fn
[   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   69.934023] pc : refcount_warn_saturate+0xa0/0x144
[   69.935161] lr : refcount_warn_saturate+0xa0/0x144
[   69.936541] sp : ffff80000890b960
[   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
[   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
[   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
[   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
[   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
[   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
[   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
[   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
[   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
[   69.957098] Call trace:
[   69.957959]  refcount_warn_saturate+0xa0/0x144
[   69.958773]  get_ndd+0x5c/0x80
[   69.959294]  nd_region_register_namespaces+0xe4/0xe90
[   69.960253]  nd_region_probe+0x100/0x290
[   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
[   69.962087]  really_probe+0x19c/0x3f0
[   69.962620]  __driver_probe_device+0x11c/0x190
[   69.963258]  driver_probe_device+0x44/0xf4
[   69.963773]  __device_attach_driver+0xa4/0x140
[   69.964471]  bus_for_each_drv+0x84/0xe0
[   69.965068]  __device_attach+0xb0/0x1f0
[   69.966101]  device_initial_probe+0x20/0x30
[   69.967142]  bus_probe_device+0xa4/0xb0
[   69.968104]  device_add+0x3e8/0x910
[   69.969111]  nd_async_device_register+0x24/0x74
[   69.969928]  async_run_entry_fn+0x40/0x150
[   69.970725]  process_one_work+0x1dc/0x450
[   69.971796]  worker_thread+0x154/0x450
[   69.972700]  kthread+0x118/0x120
[   69.974141]  ret_from_fork+0x10/0x20
[   69.975141] ---[ end trace 0000000000000000 ]---
[   70.117887] Into nd_namespace_pmem_set_resource()

>              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
>                  A_CXL_DEV_CMD_PAYLOAD;
>              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> 
> 
> This lets the nvdimm/region probe fine, but I'm getting some issues with
> namespace capacity so I'll look at what is causing that next.
> Unfortunately I'm not that familiar with the driver/nvdimm side of things
> so it's take a while to figure out what kicks off what!
> 
> Jonathan
> 
> > 
> > Jonathan
> > 
> >   
> > >     
> > > > 
> > > > And x4 region still failed with same errors, using latest cxl/preview
> > > > branch don't work.
> > > > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > > > 
> > > > Bob      
> > 
> >   
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-15 17:04             ` Jonathan Cameron via
@ 2022-08-15 17:14               ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-15 17:14 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 18:04:44 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Fri, 12 Aug 2022 16:44:03 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> >   
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >     
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > >       
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > <Jonathan.Cameron@huawei.com> wrote:        
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > >          
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe        
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > that much sense anymore.        
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.    
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >      cxl_cmd = &cxl_cmd_set[set][cmd];
> >      h = cxl_cmd->handler;
> >      if (h) {
> > -        if (len == cxl_cmd->in) {
> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {  
> Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.

Cause of the error is a failure in GET_LSA.
Reason, payload length is wrong in QEMU but was hidden previously by my wrong
fix here.  Probably still a good idea to inject an error in GET_LSA and chase
down the refcount issue.


diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index fdda9529fe..e8565fbd6e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -489,7 +489,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_identify_memory_device, 0, 0 },
     [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
         cmd_ccls_get_partition_info, 0, 0 },
-    [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 },
+    [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
     [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
         ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
     [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
@@ -510,12 +510,13 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
     cxl_cmd = &cxl_cmd_set[set][cmd];
     h = cxl_cmd->handler;
     if (h) {
-        if (len == cxl_cmd->in || !cxl_cmd->in) {
+        if (len == cxl_cmd->in || cxl_cmd->in == ~0) {
             cxl_cmd->payload = cxl_dstate->mbox_reg_state +
                 A_CXL_DEV_CMD_PAYLOAD;

And woot, we get a namespace in the LSA :)

I'll post QEMU fixes in next day or two.  Kernel side now seems more or less
fine be it with suspicious refcount underflow.

> 
> With that fixed we hit new fun paths - after some errors we get the
> worrying - not totally sure but looks like a failure on an error cleanup.
> I'll chase down the error source, but even then this is probably triggerable by
> hardware problem or similar.  Some bonus prints in here from me chasing
> error paths, but it's otherwise just cxl/next + the fix I posted earlier today.
> 
> [   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
> [   69.920108] nd_region_probe
> [   69.920623] ------------[ cut here ]------------
> [   69.920675] refcount_t: addition on 0; use-after-free.
> [   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
> [   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
> [   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
> [   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> [   69.931482] Workqueue: events_unbound async_run_entry_fn
> [   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   69.934023] pc : refcount_warn_saturate+0xa0/0x144
> [   69.935161] lr : refcount_warn_saturate+0xa0/0x144
> [   69.936541] sp : ffff80000890b960
> [   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
> [   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
> [   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
> [   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
> [   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
> [   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
> [   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
> [   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
> [   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
> root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
> [   69.957098] Call trace:
> [   69.957959]  refcount_warn_saturate+0xa0/0x144
> [   69.958773]  get_ndd+0x5c/0x80
> [   69.959294]  nd_region_register_namespaces+0xe4/0xe90
> [   69.960253]  nd_region_probe+0x100/0x290
> [   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
> [   69.962087]  really_probe+0x19c/0x3f0
> [   69.962620]  __driver_probe_device+0x11c/0x190
> [   69.963258]  driver_probe_device+0x44/0xf4
> [   69.963773]  __device_attach_driver+0xa4/0x140
> [   69.964471]  bus_for_each_drv+0x84/0xe0
> [   69.965068]  __device_attach+0xb0/0x1f0
> [   69.966101]  device_initial_probe+0x20/0x30
> [   69.967142]  bus_probe_device+0xa4/0xb0
> [   69.968104]  device_add+0x3e8/0x910
> [   69.969111]  nd_async_device_register+0x24/0x74
> [   69.969928]  async_run_entry_fn+0x40/0x150
> [   69.970725]  process_one_work+0x1dc/0x450
> [   69.971796]  worker_thread+0x154/0x450
> [   69.972700]  kthread+0x118/0x120
> [   69.974141]  ret_from_fork+0x10/0x20
> [   69.975141] ---[ end trace 0000000000000000 ]---
> [   70.117887] Into nd_namespace_pmem_set_resource()
> 
> >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> >                  A_CXL_DEV_CMD_PAYLOAD;
> >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > 
> > 
> > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > namespace capacity so I'll look at what is causing that next.
> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > so it's take a while to figure out what kicks off what!
> > 
> > Jonathan
> >   
> > > 
> > > Jonathan
> > > 
> > >     
> > > >       
> > > > > 
> > > > > And x4 region still failed with same errors, using latest cxl/preview
> > > > > branch don't work.
> > > > > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > > > > 
> > > > > Bob        
> > > 
> > >     
> >   
> 


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-15 17:14               ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-15 17:14 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 18:04:44 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Fri, 12 Aug 2022 16:44:03 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> >   
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >     
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > >       
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > <Jonathan.Cameron@huawei.com> wrote:        
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > >          
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe        
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > that much sense anymore.        
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.    
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >      cxl_cmd = &cxl_cmd_set[set][cmd];
> >      h = cxl_cmd->handler;
> >      if (h) {
> > -        if (len == cxl_cmd->in) {
> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {  
> Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.

Cause of the error is a failure in GET_LSA.
Reason, payload length is wrong in QEMU but was hidden previously by my wrong
fix here.  Probably still a good idea to inject an error in GET_LSA and chase
down the refcount issue.


diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index fdda9529fe..e8565fbd6e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -489,7 +489,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_identify_memory_device, 0, 0 },
     [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
         cmd_ccls_get_partition_info, 0, 0 },
-    [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 },
+    [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
     [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
         ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
     [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
@@ -510,12 +510,13 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
     cxl_cmd = &cxl_cmd_set[set][cmd];
     h = cxl_cmd->handler;
     if (h) {
-        if (len == cxl_cmd->in || !cxl_cmd->in) {
+        if (len == cxl_cmd->in || cxl_cmd->in == ~0) {
             cxl_cmd->payload = cxl_dstate->mbox_reg_state +
                 A_CXL_DEV_CMD_PAYLOAD;

And woot, we get a namespace in the LSA :)

I'll post QEMU fixes in next day or two.  Kernel side now seems more or less
fine be it with suspicious refcount underflow.

> 
> With that fixed we hit new fun paths - after some errors we get the
> worrying - not totally sure but looks like a failure on an error cleanup.
> I'll chase down the error source, but even then this is probably triggerable by
> hardware problem or similar.  Some bonus prints in here from me chasing
> error paths, but it's otherwise just cxl/next + the fix I posted earlier today.
> 
> [   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
> [   69.920108] nd_region_probe
> [   69.920623] ------------[ cut here ]------------
> [   69.920675] refcount_t: addition on 0; use-after-free.
> [   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
> [   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
> [   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
> [   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> [   69.931482] Workqueue: events_unbound async_run_entry_fn
> [   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   69.934023] pc : refcount_warn_saturate+0xa0/0x144
> [   69.935161] lr : refcount_warn_saturate+0xa0/0x144
> [   69.936541] sp : ffff80000890b960
> [   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
> [   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
> [   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
> [   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
> [   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
> [   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
> [   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
> [   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
> [   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
> root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
> [   69.957098] Call trace:
> [   69.957959]  refcount_warn_saturate+0xa0/0x144
> [   69.958773]  get_ndd+0x5c/0x80
> [   69.959294]  nd_region_register_namespaces+0xe4/0xe90
> [   69.960253]  nd_region_probe+0x100/0x290
> [   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
> [   69.962087]  really_probe+0x19c/0x3f0
> [   69.962620]  __driver_probe_device+0x11c/0x190
> [   69.963258]  driver_probe_device+0x44/0xf4
> [   69.963773]  __device_attach_driver+0xa4/0x140
> [   69.964471]  bus_for_each_drv+0x84/0xe0
> [   69.965068]  __device_attach+0xb0/0x1f0
> [   69.966101]  device_initial_probe+0x20/0x30
> [   69.967142]  bus_probe_device+0xa4/0xb0
> [   69.968104]  device_add+0x3e8/0x910
> [   69.969111]  nd_async_device_register+0x24/0x74
> [   69.969928]  async_run_entry_fn+0x40/0x150
> [   69.970725]  process_one_work+0x1dc/0x450
> [   69.971796]  worker_thread+0x154/0x450
> [   69.972700]  kthread+0x118/0x120
> [   69.974141]  ret_from_fork+0x10/0x20
> [   69.975141] ---[ end trace 0000000000000000 ]---
> [   70.117887] Into nd_namespace_pmem_set_resource()
> 
> >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
> >                  A_CXL_DEV_CMD_PAYLOAD;
> >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
> > 
> > 
> > This lets the nvdimm/region probe fine, but I'm getting some issues with
> > namespace capacity so I'll look at what is causing that next.
> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
> > so it's take a while to figure out what kicks off what!
> > 
> > Jonathan
> >   
> > > 
> > > Jonathan
> > > 
> > >     
> > > >       
> > > > > 
> > > > > And x4 region still failed with same errors, using latest cxl/preview
> > > > > branch don't work.
> > > > > I have picked "Two CXL emulation fixes" patches in qemu, still not working.
> > > > > 
> > > > > Bob        
> > > 
> > >     
> >   
> 



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-15 17:04             ` Jonathan Cameron via
  (?)
  (?)
@ 2022-08-15 22:55             ` Dan Williams
  2022-08-17 11:25                 ` Jonathan Cameron via
  -1 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2022-08-15 22:55 UTC (permalink / raw)
  To: Jonathan Cameron, Jonathan Cameron via; +Cc: Bobo WL, linux-cxl, qemu-arm

Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 16:44:03 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Thu, 11 Aug 2022 18:08:57 +0100
> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > 
> > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >   
> > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > >     
> > > > > Hi Jonathan
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > <Jonathan.Cameron@huawei.com> wrote:      
> > > > > >
> > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > >
> > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > >        
> > > > > 
> > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > region successfully.
> > > > > But have some errors in Nvdimm:
> > > > > 
> > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe      
> > > > 
> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > that much sense anymore.      
> > > 
> > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > I'd forgotten that was still on the todo list. I don't think it will
> > > be particularly hard to do and will take a look in next few days.
> > > 
> > > Very very indirectly this error is causing a driver probe fail that means that
> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > Should not have gotten near that path though - hence the problem is actually
> > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > we haven't fully connected up the command in QEMU.  
> > 
> > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > We just need to relax condition on the supplied length.
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index c352a935c4..fdda9529fe 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> >      cxl_cmd = &cxl_cmd_set[set][cmd];
> >      h = cxl_cmd->handler;
> >      if (h) {
> > -        if (len == cxl_cmd->in) {
> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
> Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.
> 
> With that fixed we hit new fun paths - after some errors we get the
> worrying - not totally sure but looks like a failure on an error cleanup.
> I'll chase down the error source, but even then this is probably triggerable by
> hardware problem or similar.  Some bonus prints in here from me chasing
> error paths, but it's otherwise just cxl/next + the fix I posted earlier today.

One of the scenarios that I cannot rule out is nvdimm_probe() racing
nd_region_probe(), but given all the work it takes to create a region I
suspect all the nvdimm_probe() work to have completed...

It is at least one potentially wrong hypothesis that needs to be chased
down.

> 
> [   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
> [   69.920108] nd_region_probe
> [   69.920623] ------------[ cut here ]------------
> [   69.920675] refcount_t: addition on 0; use-after-free.
> [   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
> [   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
> [   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
> [   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> [   69.931482] Workqueue: events_unbound async_run_entry_fn
> [   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   69.934023] pc : refcount_warn_saturate+0xa0/0x144
> [   69.935161] lr : refcount_warn_saturate+0xa0/0x144
> [   69.936541] sp : ffff80000890b960
> [   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
> [   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
> [   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
> [   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
> [   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
> [   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
> [   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
> [   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
> [   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
> root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
> [   69.957098] Call trace:
> [   69.957959]  refcount_warn_saturate+0xa0/0x144
> [   69.958773]  get_ndd+0x5c/0x80
> [   69.959294]  nd_region_register_namespaces+0xe4/0xe90
> [   69.960253]  nd_region_probe+0x100/0x290
> [   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
> [   69.962087]  really_probe+0x19c/0x3f0
> [   69.962620]  __driver_probe_device+0x11c/0x190
> [   69.963258]  driver_probe_device+0x44/0xf4
> [   69.963773]  __device_attach_driver+0xa4/0x140
> [   69.964471]  bus_for_each_drv+0x84/0xe0
> [   69.965068]  __device_attach+0xb0/0x1f0
> [   69.966101]  device_initial_probe+0x20/0x30
> [   69.967142]  bus_probe_device+0xa4/0xb0
> [   69.968104]  device_add+0x3e8/0x910
> [   69.969111]  nd_async_device_register+0x24/0x74
> [   69.969928]  async_run_entry_fn+0x40/0x150
> [   69.970725]  process_one_work+0x1dc/0x450
> [   69.971796]  worker_thread+0x154/0x450
> [   69.972700]  kthread+0x118/0x120
> [   69.974141]  ret_from_fork+0x10/0x20
> [   69.975141] ---[ end trace 0000000000000000 ]---
> [   70.117887] Into nd_namespace_pmem_set_resource()

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-15 22:55             ` Dan Williams
@ 2022-08-17 11:25                 ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-17 11:25 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jonathan Cameron via, Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 15:55:15 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 16:44:03 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> > > On Thu, 11 Aug 2022 18:08:57 +0100
> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > >   
> > > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > >     
> > > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > > >       
> > > > > > Hi Jonathan
> > > > > > 
> > > > > > Thanks for your reply!
> > > > > > 
> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > > <Jonathan.Cameron@huawei.com> wrote:        
> > > > > > >
> > > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > > >
> > > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > > >          
> > > > > > 
> > > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > > region successfully.
> > > > > > But have some errors in Nvdimm:
> > > > > > 
> > > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe        
> > > > > 
> > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > > that much sense anymore.        
> > > > 
> > > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > > I'd forgotten that was still on the todo list. I don't think it will
> > > > be particularly hard to do and will take a look in next few days.
> > > > 
> > > > Very very indirectly this error is causing a driver probe fail that means that
> > > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > > Should not have gotten near that path though - hence the problem is actually
> > > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > > we haven't fully connected up the command in QEMU.    
> > > 
> > > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > > We just need to relax condition on the supplied length.
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index c352a935c4..fdda9529fe 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > >      cxl_cmd = &cxl_cmd_set[set][cmd];
> > >      h = cxl_cmd->handler;
> > >      if (h) {
> > > -        if (len == cxl_cmd->in) {
> > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {  
> > Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.
> > 
> > With that fixed we hit new fun paths - after some errors we get the
> > worrying - not totally sure but looks like a failure on an error cleanup.
> > I'll chase down the error source, but even then this is probably triggerable by
> > hardware problem or similar.  Some bonus prints in here from me chasing
> > error paths, but it's otherwise just cxl/next + the fix I posted earlier today.  
> 
> One of the scenarios that I cannot rule out is nvdimm_probe() racing
> nd_region_probe(), but given all the work it takes to create a region I
> suspect all the nvdimm_probe() work to have completed...
> 
> It is at least one potentially wrong hypothesis that needs to be chased
> down.

Maybe there should be a special award for the non-intuitive 
ndctl create-namespace command (modifies existing namespace and might create
a different empty one...) I'm sure there is some interesting history behind that one :)

Upshot is I just threw a filesystem on fsdax and wrote some text files on it
to allow easy grepping. The right data ends up in the memory and a plausible
namespace description is stored in the LSA.

So to some degree at least it's 'working' on an 8 way direct connected
set of emulated devices.

One snag is that serial number support isn't yet upstream in QEMU.
(I have had it in my tree for a while but not posted it yet because of
 QEMU feature freeze)
https://gitlab.com/jic23/qemu/-/commit/144c783ea8a5fbe169f46ea1ba92940157f42733
That's needed for meaningful cookie generation.  Otherwise you can build the
namespace once, but it won't work on next probe as the cookie is 0 and you
hit some error paths.

Maybe sensible to add a sanity check and fail namespace creation if
cookie is 0?  (Silly side question, but is there a theoretical risk of
a serial number / other data combination leading to a fletcher64()
checksum that happens to be 0 - that would give a very odd bug report!)

So to make it work the following is needed:

1) The kernel fix for mailbox buffer overflow.
2) Qemu fix for size of arguements for get_lsa
3) Qemu fix to allow variable size input arguements (for set_lsa)
4) Serial number patch above + command lines to qemu to set appropriate
   serial numbers.

I'll send out the QEMU fixes shortly and post the Serial number patch,
though that almost certainly won't go in until next QEMU development
cycle starts in a few weeks.

Next up, run through same tests on some other topologies.

Jonathan

> 
> > 
> > [   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
> > [   69.920108] nd_region_probe
> > [   69.920623] ------------[ cut here ]------------
> > [   69.920675] refcount_t: addition on 0; use-after-free.
> > [   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
> > [   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
> > [   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
> > [   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [   69.931482] Workqueue: events_unbound async_run_entry_fn
> > [   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   69.934023] pc : refcount_warn_saturate+0xa0/0x144
> > [   69.935161] lr : refcount_warn_saturate+0xa0/0x144
> > [   69.936541] sp : ffff80000890b960
> > [   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
> > [   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
> > [   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
> > [   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
> > [   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
> > [   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
> > [   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
> > [   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
> > [   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
> > root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
> > [   69.957098] Call trace:
> > [   69.957959]  refcount_warn_saturate+0xa0/0x144
> > [   69.958773]  get_ndd+0x5c/0x80
> > [   69.959294]  nd_region_register_namespaces+0xe4/0xe90
> > [   69.960253]  nd_region_probe+0x100/0x290
> > [   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
> > [   69.962087]  really_probe+0x19c/0x3f0
> > [   69.962620]  __driver_probe_device+0x11c/0x190
> > [   69.963258]  driver_probe_device+0x44/0xf4
> > [   69.963773]  __device_attach_driver+0xa4/0x140
> > [   69.964471]  bus_for_each_drv+0x84/0xe0
> > [   69.965068]  __device_attach+0xb0/0x1f0
> > [   69.966101]  device_initial_probe+0x20/0x30
> > [   69.967142]  bus_probe_device+0xa4/0xb0
> > [   69.968104]  device_add+0x3e8/0x910
> > [   69.969111]  nd_async_device_register+0x24/0x74
> > [   69.969928]  async_run_entry_fn+0x40/0x150
> > [   69.970725]  process_one_work+0x1dc/0x450
> > [   69.971796]  worker_thread+0x154/0x450
> > [   69.972700]  kthread+0x118/0x120
> > [   69.974141]  ret_from_fork+0x10/0x20
> > [   69.975141] ---[ end trace 0000000000000000 ]---
> > [   70.117887] Into nd_namespace_pmem_set_resource()  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-17 11:25                 ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-17 11:25 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jonathan Cameron via, Bobo WL, linux-cxl, qemu-arm

On Mon, 15 Aug 2022 15:55:15 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 16:44:03 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> > > On Thu, 11 Aug 2022 18:08:57 +0100
> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> > >   
> > > > On Tue, 9 Aug 2022 17:08:25 +0100
> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > >     
> > > > > On Tue, 9 Aug 2022 21:07:06 +0800
> > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
> > > > >       
> > > > > > Hi Jonathan
> > > > > > 
> > > > > > Thanks for your reply!
> > > > > > 
> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
> > > > > > <Jonathan.Cameron@huawei.com> wrote:        
> > > > > > >
> > > > > > > Probably not related to your problem, but there is a disconnect in QEMU /
> > > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
> > > > > > > has a single root port. Spec allows it to be provided or not as an implementation choice.
> > > > > > > Kernel assumes it isn't provide. Qemu assumes it is.
> > > > > > >
> > > > > > > The temporary solution is to throw in a second root port on the HB and not
> > > > > > > connect anything to it.  Longer term I may special case this so that the particular
> > > > > > > decoder defaults to pass through settings in QEMU if there is only one root port.
> > > > > > >          
> > > > > > 
> > > > > > You are right! After adding an extra HB in qemu, I can create a x1
> > > > > > region successfully.
> > > > > > But have some errors in Nvdimm:
> > > > > > 
> > > > > > [   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
> > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe        
> > > > > 
> > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo list to chase
> > > > > down. Once I reach this state I can verify the HDM Decode is correct which is what
> > > > > I've been using to test (Which wasn't true until earlier this week). 
> > > > > I'm currently testing via devmem, more for historical reasons than because it makes
> > > > > that much sense anymore.        
> > > > 
> > > > *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
> > > > I'd forgotten that was still on the todo list. I don't think it will
> > > > be particularly hard to do and will take a look in next few days.
> > > > 
> > > > Very very indirectly this error is causing a driver probe fail that means that
> > > > we hit a code path that has a rather odd looking check on NDD_LABELING.
> > > > Should not have gotten near that path though - hence the problem is actually
> > > > when we call cxl_pmem_get_config_data() and it returns an error because
> > > > we haven't fully connected up the command in QEMU.    
> > > 
> > > So a least one bug in QEMU. We were not supporting variable length payloads on mailbox
> > > inputs (but were on outputs).  That hasn't mattered until we get to LSA writes.
> > > We just need to relax condition on the supplied length.
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index c352a935c4..fdda9529fe 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> > >      cxl_cmd = &cxl_cmd_set[set][cmd];
> > >      h = cxl_cmd->handler;
> > >      if (h) {
> > > -        if (len == cxl_cmd->in) {
> > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {  
> > Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.
> > 
> > With that fixed we hit new fun paths - after some errors we get the
> > worrying - not totally sure but looks like a failure on an error cleanup.
> > I'll chase down the error source, but even then this is probably triggerable by
> > hardware problem or similar.  Some bonus prints in here from me chasing
> > error paths, but it's otherwise just cxl/next + the fix I posted earlier today.  
> 
> One of the scenarios that I cannot rule out is nvdimm_probe() racing
> nd_region_probe(), but given all the work it takes to create a region I
> suspect all the nvdimm_probe() work to have completed...
> 
> It is at least one potentially wrong hypothesis that needs to be chased
> down.

Maybe there should be a special award for the non-intuitive 
ndctl create-namespace command (modifies existing namespace and might create
a different empty one...) I'm sure there is some interesting history behind that one :)

Upshot is I just threw a filesystem on fsdax and wrote some text files on it
to allow easy grepping. The right data ends up in the memory and a plausible
namespace description is stored in the LSA.

So to some degree at least it's 'working' on an 8 way direct connected
set of emulated devices.

One snag is that serial number support isn't yet upstream in QEMU.
(I have had it in my tree for a while but not posted it yet because of
 QEMU feature freeze)
https://gitlab.com/jic23/qemu/-/commit/144c783ea8a5fbe169f46ea1ba92940157f42733
That's needed for meaningful cookie generation.  Otherwise you can build the
namespace once, but it won't work on next probe as the cookie is 0 and you
hit some error paths.

Maybe sensible to add a sanity check and fail namespace creation if
cookie is 0?  (Silly side question, but is there a theoretical risk of
a serial number / other data combination leading to a fletcher64()
checksum that happens to be 0 - that would give a very odd bug report!)

So to make it work the following is needed:

1) The kernel fix for mailbox buffer overflow.
2) Qemu fix for size of arguements for get_lsa
3) Qemu fix to allow variable size input arguements (for set_lsa)
4) Serial number patch above + command lines to qemu to set appropriate
   serial numbers.

I'll send out the QEMU fixes shortly and post the Serial number patch,
though that almost certainly won't go in until next QEMU development
cycle starts in a few weeks.

Next up, run through same tests on some other topologies.

Jonathan

> 
> > 
> > [   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
> > [   69.920108] nd_region_probe
> > [   69.920623] ------------[ cut here ]------------
> > [   69.920675] refcount_t: addition on 0; use-after-free.
> > [   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
> > [   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
> > [   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
> > [   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [   69.931482] Workqueue: events_unbound async_run_entry_fn
> > [   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   69.934023] pc : refcount_warn_saturate+0xa0/0x144
> > [   69.935161] lr : refcount_warn_saturate+0xa0/0x144
> > [   69.936541] sp : ffff80000890b960
> > [   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
> > [   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
> > [   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
> > [   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
> > [   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
> > [   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
> > [   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
> > [   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
> > [   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
> > root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
> > [   69.957098] Call trace:
> > [   69.957959]  refcount_warn_saturate+0xa0/0x144
> > [   69.958773]  get_ndd+0x5c/0x80
> > [   69.959294]  nd_region_register_namespaces+0xe4/0xe90
> > [   69.960253]  nd_region_probe+0x100/0x290
> > [   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
> > [   69.962087]  really_probe+0x19c/0x3f0
> > [   69.962620]  __driver_probe_device+0x11c/0x190
> > [   69.963258]  driver_probe_device+0x44/0xf4
> > [   69.963773]  __device_attach_driver+0xa4/0x140
> > [   69.964471]  bus_for_each_drv+0x84/0xe0
> > [   69.965068]  __device_attach+0xb0/0x1f0
> > [   69.966101]  device_initial_probe+0x20/0x30
> > [   69.967142]  bus_probe_device+0xa4/0xb0
> > [   69.968104]  device_add+0x3e8/0x910
> > [   69.969111]  nd_async_device_register+0x24/0x74
> > [   69.969928]  async_run_entry_fn+0x40/0x150
> > [   69.970725]  process_one_work+0x1dc/0x450
> > [   69.971796]  worker_thread+0x154/0x450
> > [   69.972700]  kthread+0x118/0x120
> > [   69.974141]  ret_from_fork+0x10/0x20
> > [   69.975141] ---[ end trace 0000000000000000 ]---
> > [   70.117887] Into nd_namespace_pmem_set_resource()  



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-12  0:46       ` Dan Williams
@ 2022-08-17 16:16           ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-17 16:16 UTC (permalink / raw)
  To: Dan Williams; +Cc: Bobo WL, linux-cxl, qemu-devel, qemu-arm

On Thu, 11 Aug 2022 17:46:55 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Dan Williams wrote:
> > Bobo WL wrote:  
> > > Hi Dan,
> > > 
> > > Thanks for your reply!
> > > 
> > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:  
> > > >
> > > > What is the output of:
> > > >
> > > >     cxl list -MDTu -d decoder0.0
> > > >
> > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > at least not in the specified order, or that validation check is broken.  
> > > 
> > > Command "cxl list -MDTu -d decoder0.0" output:  
> > 
> > Thanks for this, I think I know the problem, but will try some
> > experiments with cxl_test first.  
> 
> Hmm, so my cxl_test experiment unfortunately passed so I'm not
> reproducing the failure mode. This is the result of creating x4 region
> with devices directly attached to a single host-bridge:
> 
> # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> {
>   "region":"region8",
>   "resource":"0xf1f0000000",
>   "size":"1024.00 MiB (1073.74 MB)",
>   "interleave_ways":4,
>   "interleave_granularity":256,
>   "decode_state":"commit",
>   "mappings":[
>     {
>       "position":3,
>       "memdev":"mem11",
>       "decoder":"decoder21.0"
>     },
>     {
>       "position":2,
>       "memdev":"mem9",
>       "decoder":"decoder19.0"
>     },
>     {
>       "position":1,
>       "memdev":"mem10",
>       "decoder":"decoder20.0"
>     },
>     {
>       "position":0,
>       "memdev":"mem12",
>       "decoder":"decoder22.0"
>     }
>   ]
> }
> cxl region: cmd_create_region: created 1 region
> 
> > Did the commit_store() crash stop reproducing with latest cxl/preview
> > branch?  
> 
> I missed the answer to this question.
> 
> All of these changes are now in Linus' tree perhaps give that a try and
> post the debug log again?

Hi Dan,

I've moved onto looking at this one.
1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
at some stage), 1 switch, 4 downstream switch ports each with a type 3

I'm not getting a crash, but can't successfully setup a region.
Upon adding the final target
It's failing in check_last_peer() as pos < distance.
Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
some reason or that distance check is wrong.
Wasn't a good idea to just skip that step though as it goes boom - though
stack trace is not useful.

Jonathan







^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-17 16:16           ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-17 16:16 UTC (permalink / raw)
  To: Dan Williams; +Cc: Bobo WL, linux-cxl, qemu-devel, qemu-arm

On Thu, 11 Aug 2022 17:46:55 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Dan Williams wrote:
> > Bobo WL wrote:  
> > > Hi Dan,
> > > 
> > > Thanks for your reply!
> > > 
> > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:  
> > > >
> > > > What is the output of:
> > > >
> > > >     cxl list -MDTu -d decoder0.0
> > > >
> > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > at least not in the specified order, or that validation check is broken.  
> > > 
> > > Command "cxl list -MDTu -d decoder0.0" output:  
> > 
> > Thanks for this, I think I know the problem, but will try some
> > experiments with cxl_test first.  
> 
> Hmm, so my cxl_test experiment unfortunately passed so I'm not
> reproducing the failure mode. This is the result of creating x4 region
> with devices directly attached to a single host-bridge:
> 
> # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> {
>   "region":"region8",
>   "resource":"0xf1f0000000",
>   "size":"1024.00 MiB (1073.74 MB)",
>   "interleave_ways":4,
>   "interleave_granularity":256,
>   "decode_state":"commit",
>   "mappings":[
>     {
>       "position":3,
>       "memdev":"mem11",
>       "decoder":"decoder21.0"
>     },
>     {
>       "position":2,
>       "memdev":"mem9",
>       "decoder":"decoder19.0"
>     },
>     {
>       "position":1,
>       "memdev":"mem10",
>       "decoder":"decoder20.0"
>     },
>     {
>       "position":0,
>       "memdev":"mem12",
>       "decoder":"decoder22.0"
>     }
>   ]
> }
> cxl region: cmd_create_region: created 1 region
> 
> > Did the commit_store() crash stop reproducing with latest cxl/preview
> > branch?  
> 
> I missed the answer to this question.
> 
> All of these changes are now in Linus' tree perhaps give that a try and
> post the debug log again?

Hi Dan,

I've moved onto looking at this one.
1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
at some stage), 1 switch, 4 downstream switch ports each with a type 3

I'm not getting a crash, but can't successfully setup a region.
Upon adding the final target
It's failing in check_last_peer() as pos < distance.
Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
some reason or that distance check is wrong.
Wasn't a good idea to just skip that step though as it goes boom - though
stack trace is not useful.

Jonathan








^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-17 16:16           ` Jonathan Cameron via
@ 2022-08-18 16:37             ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-18 16:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: Bobo WL, linux-cxl, qemu-devel, qemu-arm

On Wed, 17 Aug 2022 17:16:19 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 11 Aug 2022 17:46:55 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Dan Williams wrote:  
> > > Bobo WL wrote:    
> > > > Hi Dan,
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:    
> > > > >
> > > > > What is the output of:
> > > > >
> > > > >     cxl list -MDTu -d decoder0.0
> > > > >
> > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > at least not in the specified order, or that validation check is broken.    
> > > > 
> > > > Command "cxl list -MDTu -d decoder0.0" output:    
> > > 
> > > Thanks for this, I think I know the problem, but will try some
> > > experiments with cxl_test first.    
> > 
> > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > reproducing the failure mode. This is the result of creating x4 region
> > with devices directly attached to a single host-bridge:
> > 
> > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > {
> >   "region":"region8",
> >   "resource":"0xf1f0000000",
> >   "size":"1024.00 MiB (1073.74 MB)",
> >   "interleave_ways":4,
> >   "interleave_granularity":256,
> >   "decode_state":"commit",
> >   "mappings":[
> >     {
> >       "position":3,
> >       "memdev":"mem11",
> >       "decoder":"decoder21.0"
> >     },
> >     {
> >       "position":2,
> >       "memdev":"mem9",
> >       "decoder":"decoder19.0"
> >     },
> >     {
> >       "position":1,
> >       "memdev":"mem10",
> >       "decoder":"decoder20.0"
> >     },
> >     {
> >       "position":0,
> >       "memdev":"mem12",
> >       "decoder":"decoder22.0"
> >     }
> >   ]
> > }
> > cxl region: cmd_create_region: created 1 region
> >   
> > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > branch?    
> > 
> > I missed the answer to this question.
> > 
> > All of these changes are now in Linus' tree perhaps give that a try and
> > post the debug log again?  
> 
> Hi Dan,
> 
> I've moved onto looking at this one.
> 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
> at some stage), 1 switch, 4 downstream switch ports each with a type 3
> 
> I'm not getting a crash, but can't successfully setup a region.
> Upon adding the final target
> It's failing in check_last_peer() as pos < distance.
> Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
> some reason or that distance check is wrong.
> Wasn't a good idea to just skip that step though as it goes boom - though
> stack trace is not useful.

Turns out really weird corruption happens if you accidentally back two type3 devices
with the same memory device. Who would have thought it :)

That aside ignoring the check_last_peer() failure seems to make everything work for this
topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along the way.

Now for the fun one.  I've replicated the crash if we have

1HB 1*RP 1SW, 4SW-DSP, 4Type3

Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be programmed
but the null pointer dereference isn't related to that.

The bug is straight forward.  Not all decoders have commit callbacks... Will send out
a possible fix shortly.

Jonathan



> 
> Jonathan
> 
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-18 16:37             ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-18 16:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: Bobo WL, linux-cxl, qemu-devel, qemu-arm

On Wed, 17 Aug 2022 17:16:19 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 11 Aug 2022 17:46:55 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Dan Williams wrote:  
> > > Bobo WL wrote:    
> > > > Hi Dan,
> > > > 
> > > > Thanks for your reply!
> > > > 
> > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:    
> > > > >
> > > > > What is the output of:
> > > > >
> > > > >     cxl list -MDTu -d decoder0.0
> > > > >
> > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > at least not in the specified order, or that validation check is broken.    
> > > > 
> > > > Command "cxl list -MDTu -d decoder0.0" output:    
> > > 
> > > Thanks for this, I think I know the problem, but will try some
> > > experiments with cxl_test first.    
> > 
> > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > reproducing the failure mode. This is the result of creating x4 region
> > with devices directly attached to a single host-bridge:
> > 
> > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > {
> >   "region":"region8",
> >   "resource":"0xf1f0000000",
> >   "size":"1024.00 MiB (1073.74 MB)",
> >   "interleave_ways":4,
> >   "interleave_granularity":256,
> >   "decode_state":"commit",
> >   "mappings":[
> >     {
> >       "position":3,
> >       "memdev":"mem11",
> >       "decoder":"decoder21.0"
> >     },
> >     {
> >       "position":2,
> >       "memdev":"mem9",
> >       "decoder":"decoder19.0"
> >     },
> >     {
> >       "position":1,
> >       "memdev":"mem10",
> >       "decoder":"decoder20.0"
> >     },
> >     {
> >       "position":0,
> >       "memdev":"mem12",
> >       "decoder":"decoder22.0"
> >     }
> >   ]
> > }
> > cxl region: cmd_create_region: created 1 region
> >   
> > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > branch?    
> > 
> > I missed the answer to this question.
> > 
> > All of these changes are now in Linus' tree perhaps give that a try and
> > post the debug log again?  
> 
> Hi Dan,
> 
> I've moved onto looking at this one.
> 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
> at some stage), 1 switch, 4 downstream switch ports each with a type 3
> 
> I'm not getting a crash, but can't successfully setup a region.
> Upon adding the final target
> It's failing in check_last_peer() as pos < distance.
> Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
> some reason or that distance check is wrong.
> Wasn't a good idea to just skip that step though as it goes boom - though
> stack trace is not useful.

Turns out really weird corruption happens if you accidentally back two type3 devices
with the same memory device. Who would have thought it :)

That aside ignoring the check_last_peer() failure seems to make everything work for this
topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along the way.

Now for the fun one.  I've replicated the crash if we have

1HB 1*RP 1SW, 4SW-DSP, 4Type3

Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be programmed
but the null pointer dereference isn't related to that.

The bug is straight forward.  Not all decoders have commit callbacks... Will send out
a possible fix shortly.

Jonathan



> 
> Jonathan
> 
> 
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-18 16:37             ` Jonathan Cameron via
@ 2022-08-19  8:46               ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-08-19  8:46 UTC (permalink / raw)
  To: Jonathan Cameron via
  Cc: Jonathan Cameron, Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Thu, 18 Aug 2022 17:37:40 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Wed, 17 Aug 2022 17:16:19 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Thu, 11 Aug 2022 17:46:55 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > Dan Williams wrote:    
> > > > Bobo WL wrote:      
> > > > > Hi Dan,
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:      
> > > > > >
> > > > > > What is the output of:
> > > > > >
> > > > > >     cxl list -MDTu -d decoder0.0
> > > > > >
> > > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > > at least not in the specified order, or that validation check is broken.      
> > > > > 
> > > > > Command "cxl list -MDTu -d decoder0.0" output:      
> > > > 
> > > > Thanks for this, I think I know the problem, but will try some
> > > > experiments with cxl_test first.      
> > > 
> > > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > > reproducing the failure mode. This is the result of creating x4 region
> > > with devices directly attached to a single host-bridge:
> > > 
> > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > > {
> > >   "region":"region8",
> > >   "resource":"0xf1f0000000",
> > >   "size":"1024.00 MiB (1073.74 MB)",
> > >   "interleave_ways":4,
> > >   "interleave_granularity":256,
> > >   "decode_state":"commit",
> > >   "mappings":[
> > >     {
> > >       "position":3,
> > >       "memdev":"mem11",
> > >       "decoder":"decoder21.0"
> > >     },
> > >     {
> > >       "position":2,
> > >       "memdev":"mem9",
> > >       "decoder":"decoder19.0"
> > >     },
> > >     {
> > >       "position":1,
> > >       "memdev":"mem10",
> > >       "decoder":"decoder20.0"
> > >     },
> > >     {
> > >       "position":0,
> > >       "memdev":"mem12",
> > >       "decoder":"decoder22.0"
> > >     }
> > >   ]
> > > }
> > > cxl region: cmd_create_region: created 1 region
> > >     
> > > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > > branch?      
> > > 
> > > I missed the answer to this question.
> > > 
> > > All of these changes are now in Linus' tree perhaps give that a try and
> > > post the debug log again?    
> > 
> > Hi Dan,
> > 
> > I've moved onto looking at this one.
> > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
> > at some stage), 1 switch, 4 downstream switch ports each with a type 3
> > 
> > I'm not getting a crash, but can't successfully setup a region.
> > Upon adding the final target
> > It's failing in check_last_peer() as pos < distance.
> > Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
> > some reason or that distance check is wrong.
> > Wasn't a good idea to just skip that step though as it goes boom - though
> > stack trace is not useful.  
> 
> Turns out really weird corruption happens if you accidentally back two type3 devices
> with the same memory device. Who would have thought it :)
> 
> That aside ignoring the check_last_peer() failure seems to make everything work for this
> topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along the way.
> 
> Now for the fun one.  I've replicated the crash if we have
> 
> 1HB 1*RP 1SW, 4SW-DSP, 4Type3
> 
> Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be programmed
> but the null pointer dereference isn't related to that.
> 
> The bug is straight forward.  Not all decoders have commit callbacks... Will send out
> a possible fix shortly.
> 
For completeness I'm carrying this hack because I haven't gotten my head
around the right fix for check_last_peer() failing on this test topology.

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c49d9a5f1091..275e143bd748 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
                                rc = check_last_peer(cxled, ep, cxl_rr,
                                                     distance);
                                if (rc)
-                                       return rc;
+                                       //      return rc;
                                goto out_target_set;
                        }
                goto add_target;
--

I might find more bugs with more testing, but this is all the ones I've
seen so far + in Bobo's reports.  Qemu fixes are now in upstream so
will be there in the release. 

As a reminder, testing on QEMU has a few corners...

Need a patch to add serial number ECAP support. It is on list for revew,
but will have wait for after QEMU 7.1 release (which may be next week)

QEMU still assumes HDM decoder on the host bridge will be programmed.
So if you want anything to work there should be at least
2 RP below the HB (no need to plug anything in to one of them).

I don't want to add a commandline parameter to hide the decoder in QEMU
and detecting there is only one RP would require moving a bunch of static
stuff into runtime code (I think).

I still think we should make the kernel check to see if there is a decoder,
but if not I might see how bad a hack it is to have QEMU ignore that decoder
if not committed in this one special case (HB HDM decoder with only one place
it can send stuff). Obviously that would be a break from specification
so less than idea!

Thanks,

Jonathan


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-08-19  8:46               ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-08-19  8:46 UTC (permalink / raw)
  To: Jonathan Cameron via
  Cc: Jonathan Cameron, Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Thu, 18 Aug 2022 17:37:40 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Wed, 17 Aug 2022 17:16:19 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Thu, 11 Aug 2022 17:46:55 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > Dan Williams wrote:    
> > > > Bobo WL wrote:      
> > > > > Hi Dan,
> > > > > 
> > > > > Thanks for your reply!
> > > > > 
> > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:      
> > > > > >
> > > > > > What is the output of:
> > > > > >
> > > > > >     cxl list -MDTu -d decoder0.0
> > > > > >
> > > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > > at least not in the specified order, or that validation check is broken.      
> > > > > 
> > > > > Command "cxl list -MDTu -d decoder0.0" output:      
> > > > 
> > > > Thanks for this, I think I know the problem, but will try some
> > > > experiments with cxl_test first.      
> > > 
> > > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > > reproducing the failure mode. This is the result of creating x4 region
> > > with devices directly attached to a single host-bridge:
> > > 
> > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > > {
> > >   "region":"region8",
> > >   "resource":"0xf1f0000000",
> > >   "size":"1024.00 MiB (1073.74 MB)",
> > >   "interleave_ways":4,
> > >   "interleave_granularity":256,
> > >   "decode_state":"commit",
> > >   "mappings":[
> > >     {
> > >       "position":3,
> > >       "memdev":"mem11",
> > >       "decoder":"decoder21.0"
> > >     },
> > >     {
> > >       "position":2,
> > >       "memdev":"mem9",
> > >       "decoder":"decoder19.0"
> > >     },
> > >     {
> > >       "position":1,
> > >       "memdev":"mem10",
> > >       "decoder":"decoder20.0"
> > >     },
> > >     {
> > >       "position":0,
> > >       "memdev":"mem12",
> > >       "decoder":"decoder22.0"
> > >     }
> > >   ]
> > > }
> > > cxl region: cmd_create_region: created 1 region
> > >     
> > > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > > branch?      
> > > 
> > > I missed the answer to this question.
> > > 
> > > All of these changes are now in Linus' tree perhaps give that a try and
> > > post the debug log again?    
> > 
> > Hi Dan,
> > 
> > I've moved onto looking at this one.
> > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
> > at some stage), 1 switch, 4 downstream switch ports each with a type 3
> > 
> > I'm not getting a crash, but can't successfully setup a region.
> > Upon adding the final target
> > It's failing in check_last_peer() as pos < distance.
> > Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
> > some reason or that distance check is wrong.
> > Wasn't a good idea to just skip that step though as it goes boom - though
> > stack trace is not useful.  
> 
> Turns out really weird corruption happens if you accidentally back two type3 devices
> with the same memory device. Who would have thought it :)
> 
> That aside ignoring the check_last_peer() failure seems to make everything work for this
> topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along the way.
> 
> Now for the fun one.  I've replicated the crash if we have
> 
> 1HB 1*RP 1SW, 4SW-DSP, 4Type3
> 
> Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be programmed
> but the null pointer dereference isn't related to that.
> 
> The bug is straight forward.  Not all decoders have commit callbacks... Will send out
> a possible fix shortly.
> 
For completeness I'm carrying this hack because I haven't gotten my head
around the right fix for check_last_peer() failing on this test topology.

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c49d9a5f1091..275e143bd748 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
                                rc = check_last_peer(cxled, ep, cxl_rr,
                                                     distance);
                                if (rc)
-                                       return rc;
+                                       //      return rc;
                                goto out_target_set;
                        }
                goto add_target;
--

I might find more bugs with more testing, but this is all the ones I've
seen so far + in Bobo's reports.  Qemu fixes are now in upstream so
will be there in the release. 

As a reminder, testing on QEMU has a few corners...

Need a patch to add serial number ECAP support. It is on list for revew,
but will have wait for after QEMU 7.1 release (which may be next week)

QEMU still assumes HDM decoder on the host bridge will be programmed.
So if you want anything to work there should be at least
2 RP below the HB (no need to plug anything in to one of them).

I don't want to add a commandline parameter to hide the decoder in QEMU
and detecting there is only one RP would require moving a bunch of static
stuff into runtime code (I think).

I still think we should make the kernel check to see if there is a decoder,
but if not I might see how bad a hack it is to have QEMU ignore that decoder
if not committed in this one special case (HB HDM decoder with only one place
it can send stuff). Obviously that would be a break from specification
so less than idea!

Thanks,

Jonathan



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
  2022-08-19  8:46               ` Jonathan Cameron via
@ 2022-10-10 16:20                 ` Jonathan Cameron via
  -1 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron @ 2022-10-10 16:20 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Fri, 19 Aug 2022 09:46:55 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 18 Aug 2022 17:37:40 +0100
> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> 
> > On Wed, 17 Aug 2022 17:16:19 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> > > On Thu, 11 Aug 2022 17:46:55 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >     
> > > > Dan Williams wrote:      
> > > > > Bobo WL wrote:        
> > > > > > Hi Dan,
> > > > > > 
> > > > > > Thanks for your reply!
> > > > > > 
> > > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:        
> > > > > > >
> > > > > > > What is the output of:
> > > > > > >
> > > > > > >     cxl list -MDTu -d decoder0.0
> > > > > > >
> > > > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > > > at least not in the specified order, or that validation check is broken.        
> > > > > > 
> > > > > > Command "cxl list -MDTu -d decoder0.0" output:        
> > > > > 
> > > > > Thanks for this, I think I know the problem, but will try some
> > > > > experiments with cxl_test first.        
> > > > 
> > > > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > > > reproducing the failure mode. This is the result of creating x4 region
> > > > with devices directly attached to a single host-bridge:
> > > > 
> > > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > > > {
> > > >   "region":"region8",
> > > >   "resource":"0xf1f0000000",
> > > >   "size":"1024.00 MiB (1073.74 MB)",
> > > >   "interleave_ways":4,
> > > >   "interleave_granularity":256,
> > > >   "decode_state":"commit",
> > > >   "mappings":[
> > > >     {
> > > >       "position":3,
> > > >       "memdev":"mem11",
> > > >       "decoder":"decoder21.0"
> > > >     },
> > > >     {
> > > >       "position":2,
> > > >       "memdev":"mem9",
> > > >       "decoder":"decoder19.0"
> > > >     },
> > > >     {
> > > >       "position":1,
> > > >       "memdev":"mem10",
> > > >       "decoder":"decoder20.0"
> > > >     },
> > > >     {
> > > >       "position":0,
> > > >       "memdev":"mem12",
> > > >       "decoder":"decoder22.0"
> > > >     }
> > > >   ]
> > > > }
> > > > cxl region: cmd_create_region: created 1 region
> > > >       
> > > > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > > > branch?        
> > > > 
> > > > I missed the answer to this question.
> > > > 
> > > > All of these changes are now in Linus' tree perhaps give that a try and
> > > > post the debug log again?      
> > > 
> > > Hi Dan,
> > > 
> > > I've moved onto looking at this one.
> > > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
> > > at some stage), 1 switch, 4 downstream switch ports each with a type 3
> > > 
> > > I'm not getting a crash, but can't successfully setup a region.
> > > Upon adding the final target
> > > It's failing in check_last_peer() as pos < distance.
> > > Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
> > > some reason or that distance check is wrong.
> > > Wasn't a good idea to just skip that step though as it goes boom - though
> > > stack trace is not useful.    
> > 
> > Turns out really weird corruption happens if you accidentally back two type3 devices
> > with the same memory device. Who would have thought it :)
> > 
> > That aside ignoring the check_last_peer() failure seems to make everything work for this
> > topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along the way.
> > 
> > Now for the fun one.  I've replicated the crash if we have
> > 
> > 1HB 1*RP 1SW, 4SW-DSP, 4Type3
> > 
> > Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be programmed
> > but the null pointer dereference isn't related to that.
> > 
> > The bug is straight forward.  Not all decoders have commit callbacks... Will send out
> > a possible fix shortly.
> >   
> For completeness I'm carrying this hack because I haven't gotten my head
> around the right fix for check_last_peer() failing on this test topology.
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c49d9a5f1091..275e143bd748 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>                                 rc = check_last_peer(cxled, ep, cxl_rr,
>                                                      distance);
>                                 if (rc)
> -                                       return rc;
> +                                       //      return rc;
>                                 goto out_target_set;
>                         }
>                 goto add_target;

I'm still carrying this hack and still haven't worked out the right fix.

Suggestions welcome!  If not I'll hopefully get some time on this
towards the end of the week.

Jonathan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [BUG] cxl can not create region
@ 2022-10-10 16:20                 ` Jonathan Cameron via
  0 siblings, 0 replies; 38+ messages in thread
From: Jonathan Cameron via @ 2022-10-10 16:20 UTC (permalink / raw)
  To: Jonathan Cameron via; +Cc: Dan Williams, Bobo WL, linux-cxl, qemu-arm

On Fri, 19 Aug 2022 09:46:55 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Thu, 18 Aug 2022 17:37:40 +0100
> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> 
> > On Wed, 17 Aug 2022 17:16:19 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> > > On Thu, 11 Aug 2022 17:46:55 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >     
> > > > Dan Williams wrote:      
> > > > > Bobo WL wrote:        
> > > > > > Hi Dan,
> > > > > > 
> > > > > > Thanks for your reply!
> > > > > > 
> > > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:        
> > > > > > >
> > > > > > > What is the output of:
> > > > > > >
> > > > > > >     cxl list -MDTu -d decoder0.0
> > > > > > >
> > > > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
> > > > > > > at least not in the specified order, or that validation check is broken.        
> > > > > > 
> > > > > > Command "cxl list -MDTu -d decoder0.0" output:        
> > > > > 
> > > > > Thanks for this, I think I know the problem, but will try some
> > > > > experiments with cxl_test first.        
> > > > 
> > > > Hmm, so my cxl_test experiment unfortunately passed so I'm not
> > > > reproducing the failure mode. This is the result of creating x4 region
> > > > with devices directly attached to a single host-bridge:
> > > > 
> > > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
> > > > {
> > > >   "region":"region8",
> > > >   "resource":"0xf1f0000000",
> > > >   "size":"1024.00 MiB (1073.74 MB)",
> > > >   "interleave_ways":4,
> > > >   "interleave_granularity":256,
> > > >   "decode_state":"commit",
> > > >   "mappings":[
> > > >     {
> > > >       "position":3,
> > > >       "memdev":"mem11",
> > > >       "decoder":"decoder21.0"
> > > >     },
> > > >     {
> > > >       "position":2,
> > > >       "memdev":"mem9",
> > > >       "decoder":"decoder19.0"
> > > >     },
> > > >     {
> > > >       "position":1,
> > > >       "memdev":"mem10",
> > > >       "decoder":"decoder20.0"
> > > >     },
> > > >     {
> > > >       "position":0,
> > > >       "memdev":"mem12",
> > > >       "decoder":"decoder22.0"
> > > >     }
> > > >   ]
> > > > }
> > > > cxl region: cmd_create_region: created 1 region
> > > >       
> > > > > Did the commit_store() crash stop reproducing with latest cxl/preview
> > > > > branch?        
> > > > 
> > > > I missed the answer to this question.
> > > > 
> > > > All of these changes are now in Linus' tree perhaps give that a try and
> > > > post the debug log again?      
> > > 
> > > Hi Dan,
> > > 
> > > I've moved onto looking at this one.
> > > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that up
> > > at some stage), 1 switch, 4 downstream switch ports each with a type 3
> > > 
> > > I'm not getting a crash, but can't successfully setup a region.
> > > Upon adding the final target
> > > It's failing in check_last_peer() as pos < distance.
> > > Seems distance is 4 which makes me think it's using the wrong level of the heirarchy for
> > > some reason or that distance check is wrong.
> > > Wasn't a good idea to just skip that step though as it goes boom - though
> > > stack trace is not useful.    
> > 
> > Turns out really weird corruption happens if you accidentally back two type3 devices
> > with the same memory device. Who would have thought it :)
> > 
> > That aside ignoring the check_last_peer() failure seems to make everything work for this
> > topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along the way.
> > 
> > Now for the fun one.  I've replicated the crash if we have
> > 
> > 1HB 1*RP 1SW, 4SW-DSP, 4Type3
> > 
> > Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be programmed
> > but the null pointer dereference isn't related to that.
> > 
> > The bug is straight forward.  Not all decoders have commit callbacks... Will send out
> > a possible fix shortly.
> >   
> For completeness I'm carrying this hack because I haven't gotten my head
> around the right fix for check_last_peer() failing on this test topology.
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c49d9a5f1091..275e143bd748 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>                                 rc = check_last_peer(cxled, ep, cxl_rr,
>                                                      distance);
>                                 if (rc)
> -                                       return rc;
> +                                       //      return rc;
>                                 goto out_target_set;
>                         }
>                 goto add_target;

I'm still carrying this hack and still haven't worked out the right fix.

Suggestions welcome!  If not I'll hopefully get some time on this
towards the end of the week.

Jonathan


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-10-10 16:24 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-05  2:20 [BUG] cxl can not create region Bobo WL
2022-08-08 12:37 ` Jonathan Cameron
2022-08-08 12:37   ` Jonathan Cameron via
2022-08-09 13:07   ` Bobo WL
2022-08-09 16:08     ` Jonathan Cameron
2022-08-09 16:08       ` Jonathan Cameron via
2022-08-11 17:08       ` Jonathan Cameron via
2022-08-11 17:08         ` Jonathan Cameron
2022-08-12 15:44         ` Jonathan Cameron
2022-08-12 15:44           ` Jonathan Cameron via
2022-08-12 16:03           ` Dan Williams
2022-08-12 16:15             ` Jonathan Cameron
2022-08-12 16:15               ` Jonathan Cameron via
2022-08-15 14:18               ` Jonathan Cameron
2022-08-15 14:18                 ` Jonathan Cameron via
2022-08-15 14:55                 ` Jonathan Cameron
2022-08-15 14:55                   ` Jonathan Cameron via
2022-08-15 15:07                   ` Peter Maydell
2022-08-15 17:04           ` Jonathan Cameron
2022-08-15 17:04             ` Jonathan Cameron via
2022-08-15 17:14             ` Jonathan Cameron
2022-08-15 17:14               ` Jonathan Cameron via
2022-08-15 22:55             ` Dan Williams
2022-08-17 11:25               ` Jonathan Cameron
2022-08-17 11:25                 ` Jonathan Cameron via
2022-08-08 15:58 ` Dan Williams
2022-08-09 13:12   ` Bobo WL
2022-08-09 15:17     ` Dan Williams
2022-08-11  3:10       ` Bobo WL
2022-08-12  0:46       ` Dan Williams
2022-08-17 16:16         ` Jonathan Cameron
2022-08-17 16:16           ` Jonathan Cameron via
2022-08-18 16:37           ` Jonathan Cameron
2022-08-18 16:37             ` Jonathan Cameron via
2022-08-19  8:46             ` Jonathan Cameron
2022-08-19  8:46               ` Jonathan Cameron via
2022-10-10 16:20               ` Jonathan Cameron
2022-10-10 16:20                 ` Jonathan Cameron via

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.