io_uring failure on parisc (32-bit userspace and 64-bit kernel)

* io_uring failure on parisc (32-bit userspace and 64-bit kernel)
@ 2023-02-12  9:47 Helge Deller
  2023-02-12 13:16 ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12  9:47 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

Hi all,

We see io-uring failures on the parisc architecture with this testcase:
https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c

parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.

On a 64-bit kernel (6.1.11):
deller@parisc:~$ ./io_uring-test test.file
ret=0, wanted 4096
Submitted=4, completed=1, bytes=0
-> failure

strace shows:
io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
brk(NULL)                               = 0x4ae000
brk(0x4cf000)                           = 0x4cf000
io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0


Running the same testcase on a 32-bit kernel (6.1.11) works:
root@debian:~# ./io_uring-test test.file
Submitted=4, completed=4, bytes=16384
-> ok.

strace:
io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
brk(NULL)                               = 0x15000
brk(0x36000)                            = 0x36000
io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4

I'm happy to test any patch if someone has an idea....

Helge


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12  9:47 io_uring failure on parisc (32-bit userspace and 64-bit kernel) Helge Deller
@ 2023-02-12 13:16 ` Jens Axboe
  2023-02-12 13:28   ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-12 13:16 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 2:47 AM, Helge Deller wrote:
> Hi all,
> 
> We see io-uring failures on the parisc architecture with this testcase:
> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
> 
> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
> 
> On a 64-bit kernel (6.1.11):
> deller@parisc:~$ ./io_uring-test test.file
> ret=0, wanted 4096
> Submitted=4, completed=1, bytes=0
> -> failure
> 
> strace shows:
> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
> brk(NULL)                               = 0x4ae000
> brk(0x4cf000)                           = 0x4cf000
> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
> 
> 
> Running the same testcase on a 32-bit kernel (6.1.11) works:
> root@debian:~# ./io_uring-test test.file
> Submitted=4, completed=4, bytes=16384
> -> ok.
> 
> strace:
> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
> brk(NULL)                               = 0x15000
> brk(0x36000)                            = 0x36000
> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
> 
> I'm happy to test any patch if someone has an idea....

No idea what this could be, to be honest. I tried your qemu vm image,
and it does boot, but it's missing keys to be able to update apt and
install packages... After fiddling with this for 30 min I gave up, any
chance you can update the sid image? Given how slow this thing is
running, it'd take me all day to do a fresh install and I have to admit
I'm not THAT motivated about parisc to do that :)

-- 
Jens Axboe



* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 13:16 ` Jens Axboe
@ 2023-02-12 13:28   ` Helge Deller
  2023-02-12 13:35     ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12 13:28 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 14:16, Jens Axboe wrote:
> On 2/12/23 2:47 AM, Helge Deller wrote:
>> Hi all,
>>
>> We see io-uring failures on the parisc architecture with this testcase:
>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>
>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>
>> On a 64-bit kernel (6.1.11):
>> deller@parisc:~$ ./io_uring-test test.file
>> ret=0, wanted 4096
>> Submitted=4, completed=1, bytes=0
>> -> failure
>>
>> strace shows:
>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>> brk(NULL)                               = 0x4ae000
>> brk(0x4cf000)                           = 0x4cf000
>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>
>>
>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>> root@debian:~# ./io_uring-test test.file
>> Submitted=4, completed=4, bytes=16384
>> -> ok.
>>
>> strace:
>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>> brk(NULL)                               = 0x15000
>> brk(0x36000)                            = 0x36000
>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>
>> I'm happy to test any patch if someone has an idea....
>
> No idea what this could be, to be honest. I tried your qemu vm image,
> and it does boot, but it's missing keys to be able to update apt and
> install packages... After fiddling with this for 30 min I gave up, any
> chance you can update the sid image? Given how slow this thing is
> running, it'd take me all day to do a fresh install and I have to admit
> I'm not THAT motivated about parisc to do that :)

Yes, I will update that image, but qemu currently only supports a 32-bit PA-RISC
CPU which can only run the 32-bit kernel. So even if I update it, you won't be
able to reproduce it, as it only happens with the 64-bit kernel.
I'm sure it's some kind of missing 32-to-64-bit translation in the kernel, which
triggers only on big-endian machines.

Does powerpc with a 64-bit ppc64 kernel work?
I'd assume it will show the same issue.

I will try to add some printks and compare the output of 32- and 64-bit kernels.
If you have some suggestion where to add such (which?) debug code, it would help me a lot.

Thank you,
Helge


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 13:28   ` Helge Deller
@ 2023-02-12 13:35     ` Jens Axboe
  2023-02-12 14:00       ` Jens Axboe
  2023-02-12 14:03       ` Helge Deller
  0 siblings, 2 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-12 13:35 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 6:28 AM, Helge Deller wrote:
> On 2/12/23 14:16, Jens Axboe wrote:
>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>> Hi all,
>>>
>>> We see io-uring failures on the parisc architecture with this testcase:
>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>
>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>
>>> On a 64-bit kernel (6.1.11):
>>> deller@parisc:~$ ./io_uring-test test.file
>>> ret=0, wanted 4096
>>> Submitted=4, completed=1, bytes=0
>>> -> failure
>>>
>>> strace shows:
>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>> brk(NULL)                               = 0x4ae000
>>> brk(0x4cf000)                           = 0x4cf000
>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>
>>>
>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>> root@debian:~# ./io_uring-test test.file
>>> Submitted=4, completed=4, bytes=16384
>>> -> ok.
>>>
>>> strace:
>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>> brk(NULL)                               = 0x15000
>>> brk(0x36000)                            = 0x36000
>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>
>>> I'm happy to test any patch if someone has an idea....
>>
>> No idea what this could be, to be honest. I tried your qemu vm image,
>> and it does boot, but it's missing keys to be able to update apt and
>> install packages... After fiddling with this for 30 min I gave up, any
>> chance you can update the sid image? Given how slow this thing is
>> running, it'd take me all day to do a fresh install and I have to admit
>> I'm not THAT motivated about parisc to do that :)
> 
> Yes, I will update that image, but qemu currently only supports a
> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
> update it, you won't be able to reproduce it, as it only happens with
> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64-bit
> translation in the kernel, which triggers only on big-endian machines.

I built my own kernel for it, so that should be fine, correct? We'll see
soon enough, managed to disable enough checks on the debian-10 image to
actually make it install packages.

> Does powerpc with a 64-bit ppc64 kernel work?
> I'd assume it will show the same issue.

No idea... Only stuff I use and test on is x86-64/32 and arm64.

> I will try to add some printks and compare the output of 32- and
> 64-bit kernels. If you have some suggestion where to add such (which?)
> debug code, it would help me a lot.

I'd just try:

echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable

on both kernels and run that example. I do wonder if it's some O_DIRECT
thing, does the example work if you just remove O_DIRECT from the file
open?

-- 
Jens Axboe



* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 13:35     ` Jens Axboe
@ 2023-02-12 14:00       ` Jens Axboe
  2023-02-12 14:03       ` Helge Deller
  1 sibling, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-12 14:00 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 6:35 AM, Jens Axboe wrote:
> On 2/12/23 6:28 AM, Helge Deller wrote:
>> On 2/12/23 14:16, Jens Axboe wrote:
>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>> Hi all,
>>>>
>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>
>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>
>>>> On a 64-bit kernel (6.1.11):
>>>> deller@parisc:~$ ./io_uring-test test.file
>>>> ret=0, wanted 4096
>>>> Submitted=4, completed=1, bytes=0
>>>> -> failure
>>>>
>>>> strace shows:
>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>> brk(NULL)                               = 0x4ae000
>>>> brk(0x4cf000)                           = 0x4cf000
>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>
>>>>
>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>> root@debian:~# ./io_uring-test test.file
>>>> Submitted=4, completed=4, bytes=16384
>>>> -> ok.
>>>>
>>>> strace:
>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>> brk(NULL)                               = 0x15000
>>>> brk(0x36000)                            = 0x36000
>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>
>>>> I'm happy to test any patch if someone has an idea....
>>>
>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>> and it does boot, but it's missing keys to be able to update apt and
>>> install packages... After fiddling with this for 30 min I gave up, any
>>> chance you can update the sid image? Given how slow this thing is
>>> running, it'd take me all day to do a fresh install and I have to admit
>>> I'm not THAT motivated about parisc to do that :)
>>
>> Yes, I will update that image, but qemu currently only supports a
>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>> update it, you won't be able to reproduce it, as it only happens with
>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64-bit
>> translation in the kernel, which triggers only on big-endian machines.
> 
> I built my own kernel for it, so that should be fine, correct? We'll see
> soon enough, managed to disable enough checks on the debian-10 image to
> actually make it install packages.

Oh, qemu doesn't support 64-bit parisc... Totally missed that, just
had to find out for myself.

I know io_uring runs fine on s390 which is big endian iirc, and for
io_uring itself, there's no swapping or ordering going on or assumed.
So a bit puzzled on what this would be. But:

>> I will try to add some printks and compare the output of 32- and
>> 64-bit kernels. If you have some suggestion where to add such (which?)
>> debug code, it would help me a lot.
> 
> I'd just try:
> 
> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable

This might help shed some light on it for you.

-- 
Jens Axboe




* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 13:35     ` Jens Axboe
  2023-02-12 14:00       ` Jens Axboe
@ 2023-02-12 14:03       ` Helge Deller
  2023-02-12 19:35         ` Helge Deller
  1 sibling, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12 14:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 14:35, Jens Axboe wrote:
> On 2/12/23 6:28 AM, Helge Deller wrote:
>> On 2/12/23 14:16, Jens Axboe wrote:
>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>> Hi all,
>>>>
>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>
>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>
>>>> On a 64-bit kernel (6.1.11):
>>>> deller@parisc:~$ ./io_uring-test test.file
>>>> ret=0, wanted 4096
>>>> Submitted=4, completed=1, bytes=0
>>>> -> failure
>>>>
>>>> strace shows:
>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>> brk(NULL)                               = 0x4ae000
>>>> brk(0x4cf000)                           = 0x4cf000
>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>
>>>>
>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>> root@debian:~# ./io_uring-test test.file
>>>> Submitted=4, completed=4, bytes=16384
>>>> -> ok.
>>>>
>>>> strace:
>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>> brk(NULL)                               = 0x15000
>>>> brk(0x36000)                            = 0x36000
>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>
>>>> I'm happy to test any patch if someone has an idea....
>>>
>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>> and it does boot, but it's missing keys to be able to update apt and
>>> install packages... After fiddling with this for 30 min I gave up, any
>>> chance you can update the sid image? Given how slow this thing is
>>> running, it'd take me all day to do a fresh install and I have to admit
>>> I'm not THAT motivated about parisc to do that :)
>>
>> Yes, I will update that image, but qemu currently only supports a
>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>> update it, you won't be able to reproduce it, as it only happens with
>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64-bit
>> translation in the kernel, which triggers only on big-endian machines.
>
> I built my own kernel for it, so that should be fine, correct?

No, as qemu won't boot the 64-bit kernel.

> We'll see soon enough, managed to disable enough checks on the
> debian-10 image to actually make it install packages.
>
>> Does powerpc with a 64-bit ppc64 kernel work?
>> I'd assume it will show the same issue.
>
> No idea... Only stuff I use and test on is x86-64/32 and arm64.

Would be interesting if someone could test...

>> I will try to add some printks and compare the output of 32- and
>> 64-bit kernels. If you have some suggestion where to add such (which?)
>> debug code, it would help me a lot.
>
> I'd just try:
>
> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable

I'll try, but will take some time...

> on both kernels and run that example. I do wonder if it's some O_DIRECT
> thing, does the example work if you just remove O_DIRECT from the file
> open?

No, still fails with O_DIRECT removed.

Thanks!
Helge


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 14:03       ` Helge Deller
@ 2023-02-12 19:35         ` Helge Deller
  2023-02-12 19:42           ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12 19:35 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 15:03, Helge Deller wrote:
> On 2/12/23 14:35, Jens Axboe wrote:
>> On 2/12/23 6:28 AM, Helge Deller wrote:
>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>>> Hi all,
>>>>>
>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>
>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>
>>>>> On a 64-bit kernel (6.1.11):
>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>> ret=0, wanted 4096
>>>>> Submitted=4, completed=1, bytes=0
>>>>> -> failure
>>>>>
>>>>> strace shows:
>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>> brk(NULL)                               = 0x4ae000
>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>
>>>>>
>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>> root@debian:~# ./io_uring-test test.file
>>>>> Submitted=4, completed=4, bytes=16384
>>>>> -> ok.
>>>>>
>>>>> strace:
>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>> brk(NULL)                               = 0x15000
>>>>> brk(0x36000)                            = 0x36000
>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>
>>>>> I'm happy to test any patch if someone has an idea....
>>>>
>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>> and it does boot, but it's missing keys to be able to update apt and
>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>> chance you can update the sid image? Given how slow this thing is
>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>> I'm not THAT motivated about parisc to do that :)
>>>
>>> Yes, I will update that image, but qemu currently only supports a
>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>> update it, you won't be able to reproduce it, as it only happens with
>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64-bit
>>> translation in the kernel, which triggers only on big-endian machines.
>>
>> I built my own kernel for it, so that should be fine, correct?
>
> No, as qemu won't boot the 64-bit kernel.
>
>> We'll see soon enough, managed to disable enough checks on the
>> debian-10 image to actually make it install packages.
>>
>>> Does powerpc with a 64-bit ppc64 kernel work?
>>> I'd assume it will show the same issue.
>>
>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>
> Would be interesting if someone could test...
>
>>> I will try to add some printks and compare the output of 32- and
>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>> debug code, it would help me a lot.
>>
>> I'd just try:
>>
>> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>
> I'll try, but will take some time...
>

On the 64-bit kernel, io_sqring_entries() returns 0 at entry of
io_submit_sqes() because ctx->rings->sq.tail is 0 there (wrong; the 32-bit
kernel sees the correct value 4). ctx->cached_sq_head is 0 in both cases.
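
For readers following along, here is a minimal sketch of why a zero tail
means nothing gets submitted. It assumes the pending count is simply the
shared tail minus the kernel's cached head, capped by the number of entries
the caller asked for. The struct and function names are hypothetical
stand-ins for illustration, not the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared SQ ring indices. */
struct sq_indices {
    uint32_t head; /* consumer index, advanced by the kernel */
    uint32_t tail; /* producer index, advanced by userspace */
};

/* Pending entries: unsigned subtraction stays correct across wraparound. */
static uint32_t pending_entries(const struct sq_indices *sq, uint32_t cached_head)
{
    assert(sq != NULL); /* defensive check; a NULL ring is a caller bug */
    return sq->tail - cached_head;
}

/* Entries to process: never more than the caller requested. */
static uint32_t entries_to_submit(const struct sq_indices *sq,
                                  uint32_t cached_head, uint32_t requested)
{
    uint32_t avail = pending_entries(sq, cached_head);
    return requested < avail ? requested : avail;
}
```

With tail = 0 and cached head = 0 (the 64-bit case reported here) the result
is 0 no matter how many entries were requested; with tail = 4 (the 32-bit
case) a request for 4 yields 4.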

Helge

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 19:35         ` Helge Deller
@ 2023-02-12 19:42           ` Jens Axboe
  2023-02-12 20:01             ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-12 19:42 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 12:35 PM, Helge Deller wrote:
> On 2/12/23 15:03, Helge Deller wrote:
>> On 2/12/23 14:35, Jens Axboe wrote:
>>> On 2/12/23 6:28 AM, Helge Deller wrote:
>>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>>
>>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>>
>>>>>> On a 64-bit kernel (6.1.11):
>>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>>> ret=0, wanted 4096
>>>>>> Submitted=4, completed=1, bytes=0
>>>>>> -> failure
>>>>>>
>>>>>> strace shows:
>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>>> brk(NULL)                               = 0x4ae000
>>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>>
>>>>>>
>>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>>> root@debian:~# ./io_uring-test test.file
>>>>>> Submitted=4, completed=4, bytes=16384
>>>>>> -> ok.
>>>>>>
>>>>>> strace:
>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>>> brk(NULL)                               = 0x15000
>>>>>> brk(0x36000)                            = 0x36000
>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>>
>>>>>> I'm happy to test any patch if someone has an idea....
>>>>>
>>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>>> and it does boot, but it's missing keys to be able to update apt and
>>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>>> chance you can update the sid image? Given how slow this thing is
>>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>>> I'm not THAT motivated about parisc to do that :)
>>>>
>>>> Yes, I will update that image, but qemu currently only supports a
>>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>>> update it, you won't be able to reproduce it, as it only happens with
>>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
>>>> translation in the kernel, which triggers only big-endian machines.
>>>
>>> I built my own kernel for it, so that should be fine, correct?
>>
>> No, as qemu won't boot the 64-bit kernel.
>>
>>> We'll see soon enough, managed to disable enough checks on the
>>> debian-10 image to actually make it install packages.
>>>
>>>> Does powerpc with a 64-bit ppc64 kernel work?
>>>> I'd assume it will show the same issue.
>>>
>>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>>
>> Would be interesting if someone could test...
>>
>>>> I will try to add some printks and compare the output of 32- and
>>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>>> debug code, it would help me a lot.
>>>
>>> I'd just try:
>>>
>>> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>>
>> I'll try, but will take some time...
>>
> 
> At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
> ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
> ctx->cached_sq_head is 0 in both cases.

cached_sq_head will get updated as sqes are consumed, but since sq.tail
is zero, there's nothing to submit as far as io_uring is concerned.

Can you dump the addresses/offsets of the sq and cq heads/tails in userspace
and in the kernel? They are u32, so they're the same size on 32-bit and 64-bit.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 19:42           ` Jens Axboe
@ 2023-02-12 20:01             ` Helge Deller
  2023-02-12 21:48               ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12 20:01 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 20:42, Jens Axboe wrote:
> On 2/12/23 12:35 PM, Helge Deller wrote:
>> On 2/12/23 15:03, Helge Deller wrote:
>>> On 2/12/23 14:35, Jens Axboe wrote:
>>>> On 2/12/23 6:28 AM, Helge Deller wrote:
>>>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>>>
>>>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>>>
>>>>>>> On a 64-bit kernel (6.1.11):
>>>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>>>> ret=0, wanted 4096
>>>>>>> Submitted=4, completed=1, bytes=0
>>>>>>> -> failure
>>>>>>>
>>>>>>> strace shows:
>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>>>> brk(NULL)                               = 0x4ae000
>>>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>>>
>>>>>>>
>>>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>>>> root@debian:~# ./io_uring-test test.file
>>>>>>> Submitted=4, completed=4, bytes=16384
>>>>>>> -> ok.
>>>>>>>
>>>>>>> strace:
>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>>>> brk(NULL)                               = 0x15000
>>>>>>> brk(0x36000)                            = 0x36000
>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>>>
>>>>>>> I'm happy to test any patch if someone has an idea....
>>>>>>
>>>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>>>> and it does boot, but it's missing keys to be able to update apt and
>>>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>>>> chance you can update the sid image? Given how slow this thing is
>>>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>>>> I'm not THAT motivated about parisc to do that :)
>>>>>
>>>>> Yes, I will update that image, but qemu currently only supports a
>>>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>>>> update it, you won't be able to reproduce it, as it only happens with
>>>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
>>>>> translation in the kernel, which triggers only big-endian machines.
>>>>
>>>> I built my own kernel for it, so that should be fine, correct?
>>>
>>> No, as qemu won't boot the 64-bit kernel.
>>>
>>>> We'll see soon enough, managed to disable enough checks on the
>>>> debian-10 image to actually make it install packages.
>>>>
>>>>> Does powerpc with a 64-bit ppc64 kernel work?
>>>>> I'd assume it will show the same issue.
>>>>
>>>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>>>
>>> Would be interesting if someone could test...
>>>
>>>>> I will try to add some printks and compare the output of 32- and
>>>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>>>> debug code, it would help me a lot.
>>>>
>>>> I'd just try:
>>>>
>>>> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>>>
>>> I'll try, but will take some time...
>>>
>>
>> At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
>> ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
>> ctx->cached_sq_head is 0 in both cases.
>
> cached_sq_head will get updated as sqes are consumed, but since sq.tail
> is zero, there's nothing to submit as far as io_uring is concerned.
>
> Can you dump addresses/offsets of the sq and cq heads/tails in userspace
> and in the kernel? They are u32, so same size of 32 and 64-bit.

For both kernels (32- and 64-bit) I get:
p->sq_off.head = 0  p->sq_off.tail = 16
p->cq_off.head = 32  p->cq_off.tail = 48
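
As a sketch of what these offsets are used for: userspace adds each offset to
the base address returned by mmap() to find the matching u32 index in the
shared ring. The helper name below is hypothetical and a plain array stands
in for the mapped region; the offset values are the ones reported above.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as reported by io_uring_setup() in this thread. */
enum { SQ_OFF_HEAD = 0, SQ_OFF_TAIL = 16 };

/* Hypothetical helper: locate a u32 ring field inside the mapped region. */
static uint32_t *ring_u32(void *ring_base, uint32_t byte_offset)
{
    /* u32 fields must sit on a 4-byte boundary. */
    assert(byte_offset % sizeof(uint32_t) == 0);
    return (uint32_t *)((unsigned char *)ring_base + byte_offset);
}
```

Each field is 4 bytes wide regardless of the kernel's word size, so identical
offsets on both kernels mean both sides address the same slots within the
mapping.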

Helge

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 20:01             ` Helge Deller
@ 2023-02-12 21:48               ` Jens Axboe
  2023-02-12 22:20                 ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-12 21:48 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 1:01 PM, Helge Deller wrote:
> On 2/12/23 20:42, Jens Axboe wrote:
>> On 2/12/23 12:35 PM, Helge Deller wrote:
>>> On 2/12/23 15:03, Helge Deller wrote:
>>>> On 2/12/23 14:35, Jens Axboe wrote:
>>>>> On 2/12/23 6:28 AM, Helge Deller wrote:
>>>>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>>>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>>>>
>>>>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>>>>
>>>>>>>> On a 64-bit kernel (6.1.11):
>>>>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>>>>> ret=0, wanted 4096
>>>>>>>> Submitted=4, completed=1, bytes=0
>>>>>>>> -> failure
>>>>>>>>
>>>>>>>> strace shows:
>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>>>>> brk(NULL)                               = 0x4ae000
>>>>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>>>>
>>>>>>>>
>>>>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>>>>> root@debian:~# ./io_uring-test test.file
>>>>>>>> Submitted=4, completed=4, bytes=16384
>>>>>>>> -> ok.
>>>>>>>>
>>>>>>>> strace:
>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>>>>> brk(NULL)                               = 0x15000
>>>>>>>> brk(0x36000)                            = 0x36000
>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>>>>
>>>>>>>> I'm happy to test any patch if someone has an idea....
>>>>>>>
>>>>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>>>>> and it does boot, but it's missing keys to be able to update apt and
>>>>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>>>>> chance you can update the sid image? Given how slow this thing is
>>>>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>>>>> I'm not THAT motivated about parisc to do that :)
>>>>>>
>>>>>> Yes, I will update that image, but qemu currently only supports a
>>>>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>>>>> update it, you won't be able to reproduce it, as it only happens with
>>>>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
>>>>>> translation in the kernel, which triggers only big-endian machines.
>>>>>
>>>>> I built my own kernel for it, so that should be fine, correct?
>>>>
>>>> No, as qemu won't boot the 64-bit kernel.
>>>>
>>>>> We'll see soon enough, managed to disable enough checks on the
>>>>> debian-10 image to actually make it install packages.
>>>>>
>>>>>> Does powerpc with a 64-bit ppc64 kernel work?
>>>>>> I'd assume it will show the same issue.
>>>>>
>>>>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>>>>
>>>> Would be interesting if someone could test...
>>>>
>>>>>> I will try to add some printks and compare the output of 32- and
>>>>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>>>>> debug code, it would help me a lot.
>>>>>
>>>>> I'd just try:
>>>>>
>>>>> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>>>>
>>>> I'll try, but will take some time...
>>>>
>>>
>>> At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
>>> ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
>>> ctx->cached_sq_head is 0 in both cases.
>>
>> cached_sq_head will get updated as sqes are consumed, but since sq.tail
>> is zero, there's nothing to submit as far as io_uring is concerned.
>>
>> Can you dump addresses/offsets of the sq and cq heads/tails in userspace
>> and in the kernel? They are u32, so same size of 32 and 64-bit.
> 
> For both kernels (32- and 64-bit) I get:
> p->sq_off.head = 0  p->sq_off.tail = 16
> p->cq_off.head = 32  p->cq_off.tail = 48

So all that looks as expected. Is it perhaps some mmap thing on 64-bit
kernels? The kernel isn't seeing the updates. You could add the below
debugging, and keep your kernel side stuff. Sounds like they don't quite
agree.


diff --git a/examples/io_uring-test.c b/examples/io_uring-test.c
index 1a685360bff6..f1cfda90c018 100644
--- a/examples/io_uring-test.c
+++ b/examples/io_uring-test.c
@@ -73,7 +73,9 @@ int main(int argc, char *argv[])
 			break;
 	} while (1);
 
+	printf("pre-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
 	ret = io_uring_submit(&ring);
+	printf("post-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
 	if (ret < 0) {
 		fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
 		return 1;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 21:48               ` Jens Axboe
@ 2023-02-12 22:20                 ` Helge Deller
  2023-02-12 22:31                   ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12 22:20 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 22:48, Jens Axboe wrote:
> On 2/12/23 1:01 PM, Helge Deller wrote:
>> On 2/12/23 20:42, Jens Axboe wrote:
>>> On 2/12/23 12:35 PM, Helge Deller wrote:
>>>> On 2/12/23 15:03, Helge Deller wrote:
>>>>> On 2/12/23 14:35, Jens Axboe wrote:
>>>>>> On 2/12/23 6:28 AM, Helge Deller wrote:
>>>>>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>>>>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>>>>>
>>>>>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>>>>>
>>>>>>>>> On a 64-bit kernel (6.1.11):
>>>>>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>>>>>> ret=0, wanted 4096
>>>>>>>>> Submitted=4, completed=1, bytes=0
>>>>>>>>> -> failure
>>>>>>>>>
>>>>>>>>> strace shows:
>>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>>>>>> brk(NULL)                               = 0x4ae000
>>>>>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>>>>>> root@debian:~# ./io_uring-test test.file
>>>>>>>>> Submitted=4, completed=4, bytes=16384
>>>>>>>>> -> ok.
>>>>>>>>>
>>>>>>>>> strace:
>>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>>>>>> brk(NULL)                               = 0x15000
>>>>>>>>> brk(0x36000)                            = 0x36000
>>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>>>>>
>>>>>>>>> I'm happy to test any patch if someone has an idea....
>>>>>>>>
>>>>>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>>>>>> and it does boot, but it's missing keys to be able to update apt and
>>>>>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>>>>>> chance you can update the sid image? Given how slow this thing is
>>>>>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>>>>>> I'm not THAT motivated about parisc to do that :)
>>>>>>>
>>>>>>> Yes, I will update that image, but qemu currently only supports a
>>>>>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>>>>>> update it, you won't be able to reproduce it, as it only happens with
>>>>>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
>>>>>>> translation in the kernel, which triggers only big-endian machines.
>>>>>>
>>>>>> I built my own kernel for it, so that should be fine, correct?
>>>>>
>>>>> No, as qemu won't boot the 64-bit kernel.
>>>>>
>>>>>> We'll see soon enough, managed to disable enough checks on the
>>>>>> debian-10 image to actually make it install packages.
>>>>>>
>>>>>>> Does powerpc with a 64-bit ppc64 kernel work?
>>>>>>> I'd assume it will show the same issue.
>>>>>>
>>>>>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>>>>>
>>>>> Would be interesting if someone could test...
>>>>>
>>>>>>> I will try to add some printks and compare the output of 32- and
>>>>>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>>>>>> debug code, it would help me a lot.
>>>>>>
>>>>>> I'd just try:
>>>>>>
>>>>>> echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable
>>>>>
>>>>> I'll try, but will take some time...
>>>>>
>>>>
>>>> At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
>>>> ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
>>>> ctx->cached_sq_head is 0 in both cases.
>>>
>>> cached_sq_head will get updated as sqes are consumed, but since sq.tail
>>> is zero, there's nothing to submit as far as io_uring is concerned.
>>>
>>> Can you dump addresses/offsets of the sq and cq heads/tails in userspace
>>> and in the kernel? They are u32, so same size of 32 and 64-bit.
>>
>> For both kernels (32- and 64-bit) I get:
>> p->sq_off.head = 0  p->sq_off.tail = 16
>> p->cq_off.head = 32  p->cq_off.tail = 48
>
> So all that looks as expected. Is it perhaps some mmap thing on 64-bit
> kernels? The kernel isn't seeing the updates. You could add the below
> debugging, and keep your kernel side stuff. Sounds like they don't quite
> agree.
>
>
> diff --git a/examples/io_uring-test.c b/examples/io_uring-test.c
> index 1a685360bff6..f1cfda90c018 100644
> --- a/examples/io_uring-test.c
> +++ b/examples/io_uring-test.c
> @@ -73,7 +73,9 @@ int main(int argc, char *argv[])
>   			break;
>   	} while (1);
>
> +	printf("pre-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
>   	ret = io_uring_submit(&ring);
> +	printf("post-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
>   	if (ret < 0) {
>   		fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
>   		return 1;

Result is:
pre-submit sq head/tail 0/0, 0/4
..
post-submit sq head/tail 0/4, 4/4
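
A minimal sketch of the userspace side that would produce exactly these
numbers, assuming the submit path publishes prepared SQEs by copying the
local tail into the shared tail. The struct and function are hypothetical
simplifications for illustration, not liburing's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the four values printed above. */
struct sq_sketch {
    uint32_t khead;    /* stands in for *ring.sq.khead (kernel-owned) */
    uint32_t ktail;    /* stands in for *ring.sq.ktail (shared tail) */
    uint32_t sqe_head; /* local: first SQE not yet published */
    uint32_t sqe_tail; /* local: one past the last prepared SQE */
};

/* Publish locally prepared SQEs; returns entries the kernel has yet to consume. */
static uint32_t flush_sq(struct sq_sketch *sq)
{
    assert(sq != NULL); /* defensive check; a NULL ring is a caller bug */
    if (sq->sqe_head != sq->sqe_tail) {
        sq->sqe_head = sq->sqe_tail;
        sq->ktail = sq->sqe_tail; /* real code would need a release store here */
    }
    return sq->ktail - sq->khead;
}
```

Starting from 0/0, 0/4, one flush gives 0/4, 4/4: userspace has published a
tail of 4, and the head stays at 0 until the kernel consumes entries.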

Helge

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 22:20                 ` Helge Deller
@ 2023-02-12 22:31                   ` Helge Deller
  2023-02-13 16:15                     ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-12 22:31 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 23:20, Helge Deller wrote:
> On 2/12/23 22:48, Jens Axboe wrote:
>> On 2/12/23 1:01 PM, Helge Deller wrote:
>>> On 2/12/23 20:42, Jens Axboe wrote:
>>>> On 2/12/23 12:35 PM, Helge Deller wrote:
>>>>> On 2/12/23 15:03, Helge Deller wrote:
>>>>>> On 2/12/23 14:35, Jens Axboe wrote:
>>>>>>> On 2/12/23 6:28 AM, Helge Deller wrote:
>>>>>>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>>>>>>> On 2/12/23 2:47 AM, Helge Deller wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>>>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>>>>>>
>>>>>>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>>>>>>
>>>>>>>>>> On a 64-bit kernel (6.1.11):
>>>>>>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>>>>>>> ret=0, wanted 4096
>>>>>>>>>> Submitted=4, completed=1, bytes=0
>>>>>>>>>> -> failure
>>>>>>>>>>
>>>>>>>>>> strace shows:
>>>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>>>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>>>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>>>>>>> brk(NULL)                               = 0x4ae000
>>>>>>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>>>>>>> root@debian:~# ./io_uring-test test.file
>>>>>>>>>> Submitted=4, completed=4, bytes=16384
>>>>>>>>>> -> ok.
>>>>>>>>>>
>>>>>>>>>> strace:
>>>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>>>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>>>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>>>>>>> brk(NULL)                               = 0x15000
>>>>>>>>>> brk(0x36000)                            = 0x36000
>>>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>>>>>>
>>>>>>>>>> I'm happy to test any patch if someone has an idea....
>>>>>>>>>
>>>>>>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>>>>>>> and it does boot, but it's missing keys to be able to update apt and
>>>>>>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>>>>>>> chance you can update the sid image? Given how slow this thing is
>>>>>>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>>>>>>> I'm not THAT motivated about parisc to do that :)
>>>>>>>>
>>>>>>>> Yes, I will update that image, but qemu currently only supports a
>>>>>>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>>>>>>> update it, you won't be able to reproduce it, as it only happens with
>>>>>>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
>>>>>>>> translation in the kernel, which triggers only big-endian machines.
>>>>>>>
>>>>>>> I built my own kernel for it, so that should be fine, correct?
>>>>>>
>>>>>> No, as qemu won't boot the 64-bit kernel.
>>>>>>
>>>>>>> We'll see soon enough, managed to disable enough checks on the
>>>>>>> debian-10 image to actually make it install packages.
>>>>>>>
>>>>>>>> Does powerpc with a 64-bit ppc64 kernel work?
>>>>>>>> I'd assume it will show the same issue.
>>>>>>>
>>>>>>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>>>>>>
>>>>>> Would be interesting if someone could test...
>>>>>>
>>>>>>>> I will try to add some printks and compare the output of 32- and
>>>>>>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>>>>>>> debug code, it would help me a lot.
>>>>>>>
>>>>>>> I'd just try:
>>>>>>>
>>>>>>> echo 1 > /sys/kernel/debug/tracing/events/io_uring
>>>>>>
>>>>>> I'll try, but will take some time...
>>>>>>
>>>>>
>>>>> At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
>>>>> ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
>>>>> ctx->cached_sq_head is 0 in both cases.
>>>>
>>>> cached_sq_head will get updated as sqes are consumed, but since sq.tail
>>>> is zero, there's nothing to submit as far as io_uring is concerned.
>>>>
>>>> Can you dump addresses/offsets of the sq and cq heads/tails in userspace
>>>> and in the kernel? They are u32, so same size of 32 and 64-bit.
>>>
>>> For both kernels (32- and 64-bit) I get:
>>> p->sq_off.head = 0  p->sq_off.tail = 16
>>> p->cq_off.head = 32  p->cq_off.tail = 48
>>
>> So all that looks as expected. Is it perhaps some mmap thing on 64-bit
>> kernels? The kernel isn't seeing the updates. You could add the below
>> debugging, and keep your kernel side stuff. Sounds like they don't quite
>> agree.
>>
>>
>> diff --git a/examples/io_uring-test.c b/examples/io_uring-test.c
>> index 1a685360bff6..f1cfda90c018 100644
>> --- a/examples/io_uring-test.c
>> +++ b/examples/io_uring-test.c
>> @@ -73,7 +73,9 @@ int main(int argc, char *argv[])
>>               break;
>>       } while (1);
>>
>> +    printf("pre-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
>>       ret = io_uring_submit(&ring);
>> +    printf("post-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
>>       if (ret < 0) {
>>           fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
>>           return 1;
>
> Result is:
> pre-submit sq head/tail 0/0, 0/4
> ..
> post-submit sq head/tail 0/4, 4/4

I have to correct myself!
The problem exists on both 32- and 64-bit kernels.
My current testing with 32-bit kernel was on qemu, but after booting the same kernel
on a physical box, I see the testcase failing on the 32-bit kernel too.

So, probably some cache-flushing / alias-handling is needed....
This is a 1-CPU box, so SMP isn't involved.

Helge
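
As a rough illustration of the check discussed in the quoted analysis above, here is a minimal userspace model of how the number of submittable entries follows from the shared SQ tail and the kernel's cached head. This is a sketch, not the kernel source: the function name sq_pending() and its parameters are made up for illustration, and it only assumes what the thread states, namely that the count is the tail minus the cached head, capped at the ring size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the pending-entries computation: entries that
 * userspace has published (tail) but the kernel has not consumed yet
 * (cached_head). Unsigned subtraction keeps the result correct across
 * u32 wraparound of the ring indices.
 */
static uint32_t sq_pending(uint32_t tail, uint32_t cached_head,
                           uint32_t sq_entries)
{
    uint32_t entries = tail - cached_head;

    /* io_uring ring sizes are powers of two */
    assert(sq_entries != 0 && (sq_entries & (sq_entries - 1)) == 0);

    /* never report more entries than the ring can hold */
    return entries < sq_entries ? entries : sq_entries;
}
```

With the values from this thread, userspace has written tail=4, so sq_pending(4, 0, 4) yields 4. A kernel that still reads a stale tail of 0 computes sq_pending(0, 0, 4), which is 0 and matches io_uring_enter() returning 0.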


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-12 22:31                   ` Helge Deller
@ 2023-02-13 16:15                     ` Jens Axboe
  2023-02-13 20:59                       ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-13 16:15 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/12/23 3:31 PM, Helge Deller wrote:
> I have to correct myself!
> The problem exists on both, 32- and 64-bit kernels.
> My current testing with 32-bit kernel was on qemu, but after booting the same kernel
> on a physical box, I see the testcase failing on the 32-bit kernel too.
> 
> So, probably some cache-flushing / alias-handling is needed....
> This is a 1-CPU box, so SMP isn't involved.

Yep sounds like it. What's the caching architecture of parisc? Something
like this perhaps, but may not be complete as we'd need to do something
for the cqe writes too I think, not just the cq tail.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..ab0d1297bb0e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2338,6 +2338,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 	unsigned head, mask = ctx->sq_entries - 1;
 	unsigned sq_idx = ctx->cached_sq_head++ & mask;
 
+	flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx));
+
 	/*
 	 * The cached sq head (or cq tail) serves two purposes:
 	 *
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index ab4b2a1c3b7e..b132f44a9364 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -220,6 +220,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	/* order cqe stores with ring update */
 	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
+	flush_dcache_page(virt_to_page(ctx->rings));
 }
 
 /* requires smb_mb() prior, see wq_has_sleeper() */

-- 
Jens Axboe



* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-13 16:15                     ` Jens Axboe
@ 2023-02-13 20:59                       ` Helge Deller
  2023-02-13 21:05                         ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-13 20:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/13/23 17:15, Jens Axboe wrote:
> On 2/12/23 3:31 PM, Helge Deller wrote:
>> I have to correct myself!
>> The problem exists on both, 32- and 64-bit kernels.
>> My current testing with 32-bit kernel was on qemu, but after booting the same kernel
>> on a physical box, I see the testcase failing on the 32-bit kernel too.
>>
>> So, probably some cache-flushing / alias-handling is needed....
>> This is a 1-CPU box, so SMP isn't involved.
>
> Yep sounds like it. What's the caching architecture of parisc?

parisc is Virtually Indexed, Physically Tagged (VIPT).


> Something
> like this perhaps, but may not be complete as we'd need to do something
> for the cqe writes too I think, not just the cq tail.
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index db623b3185c8..ab0d1297bb0e 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2338,6 +2338,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
>   	unsigned head, mask = ctx->sq_entries - 1;
>   	unsigned sq_idx = ctx->cached_sq_head++ & mask;
>
> +	flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx));
> +
>   	/*
>   	 * The cached sq head (or cq tail) serves two purposes:
>   	 *
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index ab4b2a1c3b7e..b132f44a9364 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -220,6 +220,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
>   {
>   	/* order cqe stores with ring update */
>   	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
> +	flush_dcache_page(virt_to_page(ctx->rings));
>   }
>
>   /* requires smb_mb() prior, see wq_has_sleeper() */

Thanks for the patch!
Sadly it doesn't fix the problem, as the kernel still sees
ctx->rings->sq.tail as being 0.
Interestingly it worked once (not reproducible) directly after bootup,
which indicates that we at least look at the right address from kernel side.

So, still needs more debugging/testing.

Helge


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-13 20:59                       ` Helge Deller
@ 2023-02-13 21:05                         ` Jens Axboe
  2023-02-13 22:05                           ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-13 21:05 UTC (permalink / raw)
  To: Helge Deller, io-uring; +Cc: John David Anglin, linux-parisc

On 2/13/23 1:59 PM, Helge Deller wrote:
>> Yep sounds like it. What's the caching architecture of parisc?
> 
> parisc is Virtually Indexed, Physically Tagged (VIPT).

That's what I assumed, so virtual aliasing is what we're dealing with
here.

> Thanks for the patch!
> Sadly it doesn't fix the problem, as the kernel still sees
> ctx->rings->sq.tail as being 0.
> Interestingly it worked once (not reproducible) directly after bootup,
> which indicates that we at least look at the right address from kernel side.
> 
> So, still needs more debugging/testing.

It's not like this is untested stuff, so yeah it'll generally be
correct, it just seems that parisc is a bit odd in that the virtual
aliasing occurs between the kernel and userspace addresses too. At least
that's what it seems like.

But I wonder if what needs flushing is the user side, not the kernel
side? Either that, or my patch is not flushing the right thing on the
kernel side.

Is it possible to flush it from the userspace side? Presumably that's
what we'd need on the sqe side, and then the kernel side for the cqe
filling. So probably the patch is half-way correct :-)

-- 
Jens Axboe


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-13 21:05                         ` Jens Axboe
@ 2023-02-13 22:05                           ` Helge Deller
  2023-02-13 22:50                             ` John David Anglin
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-13 22:05 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: John David Anglin, linux-parisc

On 2/13/23 22:05, Jens Axboe wrote:
> On 2/13/23 1:59 PM, Helge Deller wrote:
>>> Yep sounds like it. What's the caching architecture of parisc?
>>
>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>
> That's what I assumed, so virtual aliasing is what we're dealing with
> here.
>
>> Thanks for the patch!
>> Sadly it doesn't fix the problem, as the kernel still sees
>> ctx->rings->sq.tail as being 0.
>> Interestingly it worked once (not reproducible) directly after bootup,
>> which indicates that we at least look at the right address from kernel side.
>>
>> So, still needs more debugging/testing.
>
> It's not like this is untested stuff, so yeah it'll generally be
> correct, it just seems that parisc is a bit odd in that the virtual
> aliasing occurs between the kernel and userspace addresses too. At least
> that's what it seems like.

True.

> But I wonder if what needs flushing is the user side, not the kernel
> side? Either that, or my patch is not flushing the right thing on the
> kernel side.
>
> Is it possible to flush it from the userspace side? Presumably that's
> what we'd need on the sqe side, and then the kernel side for the cqe
> filling. So probably the patch is half-way correct :-)

I hacked this code into __io_uring_flush_sq() in liburing/src/queue.c
(which I hope is correct):
                 if (!(ring->flags & IORING_SETUP_SQPOLL))
                         IO_URING_WRITE_ONCE(*sq->ktail, tail);
                 else
                         io_uring_smp_store_release(sq->ktail, tail);
         } /* ADDED: */
         { int i;  unsigned long p = (unsigned long)sq->ktail & ~(4096-1);
           fprintf(stderr, "FLUSH CACHE OF PAGE %lx\n", p);
           for (i=0; i < 4096; i += 8)
                 asm volatile("fdc 0(%0)" : : "r" (p+i));
         }

The kernel sometimes sees the tail value now (it fails afterwards, but that's ok for now).
But I'm not sure yet if this is really the effect of the fdc (flush data cache instruction),
or pure luck because the aliasing of the userspace address and kernel address matches in
a successful run.
To me it seems that it's the aliasing which makes it work sometimes.

In this regard I wonder why we don't provide the cacheflush syscall on parisc....

Helge
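
The flush loop in the hack above can be sketched in portable form as follows. This is only an illustration: flush_page_of(), the flush_line_fn callback type, struct flush_log and log_flush() are hypothetical names, and the real per-line flush (the parisc `fdc` instruction in the hack) is replaced by a callback that merely records which addresses would be flushed, so the sketch compiles on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: flush the cache line containing addr. */
typedef void (*flush_line_fn)(uintptr_t addr, void *opaque);

/* Example bookkeeping for a callback that only records the addresses. */
struct flush_log {
    uintptr_t first;  /* first address passed to the callback */
    uintptr_t last;   /* last address passed to the callback */
    size_t n;         /* number of callback invocations */
};

static void log_flush(uintptr_t addr, void *opaque)
{
    struct flush_log *log = opaque;

    if (log->n == 0)
        log->first = addr;
    log->last = addr;
    log->n++;
}

/*
 * Walk the page containing addr in stride-byte steps and call
 * flush_line once per step. Returns the number of calls made.
 * page_size and stride must be powers of two, stride <= page_size.
 */
static size_t flush_page_of(uintptr_t addr, size_t page_size, size_t stride,
                            flush_line_fn flush_line, void *opaque)
{
    uintptr_t base = addr & ~((uintptr_t)page_size - 1);
    size_t calls = 0;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(stride != 0 && (stride & (stride - 1)) == 0);
    assert(stride <= page_size);

    for (size_t off = 0; off < page_size; off += stride) {
        flush_line(base + off, opaque);
        calls++;
    }
    return calls;
}
```

Taking the numbers from the first strace (ring mapped at 0xf7522000, sq_off.tail = 16), flush_page_of(0xf7522010, 4096, 8, log_flush, &log) visits 512 addresses from 0xf7522000 to 0xf7522ff8. The 8-byte stride simply follows the hack; a real implementation would use the actual cache line size.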


* Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
  2023-02-13 22:05                           ` Helge Deller
@ 2023-02-13 22:50                             ` John David Anglin
  2023-02-14 23:09                               ` io_uring failure on parisc with VIPT caches Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-13 22:50 UTC (permalink / raw)
  To: Helge Deller, Jens Axboe, io-uring; +Cc: linux-parisc

On 2023-02-13 5:05 p.m., Helge Deller wrote:
> On 2/13/23 22:05, Jens Axboe wrote:
>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>
>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>
>> That's what I assumed, so virtual aliasing is what we're dealing with
>> here.
>>
>>> Thanks for the patch!
>>> Sadly it doesn't fix the problem, as the kernel still sees
>>> ctx->rings->sq.tail as being 0.
>>> Interestingly it worked once (not reproducible) directly after bootup,
>>> which indicates that we at least look at the right address from kernel side.
>>>
>>> So, still needs more debugging/testing.
>>
>> It's not like this is untested stuff, so yeah it'll generally be
>> correct, it just seems that parisc is a bit odd in that the virtual
>> aliasing occurs between the kernel and userspace addresses too. At least
>> that's what it seems like.
>
> True.
>
>> But I wonder if what needs flushing is the user side, not the kernel
>> side? Either that, or my patch is not flushing the right thing on the
>> kernel side.
>>
>> Is it possible to flush it from the userspace side? Presumably that's
>> what we'd need on the sqe side, and then the kernel side for the cqe
>> filling. So probably the patch is half-way correct :-)
>
> I hacked up in __io_uring_flush_sq() in liburing/src/queue.c this code
> (which I hope is correct):
>                 if (!(ring->flags & IORING_SETUP_SQPOLL))
>                         IO_URING_WRITE_ONCE(*sq->ktail, tail);
>                 else
>                         io_uring_smp_store_release(sq->ktail, tail);
>         } /* ADDED: */
>         { int i;  unsigned long p = (unsigned long)sq->ktail & ~(4096-1);
>           fprintf(stderr, "FLUSH CACHE OF PAGE %lx\n", p);
>           for (i=0; i < 4096; i += 8)
>                 asm volatile("fdc 0(%0)" : : "r" (p+i));
>         }
>
> The kernel sometimes sees the tail value now (it fails afterwards, but that's ok for now).
> But I'm not sure yet if this is really the effect of the fdc (flush data cache instruction),
> or pure luck because the aliasing of the userspace address and kernel address matches in
> a successful run.
If the user and kernel aliases are not equivalent, the kernel must also flush the page to
invalidate any lines that may be present in the cache before trying to access the data in the page.
> For me it seems as it's the aliasing which makes it work sometimes.
>
> In this regard I wonder why we don't provide the cacheflush syscall on parisc....
The kernel knows the cache stride and can optimize the flush.  But it needs to handle non access TLB
faults on userspace.  Userspace can also do flushes.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: io_uring failure on parisc with VIPT caches
  2023-02-13 22:50                             ` John David Anglin
@ 2023-02-14 23:09                               ` Helge Deller
  2023-02-14 23:29                                 ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-14 23:09 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-parisc, John David Anglin, James Bottomley

* John David Anglin <dave.anglin@bell.net>:
> On 2023-02-13 5:05 p.m., Helge Deller wrote:
> > On 2/13/23 22:05, Jens Axboe wrote:
> > > On 2/13/23 1:59 PM, Helge Deller wrote:
> > > > > Yep sounds like it. What's the caching architecture of parisc?
> > > >
> > > > parisc is Virtually Indexed, Physically Tagged (VIPT).
> > >
> > > That's what I assumed, so virtual aliasing is what we're dealing with
> > > here.
> > >
> > > > Thanks for the patch!
> > > > Sadly it doesn't fix the problem, as the kernel still sees
> > > > ctx->rings->sq.tail as being 0.
> > > > Interestingly it worked once (not reproducible) directly after bootup,
> > > > which indicates that we at least look at the right address from kernel side.
> > > >
> > > > So, still needs more debugging/testing.
> > >
> > > It's not like this is untested stuff, so yeah it'll generally be
> > > correct, it just seems that parisc is a bit odd in that the virtual
> > > aliasing occurs between the kernel and userspace addresses too. At least
> > > that's what it seems like.
> >
> > True.
> >
> > > But I wonder if what needs flushing is the user side, not the kernel
> > > side? Either that, or my patch is not flushing the right thing on the
> > > kernel side.


The patch below seems to fix the issue.

I've successfully tested it with the io_uring-test testcase on
physical parisc machines with 32- and 64-bit 6.1.11 kernels.

The idea is similar to how a file is mmapped shared by two
userspace processes: the lower bits of the virtual address are
kept the same.

Cache flushes from userspace don't seem to be needed.

I think similar code is needed for mips (uses SHMLBA 0x40000) and
some other architectures....

Helge


From efde7ed7ad380a924448b8ab8ea30d52782aa8e6 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller@gmx.de>
Date: Tue, 14 Feb 2023 23:41:14 +0100
Subject: [PATCH] io_uring: DRAFT Fix io_uring on machines with VIPT caches

This is a DRAFT patch to make io_uring work on machines
with VIPT caches (like PA-RISC).
It currently compiles only on parisc, because it uses
the SHM_COLOUR constant.

The basic idea is to ensure that the page colour matches between the kernel
ring address and the mmap'ed userspace address, and to flush the caches
before accessing the rings.

Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 862e05e6691d..606e23671453 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2208,6 +2208,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 	unsigned head, mask = ctx->sq_entries - 1;
 	unsigned sq_idx = ctx->cached_sq_head++ & mask;

+	flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx));
+
 	/*
 	 * The cached sq head (or cq tail) serves two purposes:
 	 *
@@ -2238,6 +2240,9 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	unsigned int left;
 	int ret;

+	struct io_rings *rings = ctx->rings;
+	flush_dcache_page(virt_to_page(rings));
+
 	if (unlikely(!entries))
 		return 0;
 	/* make sure SQ entry isn't read before tail */
@@ -3059,6 +3067,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }

+unsigned long
+io_uring_mmu_get_unmapped_area(struct file *filp, unsigned long addr,
+				  unsigned long len, unsigned long pgoff,
+				  unsigned long flags)
+{
+	struct mm_struct *mm = current->mm;
+	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+	struct vm_unmapped_area_info info;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+
+	/* we do not support requesting a specific address */
+	if (addr)
+		return -EINVAL;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);
+	info.align_mask = PAGE_MASK & (SHM_COLOUR - 1);
+	info.align_offset = (unsigned long)ptr & (SHM_COLOUR - 1);
+
+	addr = vm_unmapped_area(&info);
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	if (offset_in_page(addr)) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */

 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3273,6 +3326,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif
 	.poll		= io_uring_poll,
 #ifdef CONFIG_PROC_FS
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 90b675c65b84..b8bc682ef240 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -204,6 +204,7 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
 static inline void io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	/* order cqe stores with ring update */
+	flush_dcache_page(virt_to_page(ctx->rings));
 	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
 }



* Re: io_uring failure on parisc with VIPT caches
  2023-02-14 23:09                               ` io_uring failure on parisc with VIPT caches Helge Deller
@ 2023-02-14 23:29                                 ` Jens Axboe
  2023-02-15  2:12                                   ` John David Anglin
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-14 23:29 UTC (permalink / raw)
  To: Helge Deller, io-uring, linux-parisc, John David Anglin, James Bottomley

On 2/14/23 4:09 PM, Helge Deller wrote:
> * John David Anglin <dave.anglin@bell.net>:
>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>
>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>
>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>> here.
>>>>
>>>>> Thanks for the patch!
>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>> ctx->rings->sq.tail as being 0.
>>>>> Interestingly it worked once (not reproduceable) directly after bootup,
>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>
>>>>> So, still needs more debugging/testing.
>>>>
>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>> that's what it seems like.
>>>
>>> True.
>>>
>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>> side? Either that, or my patch is not flushing the right thing on the
>>>> kernel side.
> 
> 
> The patch below seems to fix the issue.
> 
> I've successfuly tested it with the io_uring-test testcase on
> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
> 
> The idea is similiar on how a file is mmapped shared by two
> userspace processes by keeping the lower bits of the virtual address
> the same.
> 
> Cache flushes from userspace don't seem to be needed.

Are they needed from the kernel side, if matching the lower bits means we
end up with the same coloring? Because I think this is a bit of a big
hammer, in terms of overhead for flushing. As an example, on arm64,
which is perfectly fine with the existing code, it's about a 20-25%
performance hit.

Other little complaints too in terms of which pages to flush, e.g.
only the first page is flushed but the ring may be larger than that.
But those are mostly moot if we can just guarantee that matching the
lowest bits fixes the aliasing.
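
To illustrate the point about flushing only the first page: a ring that
starts at some offset inside a page and is len bytes long can span several
pages, each of which would need its own flush. A small sketch, with a
hypothetical helper pages_spanned() and an assumed 4 KiB page size:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration. */
#define EXAMPLE_PAGE_SIZE 4096UL

/*
 * Number of pages touched by a region of 'len' bytes that begins
 * 'offset_in_first_page' bytes into its first page.
 */
static size_t pages_spanned(size_t offset_in_first_page, size_t len)
{
	assert(offset_in_first_page < EXAMPLE_PAGE_SIZE);
	if (len == 0)
		return 0;
	return (offset_in_first_page + len + EXAMPLE_PAGE_SIZE - 1) /
	       EXAMPLE_PAGE_SIZE;
}
```

A 240-byte ring at offset 0 fits in one page, but a 4097-byte region
already needs two, so a single per-page flush would miss the second one.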

-- 
Jens Axboe




* Re: io_uring failure on parisc with VIPT caches
  2023-02-14 23:29                                 ` Jens Axboe
@ 2023-02-15  2:12                                   ` John David Anglin
  2023-02-15 15:16                                     ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-15  2:12 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-14 6:29 p.m., Jens Axboe wrote:
> On 2/14/23 4:09 PM, Helge Deller wrote:
>> * John David Anglin<dave.anglin@bell.net>:
>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>> here.
>>>>>
>>>>>> Thanks for the patch!
>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>> ctx->rings->sq.tail as being 0.
>>>>>> Interestingly it worked once (not reproduceable) directly after bootup,
>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>
>>>>>> So, still needs more debugging/testing.
>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>> that's what it seems like.
>>>> True.
>>>>
>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>> kernel side.
>> The patch below seems to fix the issue.
>>
>> I've successfuly tested it with the io_uring-test testcase on
>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>
>> The idea is similiar on how a file is mmapped shared by two
>> userspace processes by keeping the lower bits of the virtual address
>> the same.
>>
>> Cache flushes from userspace don't seem to be needed.
> Are they from the kernel side, if the lower bits mean we end up
> with the same coloring? Because I think this is a bit of a big
> hammer, in terms of overhead for flushing. As an example, on arm64
> that is perfectly fine with the existing code, it's about a 20-25%
> performance hit.
The io_uring-test testcase still works on rp3440 with the kernel flushes removed.

-- 
John David Anglin  dave.anglin@bell.net



* Re: io_uring failure on parisc with VIPT caches
  2023-02-15  2:12                                   ` John David Anglin
@ 2023-02-15 15:16                                     ` Jens Axboe
  2023-02-15 15:52                                       ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 15:16 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/14/23 7:12 PM, John David Anglin wrote:
> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>> * John David Anglin<dave.anglin@bell.net>:
>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>> here.
>>>>>>
>>>>>>> Thanks for the patch!
>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>> Interestingly it worked once (not reproduceable) directly after bootup,
>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>
>>>>>>> So, still needs more debugging/testing.
>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>> that's what it seems like.
>>>>> True.
>>>>>
>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>> kernel side.
>>> The patch below seems to fix the issue.
>>>
>>> I've successfuly tested it with the io_uring-test testcase on
>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>
>>> The idea is similiar on how a file is mmapped shared by two
>>> userspace processes by keeping the lower bits of the virtual address
>>> the same.
>>>
>>> Cache flushes from userspace don't seem to be needed.
>> Are they from the kernel side, if the lower bits mean we end up
>> with the same coloring? Because I think this is a bit of a big
>> hammer, in terms of overhead for flushing. As an example, on arm64
>> that is perfectly fine with the existing code, it's about a 20-25%
>> performance hit.
>
> The io_uring-test testcase still works on rp3440 with the kernel
> flushes removed.

That's what I suspected, the important bit here is just aligning it for
identical coloring. Can you confirm if the below works for you? Had to
fiddle it a bit to get it to work without coloring.


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..1d4562067949 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -72,6 +72,7 @@
 #include <linux/io_uring.h>
 #include <linux/audit.h>
 #include <linux/security.h>
+#include <asm/shmparam.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }
 
+static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
+	struct vm_unmapped_area_info info;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+	/* we do not support requesting a specific address */
+	if (addr)
+		return -EINVAL;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
+	info.align_mask = PAGE_MASK;
+	info.align_offset = (unsigned long) ptr;
+#ifdef SHM_COLOUR
+	info.align_mask &= (SHM_COLOUR - 1);
+	info.align_offset &= (SHM_COLOUR - 1)
+#endif
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	addr = vm_unmapped_area(&info);
+	if (offset_in_page(addr)) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */
 
 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3414,6 +3460,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif
 	.poll		= io_uring_poll,
 #ifdef CONFIG_PROC_FS

-- 
Jens Axboe



* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 15:16                                     ` Jens Axboe
@ 2023-02-15 15:52                                       ` Helge Deller
  2023-02-15 15:56                                         ` Jens Axboe
  2023-02-15 16:18                                         ` John David Anglin
  0 siblings, 2 replies; 48+ messages in thread
From: Helge Deller @ 2023-02-15 15:52 UTC (permalink / raw)
  To: Jens Axboe, John David Anglin, io-uring, linux-parisc, James Bottomley

On 2/15/23 16:16, Jens Axboe wrote:
> On 2/14/23 7:12 PM, John David Anglin wrote:
>> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>>> * John David Anglin<dave.anglin@bell.net>:
>>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>>> here.
>>>>>>>
>>>>>>>> Thanks for the patch!
>>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>>> Interestingly it worked once (not reproduceable) directly after bootup,
>>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>>
>>>>>>>> So, still needs more debugging/testing.
>>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>>> that's what it seems like.
>>>>>> True.
>>>>>>
>>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>>> kernel side.
>>>> The patch below seems to fix the issue.
>>>>
>>>> I've successfuly tested it with the io_uring-test testcase on
>>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>>
>>>> The idea is similiar on how a file is mmapped shared by two
>>>> userspace processes by keeping the lower bits of the virtual address
>>>> the same.
>>>>
>>>> Cache flushes from userspace don't seem to be needed.
>>> Are they from the kernel side, if the lower bits mean we end up
>>> with the same coloring? Because I think this is a bit of a big
>>> hammer, in terms of overhead for flushing. As an example, on arm64
>>> that is perfectly fine with the existing code, it's about a 20-25%
>>> performance hit.
>>
>> The io_uring-test testcase still works on rp3440 with the kernel
>> flushes removed.
>
> That's what I suspected, the important bit here is just aligning it for
> identical coloring. Can you confirm if the below works for you? Had to
> fiddle it a bit to get it to work without coloring.

Yes, the patch works for me on 32- and 64-bit, even with PA8900 CPUs...

Is there a more detailed testcase somewhere that I could try too?

Some nits below...

> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index db623b3185c8..1d4562067949 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -72,6 +72,7 @@
>   #include <linux/io_uring.h>
>   #include <linux/audit.h>
>   #include <linux/security.h>
> +#include <asm/shmparam.h>
>
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/io_uring.h>
> @@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>   	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>   }
>
> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> +			unsigned long addr, unsigned long len,
> +			unsigned long pgoff, unsigned long flags)
> +{
> +	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
> +	struct vm_unmapped_area_info info;
> +	void *ptr;
> +
> +	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
> +	if (IS_ERR(ptr))
> +		return -ENOMEM;
> +
> +	/* we do not support requesting a specific address */
> +	if (addr)
> +		return -EINVAL;

With this ^ we disallow users from providing a proposed address.
I think this is OK and I suggest keeping it that way.

Alternatively one could check the given address against the
alignment which is calculated below, but IMHO this would make the
code unnecessarily bigger.

> +
> +	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> +	info.length = len;
> +	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> +	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
> +	info.align_mask = PAGE_MASK;
> +	info.align_offset = (unsigned long) ptr;

For parisc I introduced SHM_COLOUR because it allows userspace
to map a shared file initially at any PAGE_SIZE-aligned address.
Only when a second user then maps the same file is the aliasing enforced.

Other platforms just have SHMLBA, and for some of them SHMLBA is > PAGE_SIZE.
So, instead of the above code, this untested code might be better for those
other platforms?
info.align_mask = PAGE_MASK & (SHMLBA - 1);
info.align_offset = (unsigned long)ptr & (SHMLBA - 1);
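
A userspace sketch of how such a mask and offset could steer the search,
assuming the allocator rounds a page-aligned candidate up to the next
address whose colour bits equal those of align_offset. colour_mask() and
align_up_to_colour() are hypothetical helpers, not kernel functions, and
the 4 KiB page size is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration. */
#define EX_PAGE_SIZE 4096UL
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* Bits between the page size and 'lba' (a power of two) form the colour. */
static uintptr_t colour_mask(uintptr_t lba)
{
	assert(lba != 0 && (lba & (lba - 1)) == 0);
	return EX_PAGE_MASK & (lba - 1);
}

/*
 * Round 'candidate' up to the next address whose colour bits match
 * those of 'align_offset'. Unsigned wrap-around in the subtraction
 * is intended; only the masked bits matter.
 */
static uintptr_t align_up_to_colour(uintptr_t candidate,
				    uintptr_t align_offset,
				    uintptr_t align_mask)
{
	return candidate + ((align_offset - candidate) & align_mask);
}
```

When SHMLBA equals the page size the mask is 0 and the candidate is returned
unchanged. With the mips value 0x40000 mentioned earlier in the thread, a
candidate of 0x10000000 and a kernel pointer whose low bits are 0x3a000
yields 0x1003a000.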

this is ok ->
> +#ifdef SHM_COLOUR
> +	info.align_mask &= (SHM_COLOUR - 1);
> +	info.align_offset &= (SHM_COLOUR - 1)

^^ is missing a ";" at the end.

Helge

> +#endif
> +
> +	/*
> +	 * A failed mmap() very likely causes application failure,
> +	 * so fall back to the bottom-up function here. This scenario
> +	 * can happen with large stack limits and large mmap()
> +	 * allocations.
> +	 */
> +	addr = vm_unmapped_area(&info);
> +	if (offset_in_page(addr)) {
> +		VM_BUG_ON(addr != -ENOMEM);
> +		info.flags = 0;
> +		info.low_limit = TASK_UNMAPPED_BASE;
> +		info.high_limit = mmap_end;
> +		addr = vm_unmapped_area(&info);
> +	}
> +
> +	return addr;
> +}
> +
>   #else /* !CONFIG_MMU */
>
>   static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
> @@ -3414,6 +3460,8 @@ static const struct file_operations io_uring_fops = {
>   #ifndef CONFIG_MMU
>   	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
>   	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
> +#else
> +	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
>   #endif
>   	.poll		= io_uring_poll,
>   #ifdef CONFIG_PROC_FS
>



* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 15:52                                       ` Helge Deller
@ 2023-02-15 15:56                                         ` Jens Axboe
  2023-02-15 16:02                                           ` Helge Deller
  2023-02-15 16:38                                           ` John David Anglin
  2023-02-15 16:18                                         ` John David Anglin
  1 sibling, 2 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 15:56 UTC (permalink / raw)
  To: Helge Deller, John David Anglin, io-uring, linux-parisc, James Bottomley

On 2/15/23 8:52 AM, Helge Deller wrote:
> On 2/15/23 16:16, Jens Axboe wrote:
>> On 2/14/23 7:12 PM, John David Anglin wrote:
>>> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>>>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>>>> * John David Anglin<dave.anglin@bell.net>:
>>>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>>>> here.
>>>>>>>>
>>>>>>>>> Thanks for the patch!
>>>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>>>> Interestingly it worked once (not reproduceable) directly after bootup,
>>>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>>>
>>>>>>>>> So, still needs more debugging/testing.
>>>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>>>> that's what it seems like.
>>>>>>> True.
>>>>>>>
>>>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>>>> kernel side.
>>>>> The patch below seems to fix the issue.
>>>>>
>>>>> I've successfuly tested it with the io_uring-test testcase on
>>>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>>>
>>>>> The idea is similiar on how a file is mmapped shared by two
>>>>> userspace processes by keeping the lower bits of the virtual address
>>>>> the same.
>>>>>
>>>>> Cache flushes from userspace don't seem to be needed.
>>>> Are they from the kernel side, if the lower bits mean we end up
>>>> with the same coloring? Because I think this is a bit of a big
>>>> hammer, in terms of overhead for flushing. As an example, on arm64
>>>> that is perfectly fine with the existing code, it's about a 20-25%
>>>> performance hit.
>>>
>>> The io_uring-test testcase still works on rp3440 with the kernel
>>> flushes removed.
>>
>> That's what I suspected, the important bit here is just aligning it for
>> identical coloring. Can you confirm if the below works for you? Had to
>> fiddle it a bit to get it to work without coloring.
> 
> Yes, the patch works for me on 32- and 64-bit, even with PA8900 CPUs...
> 
> Is there maybe somewhere a more detailled testcase which I could try too?

Just git clone liburing:

git clone git://git.kernel.dk/liburing

and run make && make runtests in there, that'll go through the whole
regression suite.

> Some nits below...
> 
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index db623b3185c8..1d4562067949 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -72,6 +72,7 @@
>>   #include <linux/io_uring.h>
>>   #include <linux/audit.h>
>>   #include <linux/security.h>
>> +#include <asm/shmparam.h>
>>
>>   #define CREATE_TRACE_POINTS
>>   #include <trace/events/io_uring.h>
>> @@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>       return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>   }
>>
>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>> +            unsigned long addr, unsigned long len,
>> +            unsigned long pgoff, unsigned long flags)
>> +{
>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>> +    struct vm_unmapped_area_info info;
>> +    void *ptr;
>> +
>> +    ptr = io_uring_validate_mmap_request(filp, pgoff, len);
>> +    if (IS_ERR(ptr))
>> +        return -ENOMEM;
>> +
>> +    /* we do not support requesting a specific address */
>> +    if (addr)
>> +        return -EINVAL;
> 
> With this ^ we disallow users to provide a proposed address.
> I think this is ok and I suggest to keep it that way.
> 
> Alternatively one could check the given address against the
> alignment which is calculated below, but this will make the
> code IMHO unnecessary bigger.

liburing won't provide an address, so I'd say let's just keep it as-is.

>> +
>> +    info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>> +    info.length = len;
>> +    info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>> +    info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
>> +    info.align_mask = PAGE_MASK;
>> +    info.align_offset = (unsigned long) ptr;
> 
> For parisc I introduced SHM_COLOUR because it allows userspace
> to map a shared file initially at any PAGE_SIZE-aligned address.
> Only if then a second user maps the same file, the aliasing will be enforced.
> 
> Other platforms just have SHMLBA, and for some SHMLBA is > PAGE_SIZE.
> So, instead of above code, this untested code might be better for those other
> platforms ?
> info.align_mask = PAGE_MASK & (SHMLBA - 1);
> info.align_offset = (unsigned long)ptr & (SHMLBA - 1);

Yeah, I did peek at SHMLBA as well and it seems more common. Could you
test that and send out a "real" patch so we can get it queued up?

> this is ok ->
>> +#ifdef SHM_COLOUR
>> +    info.align_mask &= (SHM_COLOUR - 1);
>> +    info.align_offset &= (SHM_COLOUR - 1)
> 
> ^^ misses a ";" at the end.

Oops indeed.

-- 
Jens Axboe



* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 15:56                                         ` Jens Axboe
@ 2023-02-15 16:02                                           ` Helge Deller
  2023-02-15 16:04                                             ` Jens Axboe
  2023-02-15 16:38                                           ` John David Anglin
  1 sibling, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-15 16:02 UTC (permalink / raw)
  To: Jens Axboe, John David Anglin, io-uring, linux-parisc, James Bottomley

On 2/15/23 16:56, Jens Axboe wrote:
> On 2/15/23 8:52 AM, Helge Deller wrote:
>> On 2/15/23 16:16, Jens Axboe wrote:
>>> On 2/14/23 7:12 PM, John David Anglin wrote:
>>>> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>>>>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>>>>> * John David Anglin<dave.anglin@bell.net>:
>>>>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>>>>> here.
>>>>>>>>>
>>>>>>>>>> Thanks for the patch!
>>>>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>>>>> Interestingly it worked once (not reproduceable) directly after bootup,
>>>>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>>>>
>>>>>>>>>> So, still needs more debugging/testing.
>>>>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>>>>> that's what it seems like.
>>>>>>>> True.
>>>>>>>>
>>>>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>>>>> kernel side.
>>>>>> The patch below seems to fix the issue.
>>>>>>
>>>>>> I've successfuly tested it with the io_uring-test testcase on
>>>>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>>>>
>>>>>> The idea is similiar on how a file is mmapped shared by two
>>>>>> userspace processes by keeping the lower bits of the virtual address
>>>>>> the same.
>>>>>>
>>>>>> Cache flushes from userspace don't seem to be needed.
>>>>> Are they from the kernel side, if the lower bits mean we end up
>>>>> with the same coloring? Because I think this is a bit of a big
>>>>> hammer, in terms of overhead for flushing. As an example, on arm64
>>>>> that is perfectly fine with the existing code, it's about a 20-25%
>>>>> performance hit.
>>>>
>>>> The io_uring-test testcase still works on rp3440 with the kernel
>>>> flushes removed.
>>>
>>> That's what I suspected, the important bit here is just aligning it for
>>> identical coloring. Can you confirm if the below works for you? Had to
>>> fiddle it a bit to get it to work without coloring.
>>
>> Yes, the patch works for me on 32- and 64-bit, even with PA8900 CPUs...
>>
>> Is there maybe somewhere a more detailled testcase which I could try too?
>
> Just git clone liburing:
>
> git clone git://git.kernel.dk/liburing
>
> and run make && make runtests in there, that'll go through the whole
> regression suite.

Thanks!
I'll test.

>> Some nits below...
>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index db623b3185c8..1d4562067949 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -72,6 +72,7 @@
>>>    #include <linux/io_uring.h>
>>>    #include <linux/audit.h>
>>>    #include <linux/security.h>
>>> +#include <asm/shmparam.h>
>>>
>>>    #define CREATE_TRACE_POINTS
>>>    #include <trace/events/io_uring.h>
>>> @@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>>        return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>>>    }
>>>
>>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>> +            unsigned long addr, unsigned long len,
>>> +            unsigned long pgoff, unsigned long flags)
>>> +{
>>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>>> +    struct vm_unmapped_area_info info;
>>> +    void *ptr;
>>> +
>>> +    ptr = io_uring_validate_mmap_request(filp, pgoff, len);
>>> +    if (IS_ERR(ptr))
>>> +        return -ENOMEM;
>>> +
>>> +    /* we do not support requesting a specific address */
>>> +    if (addr)
>>> +        return -EINVAL;
>>
>> With this ^ we disallow users to provide a proposed address.
>> I think this is ok and I suggest to keep it that way.
>>
>> Alternatively one could check the given address against the
>> alignment which is calculated below, but this will make the
>> code IMHO unnecessary bigger.
>
> liburing won't provide an address, so I'd say let's just keep it as-is.

Good.

>>> +
>>> +    info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>>> +    info.length = len;
>>> +    info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>>> +    info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
>>> +    info.align_mask = PAGE_MASK;
>>> +    info.align_offset = (unsigned long) ptr;
>>
>> For parisc I introduced SHM_COLOUR because it allows userspace
>> to map a shared file initially at any PAGE_SIZE-aligned address.
>> Only if then a second user maps the same file, the aliasing will be enforced.
>>
>> Other platforms just have SHMLBA, and for some SHMLBA is > PAGE_SIZE.
>> So, instead of above code, this untested code might be better for those other
>> platforms ?
>> info.align_mask = PAGE_MASK & (SHMLBA - 1);
>> info.align_offset = (unsigned long)ptr & (SHMLBA - 1);
>
> Yeah, I did peek at SHMLBA as well and it seems more common. Could you
> test that and send out a "real" patch so we can get it queued up?

Sure, will do.

Helge

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 16:02                                           ` Helge Deller
@ 2023-02-15 16:04                                             ` Jens Axboe
  2023-02-15 21:40                                               ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 16:04 UTC (permalink / raw)
  To: Helge Deller, John David Anglin, io-uring, linux-parisc, James Bottomley

On 2/15/23 9:02 AM, Helge Deller wrote:
>>>> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>>> +            unsigned long addr, unsigned long len,
>>>> +            unsigned long pgoff, unsigned long flags)
>>>> +{
>>>> +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>>>> +    struct vm_unmapped_area_info info;
>>>> +    void *ptr;
>>>> +
>>>> +    ptr = io_uring_validate_mmap_request(filp, pgoff, len);
>>>> +    if (IS_ERR(ptr))
>>>> +        return -ENOMEM;
>>>> +
>>>> +    /* we do not support requesting a specific address */
>>>> +    if (addr)
>>>> +        return -EINVAL;
>>>
>>> With this ^ we disallow users to provide a proposed address.
>>> I think this is ok and I suggest to keep it that way.
>>>
>>> Alternatively one could check the given address against the
>>> alignment which is calculated below, but this will make the
>>> code IMHO unnecessarily bigger.
>>
>> liburing won't provide an address, so I'd say let's just keep it as-is.
> 
> Good.

Maybe it'd be saner to add that alignment check? Just in case someone is
passing in an address already, we could just make it fail for SHMLBA !=
0 (or SHM_COLOUR, I'll leave that to you).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 15:52                                       ` Helge Deller
  2023-02-15 15:56                                         ` Jens Axboe
@ 2023-02-15 16:18                                         ` John David Anglin
  1 sibling, 0 replies; 48+ messages in thread
From: John David Anglin @ 2023-02-15 16:18 UTC (permalink / raw)
  To: Helge Deller, Jens Axboe, io-uring, linux-parisc, James Bottomley

On 2023-02-15 10:52 a.m., Helge Deller wrote:
>>>> Are they from the kernel side, if the lower bits mean we end up
>>>> with the same coloring? Because I think this is a bit of a big
>>>> hammer, in terms of overhead for flushing. As an example, on arm64
>>>> that is perfectly fine with the existing code, it's about a 20-25%
>>>> performance hit.
>>>
>>> The io_uring-test testcase still works on rp3440 with the kernel
>>> flushes removed.
>>
>> That's what I suspected, the important bit here is just aligning it for
>> identical coloring. Can you confirm if the below works for you? Had to
>> fiddle it a bit to get it to work without coloring.
>
> Yes, the patch works for me on 32- and 64-bit, even with PA8900 CPUs...
>
> Is there maybe somewhere a more detailed testcase which I could try too?
We need to look at liburing and mariadb testsuites.  Mariadb testsuite failed last night
on rp3440.  So, I don't think we have a full solution for machines with PA8800 and PA8900
CPUs.

As I have said in the past, I don't think we have a consistent alias boundary for these
machines because of their L2 cache design.

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 15:56                                         ` Jens Axboe
  2023-02-15 16:02                                           ` Helge Deller
@ 2023-02-15 16:38                                           ` John David Anglin
  2023-02-15 17:01                                             ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-15 16:38 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 10:56 a.m., Jens Axboe wrote:
>> Is there maybe somewhere a more detailed testcase which I could try too?
> Just git clone liburing:
>
> git clone git://git.kernel.dk/liburing
>
> and run make && make runtests in there, that'll go through the whole
> regression suite.
Here are test results for Debian liburing 2.3-3 (hppa) with Helge's original patch:
https://buildd.debian.org/status/fetch.php?pkg=liburing&arch=hppa&ver=2.3-3&stamp=1676478898&raw=0

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 16:38                                           ` John David Anglin
@ 2023-02-15 17:01                                             ` Jens Axboe
  2023-02-15 19:00                                               ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 17:01 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 9:38 AM, John David Anglin wrote:
> On 2023-02-15 10:56 a.m., Jens Axboe wrote:
>>> Is there maybe somewhere a more detailed testcase which I could try too?
>> Just git clone liburing:
>>
>> git clone git://git.kernel.dk/liburing
>>
>> and run make && make runtests in there, that'll go through the whole
>> regression suite.
> Here are test results for Debian liburing 2.3-3 (hppa) with Helge's original patch:
> https://buildd.debian.org/status/fetch.php?pkg=liburing&arch=hppa&ver=2.3-3&stamp=1676478898&raw=0

Most of the test failures seem to be related to O_DIRECT opens, which
I'm guessing is because it's run on an fs without O_DIRECT support?
Outside of that, I think some of the syzbot cases are just generally
broken on various archs.

Lastly, there's a few of these:

Running test buf-ring.t                                             bad run 0/0 = -233

and similar (like -223) which I really don't know what is, where do
these values come from? Ah hang on, they are in the parisc errno,
so that'd be -ENOBUFS and -EOPNOTSUPP. I wonder if there's some
discrepancy between the kernel and user side errno values here?
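
To check the errno theory, a minimal sketch like the one below prints what the
userspace headers think these two codes are and names a negative completion
result accordingly. The helper names cqe_res_name() and print_errno_view() are
made up for illustration and are not part of liburing. On parisc the headers
should give 233 and 223; most other Linux architectures use 105 and 95.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Name a completion result using the errno values this program was
 * compiled against. Only the two codes seen in the test log are handled.
 */
const char *cqe_res_name(int res)
{
	if (res >= 0)
		return "success";

	switch (-res) {
	case ENOBUFS:
		return "ENOBUFS";
	case EOPNOTSUPP:
		return "EOPNOTSUPP";
	default:
		return "other";
	}
}

/* Print the numeric values the C library headers provide on this build. */
void print_errno_view(void)
{
	printf("ENOBUFS=%d EOPNOTSUPP=%d\n", ENOBUFS, EOPNOTSUPP);
}
```

If a test binary built this way reports "other" for a result of -233 on parisc,
that would point at a header mismatch rather than at the kernel.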

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 17:01                                             ` Jens Axboe
@ 2023-02-15 19:00                                               ` Jens Axboe
  2023-02-15 19:16                                                 ` Jens Axboe
  2023-02-15 19:20                                                 ` John David Anglin
  0 siblings, 2 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 19:00 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 10:01 AM, Jens Axboe wrote:
> On 2/15/23 9:38 AM, John David Anglin wrote:
>> On 2023-02-15 10:56 a.m., Jens Axboe wrote:
>>>> Is there maybe somewhere a more detailed testcase which I could try too?
>>> Just git clone liburing:
>>>
>>> git clone git://git.kernel.dk/liburing
>>>
>>> and run make && make runtests in there, that'll go through the whole
>>> regression suite.
>> Here are test results for Debian liburing 2.3-3 (hppa) with Helge's original patch:
>> https://buildd.debian.org/status/fetch.php?pkg=liburing&arch=hppa&ver=2.3-3&stamp=1676478898&raw=0
> 
> Most of the test failures seem to be related to O_DIRECT opens, which
> I'm guessing is because it's run on an fs without O_DIRECT support?
> Outside of that, I think some of the syzbot cases are just generally
> broken on various archs.
> 
> Lastly, there's a few of these:
> 
> Running test buf-ring.t                                             bad run 0/0 = -233
> 
> and similar (like -223) which I really don't know what is, where do
> these values come from? Ah hang on, they are in the parisc errno,
> so that'd be -ENOBUFS and -EOPNOTSUPP. I wonder if there's some
> discrepancy between the kernel and user side errno values here?

I ran the tests in qemu, but didn't see the weird differences in errno
values here between the kernel and userspace. As an example of the above
one:

root@debian:~/liburing# test/buf-ring.t 
root@debian:~/liburing# echo $?
0

it runs fine here. The other failure cases:

917257daa0fe.t: this is due to syzbot using hard wired values, I changed
it to symbolic mmap flags, and ditto for all the other tests where that
was the issue.

accept.t: setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
fails here, no idea why.

xattr.t: works for me

The rest look like either errno value mismatches, or O_DIRECT not
working for you. This was tested with 6.2-rc7+ git, so a recent kernel.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 19:00                                               ` Jens Axboe
@ 2023-02-15 19:16                                                 ` Jens Axboe
  2023-02-15 20:27                                                   ` John David Anglin
  2023-02-15 19:20                                                 ` John David Anglin
  1 sibling, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 19:16 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 12:00 PM, Jens Axboe wrote:
> accept.t: setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
> fails here, no idea why.

I'm wrong on this one, it's actually opening a socket with O_NONBLOCK
that fails with EINVAL, and then we pass -1 to the above. I saw
something on boot on O_NONBLOCK being the wrong value (I think from
systemd), so maybe another case of userspace and the kernel not agreeing
on what values flags/errno has?

In any case, with the silly syzbot mmap stuff fixed up, I'm not seeing
anything odd. A few tests will time out as they simply run too slowly
emulated for me, but apart from that, seems fine. This is running with
Helge's patch, though not sure if that is required running emulated.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 19:00                                               ` Jens Axboe
  2023-02-15 19:16                                                 ` Jens Axboe
@ 2023-02-15 19:20                                                 ` John David Anglin
  2023-02-15 19:24                                                   ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-15 19:20 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 2:00 p.m., Jens Axboe wrote:
> accept.t: setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
> fails here, no idea why.
The socket call fails, so fd is -1.

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 19:20                                                 ` John David Anglin
@ 2023-02-15 19:24                                                   ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 19:24 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 12:20 PM, John David Anglin wrote:
> On 2023-02-15 2:00 p.m., Jens Axboe wrote:
>> accept.t: setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
>> fails here, no idea why.
> The socket call fails, so fd is -1.

Yeah, see followup. I dug further into it, and it looks like a bug in
the test in that it uses O_NONBLOCK rather than SOCK_NONBLOCK. Most
other platforms have those the same, but not parisc. I fixed up the
test and now accept works fine:

root@debian:~/liburing# test/accept.t
root@debian:~/liburing# echo $?
0
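
The shape of the fix can be sketched as below. This is not the actual liburing
test change, just an illustration under the assumption that the test only wants
a non-blocking socket; the helper names are hypothetical. The point is that
socket() takes SOCK_NONBLOCK in its type argument, while O_NONBLOCK is the flag
that later shows up in the file status flags.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a non-blocking stream socket in the given domain using the
 * socket-specific type flag. Returns the fd, or -1 with errno set.
 */
int open_nonblock_socket(int domain)
{
	return socket(domain, SOCK_STREAM | SOCK_NONBLOCK, 0);
}

/* Report whether O_NONBLOCK is set in the file status flags of fd. */
int fd_is_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	assert(fl != -1);
	return (fl & O_NONBLOCK) != 0;
}
```

Passing O_NONBLOCK to socket() only works by accident where the two constants
have the same value, which is not the case on parisc.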

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 19:16                                                 ` Jens Axboe
@ 2023-02-15 20:27                                                   ` John David Anglin
  2023-02-15 20:37                                                     ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-15 20:27 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 2:16 p.m., Jens Axboe wrote:
> In any case, with the silly syzbot mmap stuff fixed up, I'm not seeing
> anything odd. A few tests will time out as they simply run too slowly
> emulated for me, but apart from that, seems fine. This is running with
> Helge's patch, though not sure if that is required running emulated.
I'm seeing two problematic tests:

test buf-ring.t generates on console:
TCP: request_sock_TCP: Possible SYN flooding on port 8495. Sending cookies.  Check SNMP counters.

System crashes running test buf-ring.t.

dave@mx3210:~/gnu/liburing/liburing$ make runtests
make[1]: Entering directory '/home/dave/gnu/liburing/liburing/src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/dave/gnu/liburing/liburing/src'
make[1]: Entering directory '/home/dave/gnu/liburing/liburing/test'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/dave/gnu/liburing/liburing/test'
make[1]: Entering directory '/home/dave/gnu/liburing/liburing/examples'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/dave/gnu/liburing/liburing/examples'
make[1]: Entering directory '/home/dave/gnu/liburing/liburing/test'
Running test 232c93d07b74.t 5 sec [5]
Running test 35fa71a030ca.t 5 sec [5]
Running test 500f9fbadef8.t 25 sec [25]
Running test 7ad0e4b2f83c.t 1 sec [1]
Running test 8a9973408177.t 1 sec [0]
Running test 917257daa0fe.t 0 sec [0]
Running test a0908ae19763.t 0 sec [0]
Running test a4c0b3decb33.t Test a4c0b3decb33.t timed out (may not be a failure)
Running test accept.t 1 sec [1]
Running test accept-link.t 1 sec [1]
Running test accept-reuse.t 0 sec [0]
Running test accept-test.t 0 sec [0]
Running test across-fork.t 0 sec [0]
Running test b19062a56726.t 0 sec [0]
Running test b5837bd5311d.t 0 sec [0]
Running test buf-ring.t bad run 0/0 = -233
test_running(1) failed
Test buf-ring.t failed with ret 1
Running test ce593a6c480a.t 1 sec [1]
Running test close-opath.t 0 sec [0]
Running test connect.t 0 sec [0]
Running test cq-full.t 0 sec [0]
Running test cq-overflow.t 12 sec [11]
Running test cq-peek-batch.t 0 sec [0]
Running test cq-ready.t 0 sec [0]
Running test cq-size.t 0 sec [0]
Running test d4ae271dfaae.t 0 sec [1]
Running test d77a67ed5f27.t 0 sec [0]
Running test defer.t 4 sec [3]
Running test defer-taskrun.t 0 sec [0]
Running test double-poll-crash.t Skipped
Running test drop-submit.t 0 sec [0]
Running test eeed8b54e0df.t 0 sec [0]
Running test empty-eownerdead.t 0 sec [0]
Running test eploop.t 0 sec [0]
Running test eventfd.t 0 sec [0]
Running test eventfd-disable.t 0 sec [0]
Running test eventfd-reg.t 0 sec [0]
Running test eventfd-ring.t 1 sec [0]
Running test evloop.t 0 sec [0]
Running test exec-target.t 0 sec [0]
Running test exit-no-cleanup.t 0 sec [0]
Running test fadvise.t 0 sec [1]
Running test fallocate.t 0 sec [0]
Running test fc2a85cb02ef.t Test needs failslab/fail_futex/fail_page_alloc enabled, skipped
Skipped
Running test fd-pass.t 0 sec [0]
Running test file-register.t 4 sec [4]
Running test files-exit-hang-poll.t 1 sec [1]
Running test files-exit-hang-timeout.t 1 sec [2]
Running test file-update.t 0 sec [0]
Running test file-verify.t Found 262144, wanted 786432
Buffered novec reg test failed
Test file-verify.t failed with ret 1
Running test fixed-buf-iter.t 0 sec [0]
Running test fixed-link.t 0 sec [0]
Running test fixed-reuse.t 0 sec [0]
Running test fpos.t 0 sec [1]
Running test fsnotify.t Skipped
Running test fsync.t 0 sec [0]
Running test hardlink.t 0 sec [0]
Running test io-cancel.t 3 sec [4]
Running test iopoll.t 2 sec [2]
Running test iopoll-leak.t 0 sec [0]
Running test iopoll-overflow.t 1 sec [1]
Running test io_uring_enter.t 0 sec [1]
Running test io_uring_passthrough.t Skipped
Running test io_uring_register.t Unable to map a huge page.  Try increasing /proc/sys/vm/nr_hugepages by at least 1.
Skipping the hugepage test
0 sec [0]
Running test io_uring_setup.t 0 sec [0]
Running test lfs-openat.t 0 sec [0]
Running test lfs-openat-write.t 0 sec [0]
Running test link.t 0 sec [0]
Running test link_drain.t 3 sec [3]
Running test link-timeout.t 1 sec [1]
Running test madvise.t 0 sec [1]
Running test mkdir.t 0 sec [0]
Running test msg-ring.t 0 sec [0]
Running test msg-ring-flags.t Skipped
Running test msg-ring-overflow.t 0 sec [0]
Running test multicqes_drain.t 26 sec [26]
Running test nolibc.t Skipped
Running test nop-all-sizes.t 0 sec [0]
Running test nop.t 0 sec [1]
Running test openat2.t 0 sec [0]
Running test open-close.t 0 sec [0]
Running test open-direct-link.t 0 sec [0]
Running test open-direct-pick.t 0 sec [0]
Running test personality.t Not root, skipping
0 sec [0]
Running test pipe-bug.t 6 sec [6]
Running test pipe-eof.t 0 sec [1]
Running test pipe-reuse.t 0 sec [0]
Running test poll.t 1 sec [0]
Running test poll-cancel.t 0 sec [0]
Running test poll-cancel-all.t 0 sec [0]
Running test poll-cancel-ton.t 0 sec [1]
Running test poll-link.t 1 sec [0]
Running test poll-many.t 18 sec [19]
Running test poll-mshot-overflow.t 0 sec [0]
Running test poll-mshot-update.t 21 sec
Running test poll-race.t 2 sec
Running test poll-race-mshot.t Bad cqe res -233
Bad cqe res -233
Bad cqe res -233
Bad cqe res -233
...

This run was on an ext4 file system.

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 20:27                                                   ` John David Anglin
@ 2023-02-15 20:37                                                     ` Jens Axboe
  2023-02-15 21:06                                                       ` John David Anglin
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 20:37 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 1:27 PM, John David Anglin wrote:
> On 2023-02-15 2:16 p.m., Jens Axboe wrote:
>> In any case, with the silly syzbot mmap stuff fixed up, I'm not seeing
>> anything odd. A few tests will time out as they simply run too slowly
>> emulated for me, but apart from that, seems fine. This is running with
>> Helge's patch, though not sure if that is required running emulated.
> I'm seeing two problematic tests:
> 
> test buf-ring.t generates on console:
> TCP: request_sock_TCP: Possible SYN flooding on port 8495. Sending
> cookies.  Check SNMP counters.

Pretty sure that's from connect.t, not from buf-ring.t. But yes, this
happens on all platforms, haven't looked into it. The test works, but
would be nice to clean that up.

> System crashes running test buf-ring.t.

Huh, what's the crash?

> Running test buf-ring.t bad run 0/0 = -233

This one, and the similar -223 ones, you need to try and dig into that.
It doesn't reproduce for me, and it very much seems like the test case
having a different view of what -ENOBUFS looks like and hence it fails
when the kernel passes down something that is -ENOBUFS internally, but
doesn't match the app -ENOBUFS value. Are you running a 64-bit kernel?
Would that cause any differences?

I don't see this on qemu with the 32-bit kernel, nor does it happen on
other platforms.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 20:37                                                     ` Jens Axboe
@ 2023-02-15 21:06                                                       ` John David Anglin
  2023-02-15 21:38                                                         ` Jens Axboe
  2023-02-15 21:39                                                         ` John David Anglin
  0 siblings, 2 replies; 48+ messages in thread
From: John David Anglin @ 2023-02-15 21:06 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 3:37 p.m., Jens Axboe wrote:
>> System crashes running test buf-ring.t.
> Huh, what's the crash?
Not much info.  System log indicates an HPMC occurred. Unfortunately, recovery code doesn't work.
>
>> Running test buf-ring.t bad run 0/0 = -233
> This one, and the similar -223 ones, you need to try and dig into that.
> It doesn't reproduce for me, and it very much seems like the test case
> having a different view of what -ENOBUFS looks like and hence it fails
> when the kernel passes down something that is -ENOBUFS internally, but
> doesn't match the app -ENOBUFS value. Are you running a 64-bit kernel?
> Would that cause any differences?
I'm running a 64-bit kernel (6.1.12).

I believe 32 and 64-bit kernels have same error codes.

I see three places in io_uring where -ENOBUFS is returned.  They have similar code:

retry_multishot:
         if (io_do_buffer_select(req)) {
                 void __user *buf;
                 size_t len = sr->len;

                 buf = io_buffer_select(req, &len, issue_flags);
                 if (!buf)
                         return -ENOBUFS;
>
> I don't see this on qemu with the 32-bit kernel, nor does it happen on
> other platforms.

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 21:06                                                       ` John David Anglin
@ 2023-02-15 21:38                                                         ` Jens Axboe
  2023-02-15 21:39                                                         ` John David Anglin
  1 sibling, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 21:38 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 2:06 PM, John David Anglin wrote:
> On 2023-02-15 3:37 p.m., Jens Axboe wrote:
>>> System crashes running test buf-ring.t.
>> Huh, what's the crash?
> Not much info.  System log indicates an HPMC occurred. Unfortunately, recovery code doesn't work.
>>
>>> Running test buf-ring.t bad run 0/0 = -233
>> This one, and the similar -223 ones, you need to try and dig into that.
>> It doesn't reproduce for me, and it very much seems like the test case
>> having a different view of what -ENOBUFS looks like and hence it fails
>> when the kernel passes down something that is -ENOBUFS internally, but
>> doesn't match the app -ENOBUFS value. Are you running a 64-bit kernel?
>> Would that cause any differences?
> I'm running a 64-bit kernel (6.1.12).
> 
> I believe 32 and 64-bit kernels have same error codes.
> 
> I see three places in io_uring where -ENOBUFS is returned.  They have similar code:
> 
> retry_multishot:
>         if (io_do_buffer_select(req)) {
>                 void __user *buf;
>                 size_t len = sr->len;
> 
>                 buf = io_buffer_select(req, &len, issue_flags);
>                 if (!buf)
>                         return -ENOBUFS;

buf-ring is using a kernel mapped user ring, so may indeed be the same
kind of coloring issue that we saw with the main rings.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 21:06                                                       ` John David Anglin
  2023-02-15 21:38                                                         ` Jens Axboe
@ 2023-02-15 21:39                                                         ` John David Anglin
  2023-02-15 22:10                                                           ` John David Anglin
  2023-02-15 23:03                                                           ` Jens Axboe
  1 sibling, 2 replies; 48+ messages in thread
From: John David Anglin @ 2023-02-15 21:39 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 4:06 p.m., John David Anglin wrote:
> On 2023-02-15 3:37 p.m., Jens Axboe wrote:
>>> System crashes running test buf-ring.t.
>> Huh, what's the crash?
> Not much info.  System log indicates an HPMC occurred. Unfortunately, recovery code doesn't work.
The following occurred running buf-ring.t under gdb:

INFO: task kworker/u64:9:18319 blocked for more than 123 seconds.
       Not tainted 6.1.12+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u64:9   state:D stack:0     pid:18319 ppid:2 flags:0x00000000
Workqueue: events_unbound io_ring_exit_work
Backtrace:
  [<0000000040b5c210>] __schedule+0x2e8/0x7f0
  [<0000000040b5c7d0>] schedule+0xb8/0x1d0
  [<0000000040b66534>] schedule_timeout+0x11c/0x1b0
  [<0000000040b5d71c>] __wait_for_common+0x194/0x2e8
  [<0000000040b5d8ac>] wait_for_completion+0x3c/0x50
  [<0000000040b46508>] io_ring_exit_work+0x3d8/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

INFO: task kworker/u64:10:18320 blocked for more than 123 seconds.
       Not tainted 6.1.12+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u64:10  state:D stack:0     pid:18320 ppid:2 flags:0x00000000
Workqueue: events_unbound io_ring_exit_work
Backtrace:
  [<0000000040b5c210>] __schedule+0x2e8/0x7f0
  [<0000000040b5c7d0>] schedule+0xb8/0x1d0
  [<0000000040b66534>] schedule_timeout+0x11c/0x1b0
  [<0000000040b5d71c>] __wait_for_common+0x194/0x2e8
  [<0000000040b5d8ac>] wait_for_completion+0x3c/0x50
  [<0000000040b46508>] io_ring_exit_work+0x3d8/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

gdb was sitting at a break at line 328.

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 16:04                                             ` Jens Axboe
@ 2023-02-15 21:40                                               ` Helge Deller
  2023-02-15 23:04                                                 ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Helge Deller @ 2023-02-15 21:40 UTC (permalink / raw)
  To: Jens Axboe, John David Anglin; +Cc: io-uring

Here is an updated patch which
- should support other platforms which need aliasing
- allows users to pass in an address in mmap(). This is checked
  and -EINVAL is returned if it does not fulfill the aliasing requirements.
  (this part is untested up to now!)
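
The address check described above can be sketched in userspace terms as
follows. This is only an illustration of the masking logic, not the kernel
code: EX_PAGE_SIZE and EX_SHM_COLOUR are assumed example values (4 KiB pages
and a 4 MiB alias boundary), and hint_matches_colour() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed example values; the real ones come from the kernel headers. */
#define EX_PAGE_SIZE	4096UL
#define EX_SHM_COLOUR	0x00400000UL

/*
 * Return true if a user-supplied address hint is page aligned and falls
 * on the same cache colour as the kernel address backing the ring.
 */
bool hint_matches_colour(unsigned long hint, unsigned long kaddr)
{
	/* Bits between the page offset and the alias boundary. */
	const unsigned long colour_mask =
		(EX_SHM_COLOUR - 1UL) & ~(EX_PAGE_SIZE - 1UL);

	if (hint & (EX_PAGE_SIZE - 1UL))
		return false;

	return (hint & colour_mask) == (kaddr & colour_mask);
}
```

With these assumed values, a hint of 0x10003000 matches a kernel address of
0x40403000, because both have the colour bits 0x003000, while 0x10004000 does
not.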


Jens, I think you need to add the "_FILE_OFFSET_BITS=64" define
when compiling your testsuite, e.g. for lfs-openat.t and lfs-openat-write.t
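
As a quick way to see whether the define took effect, something like the
sketch below can be compiled with the same flags as the tests. The function
name off_t_bits() is made up for illustration. The define has to come before
any system header, which is why passing -D_FILE_OFFSET_BITS=64 on the compiler
command line is the usual way to do it.

```c
/* Must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <sys/types.h>

/* Width in bits of off_t as seen by this translation unit. */
unsigned int off_t_bits(void)
{
	return (unsigned int)(sizeof(off_t) * 8);
}
```

On a 32-bit userspace, leaving the define out would typically make this report
32 instead of 64.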

Helge


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 862e05e6691d..d89fe16878dc 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -72,6 +72,7 @@
 #include <linux/io_uring.h>
 #include <linux/audit.h>
 #include <linux/security.h>
+#include <asm/shmparam.h>

 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -3059,6 +3060,63 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }

+static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
+			unsigned long addr0, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	const unsigned long mmap_end = arch_get_mmap_end(addr0, len, flags);
+	struct vm_unmapped_area_info info;
+	unsigned long addr;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
+	info.high_limit = arch_get_mmap_base(addr0, current->mm->mmap_base);
+#ifdef SHM_COLOUR
+	info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
+#else
+	info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
+#endif
+	info.align_offset = (unsigned long) ptr;
+
+	if (addr0) {
+		/* check page alignment and shm aliasing */
+		if ((addr0 & (PAGE_SIZE - 1UL) ||
+		    ((addr0 & info.align_mask) !=
+			(info.align_offset & info.align_mask))))
+			return -EINVAL;
+		info.low_limit = max(addr0, info.low_limit);
+		info.high_limit = min(addr0 + len, info.high_limit);
+	}
+
+	/*
+	 * A failed mmap() very likely causes application failure,
+	 * so fall back to the bottom-up function here. This scenario
+	 * can happen with large stack limits and large mmap()
+	 * allocations.
+	 */
+	addr = vm_unmapped_area(&info);
+
+	/* if address was given, check against found address */
+	if (addr0 && addr != addr0)
+		return -EINVAL;
+
+	if (offset_in_page(addr)) {
+		VM_BUG_ON(addr != -ENOMEM);
+		info.flags = 0;
+		info.low_limit = TASK_UNMAPPED_BASE;
+		info.high_limit = mmap_end;
+		addr = vm_unmapped_area(&info);
+	}
+
+	return addr;
+}
+
 #else /* !CONFIG_MMU */

 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3273,6 +3331,8 @@ static const struct file_operations io_uring_fops = {
 #ifndef CONFIG_MMU
 	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
 	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
 #endif
 	.poll		= io_uring_poll,
 #ifdef CONFIG_PROC_FS


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 21:39                                                         ` John David Anglin
@ 2023-02-15 22:10                                                           ` John David Anglin
  2023-02-15 23:02                                                             ` Jens Axboe
  2023-02-15 23:03                                                           ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-15 22:10 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 4:39 p.m., John David Anglin wrote:
> On 2023-02-15 4:06 p.m., John David Anglin wrote:
>> On 2023-02-15 3:37 p.m., Jens Axboe wrote:
>>>> System crashes running test buf-ring.t.
>>> Huh, what's the crash?
>> Not much info.  System log indicates an HPMC occurred. Unfortunately, recovery code doesn't work.
> The following occurred running buf-ring.t under gdb:
>
> INFO: task kworker/u64:9:18319 blocked for more than 123 seconds.
>       Not tainted 6.1.12+ #4
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u64:9   state:D stack:0     pid:18319 ppid:2 flags:0x00000000
> Workqueue: events_unbound io_ring_exit_work
> Backtrace:
>  [<0000000040b5c210>] __schedule+0x2e8/0x7f0
>  [<0000000040b5c7d0>] schedule+0xb8/0x1d0
>  [<0000000040b66534>] schedule_timeout+0x11c/0x1b0
>  [<0000000040b5d71c>] __wait_for_common+0x194/0x2e8
>  [<0000000040b5d8ac>] wait_for_completion+0x3c/0x50
>  [<0000000040b46508>] io_ring_exit_work+0x3d8/0x4d0
>  [<0000000040268da8>] process_one_work+0x238/0x520
>  [<00000000402692a4>] worker_thread+0x214/0x778
>  [<0000000040276f94>] kthread+0x24c/0x258
>  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28
>
> INFO: task kworker/u64:10:18320 blocked for more than 123 seconds.
>       Not tainted 6.1.12+ #4
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u64:10  state:D stack:0     pid:18320 ppid:2 flags:0x00000000
> Workqueue: events_unbound io_ring_exit_work
> Backtrace:
>  [<0000000040b5c210>] __schedule+0x2e8/0x7f0
>  [<0000000040b5c7d0>] schedule+0xb8/0x1d0
>  [<0000000040b66534>] schedule_timeout+0x11c/0x1b0
>  [<0000000040b5d71c>] __wait_for_common+0x194/0x2e8
>  [<0000000040b5d8ac>] wait_for_completion+0x3c/0x50
>  [<0000000040b46508>] io_ring_exit_work+0x3d8/0x4d0
>  [<0000000040268da8>] process_one_work+0x238/0x520
>  [<00000000402692a4>] worker_thread+0x214/0x778
>  [<0000000040276f94>] kthread+0x24c/0x258
>  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28
>
> gdb was sitting at a break at line 328.
With Helge's latest patch, we get a software lockup:

TCP: request_sock_TCP: Possible SYN flooding on port 31309. Sending cookies.  Check SNMP counters.
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u64:13:14621]
Modules linked in: binfmt_misc ext4 crc16 jbd2 ext2 mbcache sg ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd 
ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 sr_mod uas usb_storage 
cdrom ohci_pci ehci_pci ohci_hcd pata_cmd64x ehci_hcd sym53c8xx libata scsi_transport_spi usbcore tg3 scsi_mod scsi_common usb_common
CPU: 0 PID: 14621 Comm: kworker/u64:13 Not tainted 6.1.12+ #5
Hardware name: 9000/800/rp3440
Workqueue: events_unbound io_ring_exit_work

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000001011001111111100001111 Not tainted
r00-03  000000ff082cff0f 00000000670cc880 00000000406e64f0 00000000670cc990
r04-07  0000000040c099c0 00000000629d1158 0000000000000000 000000006a406800
r08-11  00000000629d10e0 00000000629d1d98 0000000000000003 00000000670cc888
r12-15  00000000670cc960 000000000000002e 0000000040c709c0 0000000000000000
r16-19  0000000040c709c0 0000000040c709c0 0000000059dc6d60 0000000000000000
r20-23  0000000000000001 0000000000000001 0000000000000000 0000000064252110
r24-27  0000000000000003 00000000670cc888 0000000000000000 0000000040c099c0
r28-31  0000000059dc6d60 00000000670cc960 00000000670cca10 0000000000000001
sr00-03  000000000116c400 000000000116c400 0000000000000000 000000000116c400
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040683df0 0000000040683e38
  IIR: 0f5010df    ISR: 0000000000000000  IOR: 00000000670ccbc0
  CPU:        0   CR30: 0000000059dc6d60 CR31: ffffffffffffefff
  ORIG_R28: 0000000040203070
  IAOQ[0]: iocb_bio_iopoll+0x30/0x80
  IAOQ[1]: iocb_bio_iopoll+0x78/0x80
  RP(r2): io_do_iopoll+0xa8/0x4b0
Backtrace:
  [<00000000406e64f0>] io_do_iopoll+0xa8/0x4b0
  [<0000000040b45d88>] io_uring_try_cancel_requests+0x2a0/0x6a8
  [<0000000040b4628c>] io_ring_exit_work+0xfc/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

watchdog: BUG: soft lockup - CPU#0 stuck for 49s! [kworker/u64:13:14621]
Modules linked in: binfmt_misc ext4 crc16 jbd2 ext2 mbcache sg ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd 
ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 sr_mod uas usb_storage 
cdrom ohci_pci ehci_pci ohci_hcd pata_cmd64x ehci_hcd sym53c8xx libata scsi_transport_spi usbcore tg3 scsi_mod scsi_common usb_common
CPU: 0 PID: 14621 Comm: kworker/u64:13 Tainted: G             L 6.1.12+ #5
Hardware name: 9000/800/rp3440
Workqueue: events_unbound io_ring_exit_work

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Tainted: G             L
r00-03  000000ff0804ff0f 00000000670cc880 00000000406e64f0 00000000670cc880
r04-07  0000000040c099c0 00000000629d1f58 0000000000000000 000000006a406800
r08-11  00000000629d1b60 00000000629d1bd8 0000000000000003 00000000670cc888
r12-15  00000000670cc960 000000000000002e 0000000040c709c0 0000000000000000
r16-19  0000000040c709c0 0000000040c709c0 0000000059dc6d60 0000000000000000
r20-23  0000000000000001 0000000000000001 0000000000000000 0000000064252110
r24-27  0000000000000003 00000000670cc888 00000000670cc888 0000000040c099c0
r28-31  00000000629d1f58 00000000670cc960 00000000670cc990 0000000059dc6d60
sr00-03  000000000116c400 000000000116c400 0000000000000000 000000000116c400
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406e6528 00000000406e652c
  IIR: 50bf3f11    ISR: 000000000116c400  IOR: 0000000000000000
  CPU:        0   CR30: 0000000059dc6d60 CR31: ffffffffffffefff
  ORIG_R28: 0000000000000000
  IAOQ[0]: io_do_iopoll+0xe0/0x4b0
  IAOQ[1]: io_do_iopoll+0xe4/0x4b0
  RP(r2): io_do_iopoll+0xa8/0x4b0
Backtrace:
  [<0000000040b45d88>] io_uring_try_cancel_requests+0x2a0/0x6a8
  [<0000000040b4628c>] io_ring_exit_work+0xfc/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

rcu: INFO: rcu_sched self-detected stall on CPU
rcu:    0-....: (5979 ticks this GP) idle=b23c/1/0x4000000000000002 softirq=36165/36165 fqs=2989
         (t=6000 jiffies g=69677 q=1954 ncpus=4)
CPU: 0 PID: 14621 Comm: kworker/u64:13 Tainted: G             L 6.1.12+ #5
Hardware name: 9000/800/rp3440
Workqueue: events_unbound io_ring_exit_work

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Tainted: G             L
r00-03  000000ff0804ff0f 00000000670cc880 00000000406e64f0 00000000670cc990
r04-07  0000000040c099c0 0000000069aa6d98 0000000000000000 000000006a406800
r08-11  0000000069aa6d20 00000000629d1f58 0000000000000003 00000000670cc888
r12-15  00000000670cc960 000000000000002e 0000000040c709c0 0000000000000000
r16-19  0000000040c709c0 0000000040c709c0 0000000059dc6d60 0000000000000000
r20-23  0000000000000001 0000000000000001 0000000000000000 0000000064252110
r24-27  0000000000000003 00000000670cc888 0000000000000000 0000000040c099c0
r28-31  0000000000000000 00000000670cc960 00000000670cca10 0000000059dc6d60
sr00-03  000000000116c400 000000000116c400 0000000000000000 000000000116c400
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040683e24 0000000040683e28
  IIR: 0c6110c2    ISR: 00000000670cc850  IOR: 00000000670ccbc0
  CPU:        0   CR30: 0000000059dc6d60 CR31: ffffffffffffefff
  ORIG_R28: 0000000040c709c0
  IAOQ[0]: iocb_bio_iopoll+0x64/0x80
  IAOQ[1]: iocb_bio_iopoll+0x68/0x80
  RP(r2): io_do_iopoll+0xa8/0x4b0
Backtrace:
  [<00000000406e64f0>] io_do_iopoll+0xa8/0x4b0
  [<0000000040b45d88>] io_uring_try_cancel_requests+0x2a0/0x6a8
  [<0000000040b4628c>] io_ring_exit_work+0xfc/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

watchdog: BUG: soft lockup - CPU#0 stuck for 82s! [kworker/u64:13:14621]
Modules linked in: binfmt_misc ext4 crc16 jbd2 ext2 mbcache sg ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd 
ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 sr_mod uas usb_storage 
cdrom ohci_pci ehci_pci ohci_hcd pata_cmd64x ehci_hcd sym53c8xx libata scsi_transport_spi usbcore tg3 scsi_mod scsi_common usb_common
CPU: 0 PID: 14621 Comm: kworker/u64:13 Tainted: G             L 6.1.12+ #5
Hardware name: 9000/800/rp3440
Workqueue: events_unbound io_ring_exit_work

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Tainted: G             L
r00-03  000000ff0804ff0f 00000000670cc880 00000000406e64f0 00000000670cc880
r04-07  0000000040c099c0 00000000629d1078 0000000000000000 000000006a406800
r08-11  00000000629d1000 00000000629d1158 0000000000000003 00000000670cc888
r12-15  00000000670cc960 000000000000002e 0000000040c709c0 0000000000000000
r16-19  0000000040c709c0 0000000040c709c0 0000000059dc6d60 0000000000000000
r20-23  0000000000000001 0000000000000001 0000000000000000 0000000064252110
r24-27  0000000000000003 00000000670cc888 00000000629d1000 0000000040c099c0
r28-31  0000000040ba7940 00000000670cc960 00000000670cc990 0000000000000002
sr00-03  000000000116c400 000000000116c400 0000000000000000 000000000116c400
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406e64e4 00000000406e64e8
  IIR: 53820020    ISR: 000000000116c400  IOR: 0000000000000000
  CPU:        0   CR30: 0000000059dc6d60 CR31: ffffffffffffefff
  ORIG_R28: 0000000000000000
  IAOQ[0]: io_do_iopoll+0x9c/0x4b0
  IAOQ[1]: io_do_iopoll+0xa0/0x4b0
  RP(r2): io_do_iopoll+0xa8/0x4b0
Backtrace:
  [<0000000040b45d88>] io_uring_try_cancel_requests+0x2a0/0x6a8
  [<0000000040b4628c>] io_ring_exit_work+0xfc/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

watchdog: BUG: soft lockup - CPU#0 stuck for 108s! [kworker/u64:13:14621]
Modules linked in: binfmt_misc ext4 crc16 jbd2 ext2 mbcache sg ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd 
ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 sr_mod uas usb_storage 
cdrom ohci_pci ehci_pci ohci_hcd pata_cmd64x ehci_hcd sym53c8xx libata scsi_transport_spi usbcore tg3 scsi_mod scsi_common usb_common
CPU: 0 PID: 14621 Comm: kworker/u64:13 Tainted: G             L 6.1.12+ #5
Hardware name: 9000/800/rp3440
Workqueue: events_unbound io_ring_exit_work

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000001011001111111100001111 Tainted: G             L
r00-03  000000ff082cff0f 00000000670cc880 00000000406e64f0 00000000670cc990
r04-07  0000000040c099c0 0000000069aa63f8 0000000000000000 000000006a406800
r08-11  0000000069aa6380 0000000069aa6d98 0000000000000003 00000000670cc888
r12-15  00000000670cc960 000000000000002e 0000000040c709c0 0000000000000000
r16-19  0000000040c709c0 0000000040c709c0 0000000059dc6d60 0000000000000000
r20-23  0000000000000001 0000000000000001 0000000000000000 0000000064252110
r24-27  0000000000000003 00000000670cc888 0000000000000000 0000000040c099c0
r28-31  0000000059dc6d60 00000000670cc960 00000000670cca10 0000000000000001
sr00-03  000000000116c400 000000000116c400 0000000000000000 000000000116c400
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040683df0 0000000040683e38
  IIR: 0f5010df    ISR: 00000000670cc850  IOR: 0000000000000001
  CPU:        0   CR30: 0000000059dc6d60 CR31: ffffffffffffefff
  ORIG_R28: 0000000040c709c0
  IAOQ[0]: iocb_bio_iopoll+0x30/0x80
  IAOQ[1]: iocb_bio_iopoll+0x78/0x80
  RP(r2): io_do_iopoll+0xa8/0x4b0
Backtrace:
  [<00000000406e64f0>] io_do_iopoll+0xa8/0x4b0
  [<0000000040b45d88>] io_uring_try_cancel_requests+0x2a0/0x6a8
  [<0000000040b4628c>] io_ring_exit_work+0xfc/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

watchdog: BUG: soft lockup - CPU#0 stuck for 134s! [kworker/u64:13:14621]
Modules linked in: binfmt_misc ext4 crc16 jbd2 ext2 mbcache sg ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd 
ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 sr_mod uas usb_storage 
cdrom ohci_pci ehci_pci ohci_hcd pata_cmd64x ehci_hcd sym53c8xx libata scsi_transport_spi usbcore tg3 scsi_mod scsi_common usb_common
CPU: 0 PID: 14621 Comm: kworker/u64:13 Tainted: G             L 6.1.12+ #5
Hardware name: 9000/800/rp3440
Workqueue: events_unbound io_ring_exit_work

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000001011001111111100001111 Tainted: G             L
r00-03  000000ff082cff0f 00000000670cc880 00000000406e64f0 00000000670cc990
r04-07  0000000040c099c0 00000000629d1e78 0000000000000000 000000006a406800
r08-11  00000000629d1e00 00000000629d1938 0000000000000003 00000000670cc888
r12-15  00000000670cc960 000000000000002e 0000000040c709c0 0000000000000000
r16-19  0000000040c709c0 0000000040c709c0 0000000059dc6d60 0000000000000000
r20-23  0000000000000001 0000000000000001 0000000000000000 0000000064252110
r24-27  0000000000000003 00000000670cc888 0000000000000000 0000000040c099c0
r28-31  0000000059dc6d60 00000000670cc960 00000000670cca10 0000000000000001
sr00-03  000000000116c400 000000000116c400 0000000000000000 000000000116c400
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040683df0 0000000040683e38
  IIR: 0f5010df    ISR: 0000000000000000  IOR: 0000000040c709c0
  CPU:        0   CR30: 0000000059dc6d60 CR31: ffffffffffffefff
  ORIG_R28: 0000000040203070
  IAOQ[0]: iocb_bio_iopoll+0x30/0x80
  IAOQ[1]: iocb_bio_iopoll+0x78/0x80
  RP(r2): io_do_iopoll+0xa8/0x4b0
Backtrace:
  [<00000000406e64f0>] io_do_iopoll+0xa8/0x4b0
  [<0000000040b45d88>] io_uring_try_cancel_requests+0x2a0/0x6a8
  [<0000000040b4628c>] io_ring_exit_work+0xfc/0x4d0
  [<0000000040268da8>] process_one_work+0x238/0x520
  [<00000000402692a4>] worker_thread+0x214/0x778
  [<0000000040276f94>] kthread+0x24c/0x258
  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28

Running test 232c93d07b74.t 4 sec [5]
Running test 35fa71a030ca.t 5 sec [5]
Running test 500f9fbadef8.t 25 sec [25]
Running test 7ad0e4b2f83c.t 1 sec [1]
Running test 8a9973408177.t 0 sec [1]
Running test 917257daa0fe.t 0 sec [0]
Running test a0908ae19763.t 0 sec [0]
Running test a4c0b3decb33.t Test a4c0b3decb33.t timed out (may not be a failure)
Running test accept.t 2 sec [1]
Running test accept-link.t 0 sec [1]
Running test accept-reuse.t 0 sec [0]
Running test accept-test.t 0 sec [0]
Running test across-fork.t 0 sec [0]
Running test b19062a56726.t 0 sec [0]
Running test b5837bd5311d.t 0 sec [0]
Running test buf-ring.t bad run 0/0 = -233
test_running(1) failed
Test buf-ring.t failed with ret 1
Running test ce593a6c480a.t 1 sec [1]
Running test close-opath.t 0 sec [0]
Running test connect.t 0 sec [0]
Running test cq-full.t 0 sec [0]
Running test cq-overflow.t 11 sec [12]
Running test cq-peek-batch.t 0 sec [0]
Running test cq-ready.t 1 sec [0]
Running test cq-size.t 0 sec [0]
Running test d4ae271dfaae.t 0 sec [0]
Running test d77a67ed5f27.t 0 sec [0]
Running test defer.t 3 sec [4]
Running test defer-taskrun.t 1 sec [0]
Running test double-poll-crash.t Skipped
Running test drop-submit.t 0 sec [0]
Running test eeed8b54e0df.t 0 sec [0]
Running test empty-eownerdead.t 0 sec [0]
Running test eploop.t 0 sec [0]
Running test eventfd.t 0 sec [0]
Running test eventfd-disable.t 0 sec [0]
Running test eventfd-reg.t 0 sec [0]
Running test eventfd-ring.t 0 sec [1]
Running test evloop.t 0 sec [0]
Running test exec-target.t 0 sec [0]
Running test exit-no-cleanup.t 1 sec [0]
Running test fadvise.t 0 sec [0]
Running test fallocate.t 0 sec [0]
Running test fc2a85cb02ef.t Test needs failslab/fail_futex/fail_page_alloc enabled, skipped
Skipped
Running test fd-pass.t 0 sec [0]
Running test file-register.t 3 sec [4]
Running test files-exit-hang-poll.t 1 sec [1]
Running test files-exit-hang-timeout.t 1 sec [1]
Running test file-update.t 0 sec [0]
Running test file-verify.t Found 2784, wanted 527072
Buffered novec reg test failed
Test file-verify.t failed with ret 1
Running test fixed-buf-iter.t 0 sec [0]
Running test fixed-link.t 0 sec [0]
Running test fixed-reuse.t 0 sec [0]
Running test fpos.t 1 sec [0]
Running test fsnotify.t Skipped
Running test fsync.t 0 sec [0]
Running test hardlink.t 0 sec [0]
Running test io-cancel.t 3 sec [3]
Running test iopoll.t 7 sec [2]
Running test iopoll-leak.t 0 sec [0]
Running test iopoll-overflow.t 1 sec [1]
Running test io_uring_enter.t 1 sec [0]
Running test io_uring_passthrough.t Skipped
Running test io_uring_register.t Unable to map a huge page.  Try increasing /proc/sys/vm/nr_hugepages by at least 1.
Skipping the hugepage test

Message from syslogd@mx3210 at Feb 15 22:04:15 ...
  kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u64:13:14621]

Message from syslogd@mx3210 at Feb 15 22:04:43 ...
  kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 49s! [kworker/u64:13:14621]
Test io_uring_register.t timed out (may not be a failure)
Running test io_uring_setup.t 0 sec [0]
Running test lfs-openat.t 0 sec [0]
Running test lfs-openat-write.t 0 sec [0]
Running test link.t 0 sec [0]
Running test link_drain.t 3 sec [3]
Running test link-timeout.t 1 sec [1]
Running test madvise.t
Message from syslogd@mx3210 at Feb 15 22:05:19 ...
  kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 82s! [kworker/u64:13:14621]

Message from syslogd@mx3210 at Feb 15 22:05:47 ...
  kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 108s! [kworker/u64:13:14621]
^C^C^C^C^C^C^C^C
Message from syslogd@mx3210 at Feb 15 22:06:15 ...
  kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 134s! [kworker/u64:13:14621]

-- 
John David Anglin  dave.anglin@bell.net



* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 22:10                                                           ` John David Anglin
@ 2023-02-15 23:02                                                             ` Jens Axboe
  2023-02-15 23:43                                                               ` John David Anglin
  2023-02-16  2:40                                                               ` John David Anglin
  0 siblings, 2 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 23:02 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 3:10 PM, John David Anglin wrote:
> On 2023-02-15 4:39 p.m., John David Anglin wrote:
>> On 2023-02-15 4:06 p.m., John David Anglin wrote:
>>> On 2023-02-15 3:37 p.m., Jens Axboe wrote:
>>>>> System crashes running test buf-ring.t.
>>>> Huh, what's the crash?
>>> Not much info.  System log indicates an HPMC occurred. Unfortunately, recovery code doesn't work.
>> The following occurred running buf-ring.t under gdb:
>>
>> INFO: task kworker/u64:9:18319 blocked for more than 123 seconds.
>>       Not tainted 6.1.12+ #4
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/u64:9   state:D stack:0     pid:18319 ppid:2 flags:0x00000000
>> Workqueue: events_unbound io_ring_exit_work
>> Backtrace:
>>  [<0000000040b5c210>] __schedule+0x2e8/0x7f0
>>  [<0000000040b5c7d0>] schedule+0xb8/0x1d0
>>  [<0000000040b66534>] schedule_timeout+0x11c/0x1b0
>>  [<0000000040b5d71c>] __wait_for_common+0x194/0x2e8
>>  [<0000000040b5d8ac>] wait_for_completion+0x3c/0x50
>>  [<0000000040b46508>] io_ring_exit_work+0x3d8/0x4d0
>>  [<0000000040268da8>] process_one_work+0x238/0x520
>>  [<00000000402692a4>] worker_thread+0x214/0x778
>>  [<0000000040276f94>] kthread+0x24c/0x258
>>  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28
>>
>> INFO: task kworker/u64:10:18320 blocked for more than 123 seconds.
>>       Not tainted 6.1.12+ #4
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/u64:10  state:D stack:0     pid:18320 ppid:2 flags:0x00000000
>> Workqueue: events_unbound io_ring_exit_work
>> Backtrace:
>>  [<0000000040b5c210>] __schedule+0x2e8/0x7f0
>>  [<0000000040b5c7d0>] schedule+0xb8/0x1d0
>>  [<0000000040b66534>] schedule_timeout+0x11c/0x1b0
>>  [<0000000040b5d71c>] __wait_for_common+0x194/0x2e8
>>  [<0000000040b5d8ac>] wait_for_completion+0x3c/0x50
>>  [<0000000040b46508>] io_ring_exit_work+0x3d8/0x4d0
>>  [<0000000040268da8>] process_one_work+0x238/0x520
>>  [<00000000402692a4>] worker_thread+0x214/0x778
>>  [<0000000040276f94>] kthread+0x24c/0x258
>>  [<0000000040202020>] ret_from_kernel_thread+0x20/0x28
>>
>> gdb was sitting at a break at line 328.
> With Helge's latest patch, we get a software lockup:
> 
> TCP: request_sock_TCP: Possible SYN flooding on port 31309. Sending cookies.  Check SNMP counters.
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u64:13:14621]
> Modules linked in: binfmt_misc ext4 crc16 jbd2 ext2 mbcache sg ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 sr_mod uas usb_storage cdrom ohci_pci ehci_pci ohci_hcd pata_cmd64x ehci_hcd sym53c8xx libata scsi_transport_spi usbcore tg3 scsi_mod scsi_common usb_common
> CPU: 0 PID: 14621 Comm: kworker/u64:13 Not tainted 6.1.12+ #5
> Hardware name: 9000/800/rp3440
> Workqueue: events_unbound io_ring_exit_work

This is not related to Helge's patch, 6.1-stable is just still missing:

commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Jan 27 09:28:13 2023 -0700

    io_uring: add a conditional reschedule to the IOPOLL cancelation loop

and I'm guessing you're running without preempt.
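
To illustrate why that commit matters on a kernel without preemption: a loop that keeps polling until its list drains never hands the CPU back on its own, so the watchdog eventually reports a soft lockup. What follows is only a userspace analogy, not the kernel code. `drain_with_yield`, `reap_fn`, `toy_dev` and `toy_reap` are hypothetical names, and `sched_yield()` stands in for the kernel's `cond_resched()`.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback: reap whatever completions are ready, return how many. */
typedef size_t (*reap_fn)(void *ctx);

/*
 * Poll until `remaining` requests have completed. After every pass that
 * leaves work outstanding, yield the CPU so other tasks can run; this is
 * the userspace stand-in for a conditional reschedule.
 * Returns the number of times the loop yielded.
 */
size_t drain_with_yield(reap_fn reap, void *ctx, size_t remaining)
{
    size_t yields = 0;

    assert(reap != NULL);
    while (remaining > 0) {
        size_t done = reap(ctx);

        remaining -= (done < remaining) ? done : remaining;
        if (remaining > 0) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}

/* Toy device for illustration: one request completes on every third poll. */
struct toy_dev {
    unsigned calls;
};

size_t toy_reap(void *ctx)
{
    struct toy_dev *d = ctx;

    d->calls++;
    return (d->calls % 3 == 0) ? 1 : 0;
}
```

Without the `sched_yield()` call the loop would still terminate in this toy setup, but it would hold the CPU for the whole drain, which is the behaviour the watchdog complains about above.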

-- 
Jens Axboe




* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 21:39                                                         ` John David Anglin
  2023-02-15 22:10                                                           ` John David Anglin
@ 2023-02-15 23:03                                                           ` Jens Axboe
  1 sibling, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 23:03 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 2:39 PM, John David Anglin wrote:
> On 2023-02-15 4:06 p.m., John David Anglin wrote:
>> On 2023-02-15 3:37 p.m., Jens Axboe wrote:
>>>> System crashes running test buf-ring.t.
>>> Huh, what's the crash?
>> Not much info.  System log indicates an HPMC occurred. Unfortunately, recovery code doesn't work.
> The following occurred running buf-ring.t under gdb:
> 
> INFO: task kworker/u64:9:18319 blocked for more than 123 seconds.
>       Not tainted 6.1.12+ #4
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u64:9   state:D stack:0     pid:18319 ppid:2 flags:0x00000000
> Workqueue: events_unbound io_ring_exit_work
> Backtrace:

I don't think this is buf-ring. The hang is in the ring exit path, which is
offloaded to a workqueue, and it's most likely related to your iopoll report.

-- 
Jens Axboe




* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 21:40                                               ` Helge Deller
@ 2023-02-15 23:04                                                 ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-15 23:04 UTC (permalink / raw)
  To: Helge Deller, John David Anglin; +Cc: io-uring

On 2/15/23 2:40 PM, Helge Deller wrote:
> Here is an updated patch which
> - should support other platforms which need aliasing
> - allows users to pass in an address in mmap(). This is checked
>   and -EINVAL is returned if it does not fulfill the aliasing.
>   (this part is untested up to now!)
> 
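
As an illustration of the aliasing check described in the quoted text: on a virtually indexed cache, two mappings of the same page only stay coherent if their virtual addresses are congruent modulo the aliasing boundary. The sketch below is not taken from the patch. `ALIAS_BOUNDARY`, `addrs_alias` and `check_user_hint` are hypothetical names, and the 4 MiB boundary is an assumed value chosen for illustration; the real value is architecture specific.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed aliasing boundary, for illustration only (4 MiB). */
#define ALIAS_BOUNDARY ((uintptr_t)0x00400000)

/*
 * True when a and b are congruent modulo `boundary`, i.e. they share the
 * same offset within an aliasing window. `boundary` must be a power of two.
 */
bool addrs_alias(uintptr_t a, uintptr_t b, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((a ^ b) & (boundary - 1)) == 0;
}

/*
 * Validate a caller-supplied mapping hint against the address the kernel
 * side uses for the same memory. A zero hint means "no preference".
 * Returns 0 if the hint is acceptable, -EINVAL otherwise.
 */
int check_user_hint(uintptr_t hint, uintptr_t kernel_addr)
{
    if (hint == 0)
        return 0;
    return addrs_alias(hint, kernel_addr, ALIAS_BOUNDARY) ? 0 : -EINVAL;
}
```

Under these assumptions, a hint of `0x40234000` would be accepted for a kernel address of `0x10234000`, since both have offset `0x234000` within their 4 MiB window, while `0x40235000` would be rejected.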
> 
> Jens, I think you need to add the "_FILE_OFFSET_BITS=64" define
> when compiling your testsuite, e.g. for lfs-openat.t and lfs-openat-write.t

I fixed this earlier, just adding O_LARGEFILE to those two tests.
Did you test before that, or is there still an issue?

-- 
Jens Axboe




* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 23:02                                                             ` Jens Axboe
@ 2023-02-15 23:43                                                               ` John David Anglin
  2023-02-16  2:40                                                               ` John David Anglin
  1 sibling, 0 replies; 48+ messages in thread
From: John David Anglin @ 2023-02-15 23:43 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 6:02 p.m., Jens Axboe wrote:
> and I'm guessing you're running without preempt.
Correct.

-- 
John David Anglin  dave.anglin@bell.net



* Re: io_uring failure on parisc with VIPT caches
  2023-02-15 23:02                                                             ` Jens Axboe
  2023-02-15 23:43                                                               ` John David Anglin
@ 2023-02-16  2:40                                                               ` John David Anglin
  2023-02-16  2:50                                                                 ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: John David Anglin @ 2023-02-16  2:40 UTC (permalink / raw)
  To: Jens Axboe, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2023-02-15 6:02 p.m., Jens Axboe wrote:
> This is not related to Helge's patch, 6.1-stable is just still missing:
>
> commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Fri Jan 27 09:28:13 2023 -0700
>
>      io_uring: add a conditional reschedule to the IOPOLL cancelation loop
>
> and I'm guessing you're running without preempt.
With 6.2.0-rc8+, I had a different crash running poll-race-mshot.t:

Backtrace:


Kernel Fault: Code=15 (Data TLB miss fault) at addr 0000000000000000
CPU: 0 PID: 18265 Comm: poll-race-mshot Not tainted 6.2.0-rc8+ #1
Hardware name: 9000/800/rp3440

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00010000001001001001000111110000 Not tainted
r00-03  00000000102491f0 ffffffffffffffff 000000004020307c ffffffffffffffff
r04-07  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
r08-11  ffffffffffffffff 000000000407ef28 000000000407f838 8400000000800000
r12-15  0000000000000000 0000000040c424e0 0000000040c424e0 0000000040c424e0
r16-19  000000000407fd68 0000000063f08648 0000000040c424e0 000000000a085000
r20-23  00000000000d6b44 000000002faf0800 00000000000000ff 0000000000000002
r24-27  000000000407fa30 000000000407fd68 0000000000000000 0000000040c1e4e0
r28-31  400000000000de84 0000000000000000 0000000000000000 0000000000000002
sr00-03  0000000004081000 0000000000000000 0000000000000000 0000000004081de0
sr04-07  0000000004081000 0000000000000000 0000000000000000 00000000040815a8

IASQ: 0000000004081000 0000000000000000 IAOQ: 0000000000000000 0000000004081590
  IIR: 00000000    ISR: 0000000000000000  IOR: 0000000000000000
  CPU:        0   CR30: 000000004daf5700 CR31: ffffffffffffefff
  ORIG_R28: 0000000000000000
  IAOQ[0]: 0x0
  IAOQ[1]: linear_quiesce+0x0/0x18 [linear]
  RP(r2): intr_check_sig+0x0/0x3c
Backtrace:

Kernel panic - not syncing: Kernel Fault

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: io_uring failure on parisc with VIPT caches
  2023-02-16  2:40                                                               ` John David Anglin
@ 2023-02-16  2:50                                                                 ` Jens Axboe
  2023-02-16  8:24                                                                   ` Helge Deller
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2023-02-16  2:50 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, io-uring, linux-parisc, James Bottomley

On 2/15/23 7:40 PM, John David Anglin wrote:
> On 2023-02-15 6:02 p.m., Jens Axboe wrote:
>> This is not related to Helge's patch, 6.1-stable is just still missing:
>>
>> commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c
>> Author: Jens Axboe <axboe@kernel.dk>
>> Date:   Fri Jan 27 09:28:13 2023 -0700
>>
>>      io_uring: add a conditional reschedule to the IOPOLL cancelation loop
>>
>> and I'm guessing you're running without preempt.
> With 6.2.0-rc8+, I had a different crash running poll-race-mshot.t:
> 
> Backtrace:
> 
> 
> Kernel Fault: Code=15 (Data TLB miss fault) at addr 0000000000000000
> CPU: 0 PID: 18265 Comm: poll-race-mshot Not tainted 6.2.0-rc8+ #1
> Hardware name: 9000/800/rp3440
> 
>      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 00010000001001001001000111110000 Not tainted
> r00-03  00000000102491f0 ffffffffffffffff 000000004020307c ffffffffffffffff
> r04-07  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> r08-11  ffffffffffffffff 000000000407ef28 000000000407f838 8400000000800000
> r12-15  0000000000000000 0000000040c424e0 0000000040c424e0 0000000040c424e0
> r16-19  000000000407fd68 0000000063f08648 0000000040c424e0 000000000a085000
> r20-23  00000000000d6b44 000000002faf0800 00000000000000ff 0000000000000002
> r24-27  000000000407fa30 000000000407fd68 0000000000000000 0000000040c1e4e0
> r28-31  400000000000de84 0000000000000000 0000000000000000 0000000000000002
> sr00-03  0000000004081000 0000000000000000 0000000000000000 0000000004081de0
> sr04-07  0000000004081000 0000000000000000 0000000000000000 00000000040815a8
> 
> IASQ: 0000000004081000 0000000000000000 IAOQ: 0000000000000000 0000000004081590
>  IIR: 00000000    ISR: 0000000000000000  IOR: 0000000000000000
>  CPU:        0   CR30: 000000004daf5700 CR31: ffffffffffffefff
>  ORIG_R28: 0000000000000000
>  IAOQ[0]: 0x0
>  IAOQ[1]: linear_quiesce+0x0/0x18 [linear]
>  RP(r2): intr_check_sig+0x0/0x3c
> Backtrace:
> 
> Kernel panic - not syncing: Kernel Fault

This means very little to me, is it a NULL pointer deref? And where's
the backtrace?

-- 
Jens Axboe




* Re: io_uring failure on parisc with VIPT caches
  2023-02-16  2:50                                                                 ` Jens Axboe
@ 2023-02-16  8:24                                                                   ` Helge Deller
  2023-02-16 15:22                                                                     ` Jens Axboe
  2023-02-16 20:35                                                                     ` John David Anglin
  0 siblings, 2 replies; 48+ messages in thread
From: Helge Deller @ 2023-02-16  8:24 UTC (permalink / raw)
  To: Jens Axboe, John David Anglin, io-uring, linux-parisc

On 2/16/23 03:50, Jens Axboe wrote:
> On 2/15/23 7:40 PM, John David Anglin wrote:
>> On 2023-02-15 6:02 p.m., Jens Axboe wrote:
>>> This is not related to Helge's patch, 6.1-stable is just still missing:
>>>
>>> commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c
>>> Author: Jens Axboe<axboe@kernel.dk>
>>> Date:   Fri Jan 27 09:28:13 2023 -0700
>>>
>>>       io_uring: add a conditional reschedule to the IOPOLL cancelation loop
>>>
>>> and I'm guessing you're running without preempt.
>> With 6.2.0-rc8+, I had a different crash running poll-race-mshot.t:
>>
>> Backtrace:
>>
>>
>> Kernel Fault: Code=15 (Data TLB miss fault) at addr 0000000000000000
>> CPU: 0 PID: 18265 Comm: poll-race-mshot Not tainted 6.2.0-rc8+ #1
>> Hardware name: 9000/800/rp3440
>>
>>       YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>> PSW: 00010000001001001001000111110000 Not tainted
>> r00-03  00000000102491f0 ffffffffffffffff 000000004020307c ffffffffffffffff
>> r04-07  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>> r08-11  ffffffffffffffff 000000000407ef28 000000000407f838 8400000000800000
>> r12-15  0000000000000000 0000000040c424e0 0000000040c424e0 0000000040c424e0
>> r16-19  000000000407fd68 0000000063f08648 0000000040c424e0 000000000a085000
>> r20-23  00000000000d6b44 000000002faf0800 00000000000000ff 0000000000000002
>> r24-27  000000000407fa30 000000000407fd68 0000000000000000 0000000040c1e4e0
>> r28-31  400000000000de84 0000000000000000 0000000000000000 0000000000000002
>> sr00-03  0000000004081000 0000000000000000 0000000000000000 0000000004081de0
>> sr04-07  0000000004081000 0000000000000000 0000000000000000 00000000040815a8
>>
>> IASQ: 0000000004081000 0000000000000000 IAOQ: 0000000000000000 0000000004081590
>>   IIR: 00000000    ISR: 0000000000000000  IOR: 0000000000000000
>>   CPU:        0   CR30: 000000004daf5700 CR31: ffffffffffffefff
>>   ORIG_R28: 0000000000000000
>>   IAOQ[0]: 0x0
>>   IAOQ[1]: linear_quiesce+0x0/0x18 [linear]
>>   RP(r2): intr_check_sig+0x0/0x3c
>> Backtrace:
>>
>> Kernel panic - not syncing: Kernel Fault
>
> This means very little to me, is it a NULL pointer deref? And where's
> the backtrace?

I see iopoll.t causing the kernel to hang on a 32-bit kernel.
The system becomes unresponsive, but with sysrq-l I get:

[  880.020641] sysrq: Show backtrace of all active CPUs
[  880.024123] sysrq: CPU0:
[  880.024123] CPU: 0 PID: 7549 Comm: kworker/u32:7 Not tainted 6.1.12-32bit+ #1595
[  880.024123] Hardware name: 9000/785/C3700
[  880.024123] Workqueue: events_unbound io_ring_exit_work
[  880.024123]
[  880.024123]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[  880.024123] PSW: 00000000000011001111111100001111 Not tainted
[  880.024123] r00-03  000cff0f 19610540 104f7b70 19610540
[  880.024123] r04-07  1921a278 00000000 192c8400 1921b508
[  880.024123] r08-11  00000003 0000002e 195fd050 00000004
[  880.024123] r12-15  192c8710 10a77000 00000000 00002000
[  880.024123] r16-19  1921a210 1240c000 1240c060 1924aff0
[  880.024123] r20-23  00000002 00000000 104b4384 00000020
[  880.024123] r24-27  00000003 19610548 1921a210 10aba968
[  880.024123] r28-31  1094f5c0 0000000e 196105c0 104f7b70
[  880.024123] sr00-03  00000000 00001695 00000000 00001695
[  880.024123] sr04-07  00000000 00000000 00000000 00000000
[  880.024123]
[  880.024123] IASQ: 00000000 00000000 IAOQ: 104f7b6c 104b4384
[  880.024123]  IIR: 081f0242    ISR: 00002000  IOR: 00000000
[  880.024123]  CPU:        0   CR30: 195fd050 CR31: d237ffff
[  880.024123]  ORIG_R28: 00000000
[  880.024123]  IAOQ[0]: io_do_iopoll+0xb4/0x3a4
[  880.024123]  IAOQ[1]: iocb_bio_iopoll+0x0/0x50
[  880.024123]  RP(r2): io_do_iopoll+0xb8/0x3a4
[  880.024123] Backtrace:
[  880.024123]  [<1092a2b0>] io_uring_try_cancel_requests+0x184/0x3b0
[  880.024123]  [<1092a57c>] io_ring_exit_work+0xa0/0x4c4
[  880.024123]  [<101cb448>] process_one_work+0x1c4/0x3cc
[  880.024123]  [<101cb7d8>] worker_thread+0x188/0x4b4
[  880.024123]  [<101d5910>] kthread+0xec/0xf4
[  880.024123]  [<1018801c>] ret_from_kernel_thread+0x1c/0x24

Helge


* Re: io_uring failure on parisc with VIPT caches
  2023-02-16  8:24                                                                   ` Helge Deller
@ 2023-02-16 15:22                                                                     ` Jens Axboe
  2023-02-16 20:35                                                                     ` John David Anglin
  1 sibling, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2023-02-16 15:22 UTC (permalink / raw)
  To: Helge Deller, John David Anglin, io-uring, linux-parisc

On 2/16/23 1:24 AM, Helge Deller wrote:
> On 2/16/23 03:50, Jens Axboe wrote:
>> On 2/15/23 7:40 PM, John David Anglin wrote:
>>> On 2023-02-15 6:02 p.m., Jens Axboe wrote:
>>>> This is not related to Helge's patch, 6.1-stable is just still missing:
>>>>
>>>> commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c
>>>> Author: Jens Axboe<axboe@kernel.dk>
>>>> Date:   Fri Jan 27 09:28:13 2023 -0700
>>>>
>>>>       io_uring: add a conditional reschedule to the IOPOLL cancelation loop
>>>>
>>>> and I'm guessing you're running without preempt.
>>> With 6.2.0-rc8+, I had a different crash running poll-race-mshot.t:
>>>
>>> Backtrace:
>>>
>>>
>>> Kernel Fault: Code=15 (Data TLB miss fault) at addr 0000000000000000
>>> CPU: 0 PID: 18265 Comm: poll-race-mshot Not tainted 6.2.0-rc8+ #1
>>> Hardware name: 9000/800/rp3440
>>>
>>>       YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>>> PSW: 00010000001001001001000111110000 Not tainted
>>> r00-03  00000000102491f0 ffffffffffffffff 000000004020307c ffffffffffffffff
>>> r04-07  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>>> r08-11  ffffffffffffffff 000000000407ef28 000000000407f838 8400000000800000
>>> r12-15  0000000000000000 0000000040c424e0 0000000040c424e0 0000000040c424e0
>>> r16-19  000000000407fd68 0000000063f08648 0000000040c424e0 000000000a085000
>>> r20-23  00000000000d6b44 000000002faf0800 00000000000000ff 0000000000000002
>>> r24-27  000000000407fa30 000000000407fd68 0000000000000000 0000000040c1e4e0
>>> r28-31  400000000000de84 0000000000000000 0000000000000000 0000000000000002
>>> sr00-03  0000000004081000 0000000000000000 0000000000000000 0000000004081de0
>>> sr04-07  0000000004081000 0000000000000000 0000000000000000 00000000040815a8
>>>
>>> IASQ: 0000000004081000 0000000000000000 IAOQ: 0000000000000000 0000000004081590
>>>   IIR: 00000000    ISR: 0000000000000000  IOR: 0000000000000000
>>>   CPU:        0   CR30: 000000004daf5700 CR31: ffffffffffffefff
>>>   ORIG_R28: 0000000000000000
>>>   IAOQ[0]: 0x0
>>>   IAOQ[1]: linear_quiesce+0x0/0x18 [linear]
>>>   RP(r2): intr_check_sig+0x0/0x3c
>>> Backtrace:
>>>
>>> Kernel panic - not syncing: Kernel Fault
>>
>> This means very little to me, is it a NULL pointer deref? And where's
>> the backtrace?
> 
> I see iopoll.t causing the kernel to hang on a 32-bit kernel.
> The system becomes unresponsive, but with sysrq-l I get:
> 
> [  880.020641] sysrq: Show backtrace of all active CPUs
> [  880.024123] sysrq: CPU0:
> [  880.024123] CPU: 0 PID: 7549 Comm: kworker/u32:7 Not tainted 6.1.12-32bit+ #1595
> [  880.024123] Hardware name: 9000/785/C3700
> [  880.024123] Workqueue: events_unbound io_ring_exit_work
> [  880.024123]
> [  880.024123]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [  880.024123] PSW: 00000000000011001111111100001111 Not tainted
> [  880.024123] r00-03  000cff0f 19610540 104f7b70 19610540
> [  880.024123] r04-07  1921a278 00000000 192c8400 1921b508
> [  880.024123] r08-11  00000003 0000002e 195fd050 00000004
> [  880.024123] r12-15  192c8710 10a77000 00000000 00002000
> [  880.024123] r16-19  1921a210 1240c000 1240c060 1924aff0
> [  880.024123] r20-23  00000002 00000000 104b4384 00000020
> [  880.024123] r24-27  00000003 19610548 1921a210 10aba968
> [  880.024123] r28-31  1094f5c0 0000000e 196105c0 104f7b70
> [  880.024123] sr00-03  00000000 00001695 00000000 00001695
> [  880.024123] sr04-07  00000000 00000000 00000000 00000000
> [  880.024123]
> [  880.024123] IASQ: 00000000 00000000 IAOQ: 104f7b6c 104b4384
> [  880.024123]  IIR: 081f0242    ISR: 00002000  IOR: 00000000
> [  880.024123]  CPU:        0   CR30: 195fd050 CR31: d237ffff
> [  880.024123]  ORIG_R28: 00000000
> [  880.024123]  IAOQ[0]: io_do_iopoll+0xb4/0x3a4
> [  880.024123]  IAOQ[1]: iocb_bio_iopoll+0x0/0x50
> [  880.024123]  RP(r2): io_do_iopoll+0xb8/0x3a4
> [  880.024123] Backtrace:
> [  880.024123]  [<1092a2b0>] io_uring_try_cancel_requests+0x184/0x3b0
> [  880.024123]  [<1092a57c>] io_ring_exit_work+0xa0/0x4c4
> [  880.024123]  [<101cb448>] process_one_work+0x1c4/0x3cc
> [  880.024123]  [<101cb7d8>] worker_thread+0x188/0x4b4
> [  880.024123]  [<101d5910>] kthread+0xec/0xf4
> [  880.024123]  [<1018801c>] ret_from_kernel_thread+0x1c/0x24

See the other email, a patch for that has been in the 6.3 tree for a
while and is marked for backport.

-- 
Jens Axboe




* Re: io_uring failure on parisc with VIPT caches
  2023-02-16  8:24                                                                   ` Helge Deller
  2023-02-16 15:22                                                                     ` Jens Axboe
@ 2023-02-16 20:35                                                                     ` John David Anglin
  1 sibling, 0 replies; 48+ messages in thread
From: John David Anglin @ 2023-02-16 20:35 UTC (permalink / raw)
  To: Helge Deller, Jens Axboe, io-uring, linux-parisc

On 2023-02-16 3:24 a.m., Helge Deller wrote:
> On 2/16/23 03:50, Jens Axboe wrote:
>> On 2/15/23 7:40 PM, John David Anglin wrote:
>>> On 2023-02-15 6:02 p.m., Jens Axboe wrote:
>>>> This is not related to Helge's patch, 6.1-stable is just still missing:
>>>>
>>>> commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c
>>>> Author: Jens Axboe<axboe@kernel.dk>
>>>> Date:   Fri Jan 27 09:28:13 2023 -0700
>>>>
>>>>       io_uring: add a conditional reschedule to the IOPOLL cancelation loop
>>>>
>>>> and I'm guessing you're running without preempt.
>>> With 6.2.0-rc8+, I had a different crash running poll-race-mshot.t:
>>>
>>> Backtrace:
>>>
>>>
>>> Kernel Fault: Code=15 (Data TLB miss fault) at addr 0000000000000000
>>> CPU: 0 PID: 18265 Comm: poll-race-mshot Not tainted 6.2.0-rc8+ #1
>>> Hardware name: 9000/800/rp3440
>>>
>>>       YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>>> PSW: 00010000001001001001000111110000 Not tainted
>>> r00-03  00000000102491f0 ffffffffffffffff 000000004020307c ffffffffffffffff
>>> r04-07  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>>> r08-11  ffffffffffffffff 000000000407ef28 000000000407f838 8400000000800000
>>> r12-15  0000000000000000 0000000040c424e0 0000000040c424e0 0000000040c424e0
>>> r16-19  000000000407fd68 0000000063f08648 0000000040c424e0 000000000a085000
>>> r20-23  00000000000d6b44 000000002faf0800 00000000000000ff 0000000000000002
>>> r24-27  000000000407fa30 000000000407fd68 0000000000000000 0000000040c1e4e0
>>> r28-31  400000000000de84 0000000000000000 0000000000000000 0000000000000002
>>> sr00-03  0000000004081000 0000000000000000 0000000000000000 0000000004081de0
>>> sr04-07  0000000004081000 0000000000000000 0000000000000000 00000000040815a8
>>>
>>> IASQ: 0000000004081000 0000000000000000 IAOQ: 0000000000000000 0000000004081590
>>>   IIR: 00000000    ISR: 0000000000000000  IOR: 0000000000000000
>>>   CPU:        0   CR30: 000000004daf5700 CR31: ffffffffffffefff
>>>   ORIG_R28: 0000000000000000
>>>   IAOQ[0]: 0x0
>>>   IAOQ[1]: linear_quiesce+0x0/0x18 [linear]
>>>   RP(r2): intr_check_sig+0x0/0x3c
>>> Backtrace:
>>>
>>> Kernel panic - not syncing: Kernel Fault
>>
>> This means very little to me, is it a NULL pointer deref? And where's
>> the backtrace?
>
> I see iopoll.t causing the kernel to hang on a 32-bit kernel.
> The system becomes unresponsive, but with sysrq-l I get:
>
> [  880.020641] sysrq: Show backtrace of all active CPUs
> [  880.024123] sysrq: CPU0:
> [  880.024123] CPU: 0 PID: 7549 Comm: kworker/u32:7 Not tainted 6.1.12-32bit+ #1595
> [  880.024123] Hardware name: 9000/785/C3700
> [  880.024123] Workqueue: events_unbound io_ring_exit_work
> [  880.024123]
> [  880.024123]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [  880.024123] PSW: 00000000000011001111111100001111 Not tainted
> [  880.024123] r00-03  000cff0f 19610540 104f7b70 19610540
> [  880.024123] r04-07  1921a278 00000000 192c8400 1921b508
> [  880.024123] r08-11  00000003 0000002e 195fd050 00000004
> [  880.024123] r12-15  192c8710 10a77000 00000000 00002000
> [  880.024123] r16-19  1921a210 1240c000 1240c060 1924aff0
> [  880.024123] r20-23  00000002 00000000 104b4384 00000020
> [  880.024123] r24-27  00000003 19610548 1921a210 10aba968
> [  880.024123] r28-31  1094f5c0 0000000e 196105c0 104f7b70
> [  880.024123] sr00-03  00000000 00001695 00000000 00001695
> [  880.024123] sr04-07  00000000 00000000 00000000 00000000
> [  880.024123]
> [  880.024123] IASQ: 00000000 00000000 IAOQ: 104f7b6c 104b4384
> [  880.024123]  IIR: 081f0242    ISR: 00002000  IOR: 00000000
> [  880.024123]  CPU:        0   CR30: 195fd050 CR31: d237ffff
> [  880.024123]  ORIG_R28: 00000000
> [  880.024123]  IAOQ[0]: io_do_iopoll+0xb4/0x3a4
> [  880.024123]  IAOQ[1]: iocb_bio_iopoll+0x0/0x50
> [  880.024123]  RP(r2): io_do_iopoll+0xb8/0x3a4
> [  880.024123] Backtrace:
> [  880.024123]  [<1092a2b0>] io_uring_try_cancel_requests+0x184/0x3b0
> [  880.024123]  [<1092a57c>] io_ring_exit_work+0xa0/0x4c4
> [  880.024123]  [<101cb448>] process_one_work+0x1c4/0x3cc
> [  880.024123]  [<101cb7d8>] worker_thread+0x188/0x4b4
> [  880.024123]  [<101d5910>] kthread+0xec/0xf4
> [  880.024123]  [<1018801c>] ret_from_kernel_thread+0x1c/0x24
I had updated to 6.2.0-rc8+ to avoid this issue.

I agree there's not a lot of helpful info in the dump.  Somehow, the code has branched to
location 0 and attempted to execute instruction 0.  RP points at intr_check_sig+0x0, which is
the function entry and not a valid return point for a call instruction.  In the dump above,
SP (r30) is 0.  Maybe the stack overflowed for the process?
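
The RP and SP values mentioned above can be read straight out of the general register lines of the dump. A minimal sketch, assuming the usual PA-RISC convention that r2 is the return pointer and r30 is the stack pointer; the helper name parse_gr_dump is hypothetical and only illustrates the layout of the dump:

```python
def parse_gr_dump(lines):
    """Map general register numbers to values from 'rNN-MM v v v v' lines."""
    regs = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and the space register lines ('srNN-MM').
        if not fields or not fields[0].startswith("r"):
            continue
        first = int(fields[0][1:].split("-")[0])
        for i, value in enumerate(fields[1:]):
            regs[first + i] = int(value, 16)
    return regs


# Two lines copied from the 6.2.0-rc8+ dump quoted above.
dump = [
    "r00-03  00000000102491f0 ffffffffffffffff 000000004020307c ffffffffffffffff",
    "r28-31  400000000000de84 0000000000000000 0000000000000000 0000000000000002",
    "sr00-03  0000000004081000 0000000000000000 0000000000000000 0000000004081de0",
]

regs = parse_gr_dump(dump)
print("RP (r2)  = 0x%x" % regs[2])   # assumed convention: r2 is RP
print("SP (r30) = 0x%x" % regs[30])  # assumed convention: r30 is SP
```

For the two register lines shown, r2 is 0x4020307c and r30 is 0, which matches the reading that SP is 0 in this dump.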

I have run the test multiple times by itself.  It consistently generates an HPMC.  The PIM
dump provides no more info than the dump above (i.e., the kernel has tried to execute location 0).
SP did not appear to have been clobbered in the PIM dump that I looked at.

Running the test under strace gives different points where the trace stops:

io_uring_setup(64, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=64, cq_entries=128, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=2144}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3

io_uring_enter(3, 64, 0, 0, NULL, 8)    = 64

io_uring_setup(64, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=64, cq_entries=128, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=2144}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
mmap2(NULL, 2400, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf8cad000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000
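
The two mmap2 lengths in the trace are consistent with the io_uring_setup parameters printed just before them. A minimal sketch of the arithmetic, assuming the sizing rules liburing applies (a 4-byte SQ index array entry, 16-byte CQEs, 64-byte SQEs, and a shared ring mapping when IORING_FEAT_SINGLE_MMAP is set); the function name ring_mmap_sizes is hypothetical:

```python
# Sizes assumed from the io_uring UAPI structures (not taken from the trace).
U32_SIZE = 4    # one entry of the SQ index array
CQE_SIZE = 16   # struct io_uring_cqe without the CQE32 extension
SQE_SIZE = 64   # struct io_uring_sqe without the SQE128 extension


def ring_mmap_sizes(sq_entries, cq_entries, sq_array_off, cq_cqes_off,
                    single_mmap=True):
    """Return (sq_ring_bytes, cq_ring_bytes, sqes_bytes) for the mappings."""
    sq_ring = sq_array_off + sq_entries * U32_SIZE
    cq_ring = cq_cqes_off + cq_entries * CQE_SIZE
    if single_mmap:
        # One mapping covers both rings, so it must fit the larger of the two.
        sq_ring = cq_ring = max(sq_ring, cq_ring)
    sqes = sq_entries * SQE_SIZE
    return sq_ring, cq_ring, sqes


# Values from the trace: sq_entries=64, cq_entries=128, array=2144, cqes=96.
print(ring_mmap_sizes(64, 128, 2144, 96))
```

With these inputs the ring mapping comes out at 2400 bytes and the SQE array at 4096 bytes, the same lengths strace shows for the mmap2 calls at offsets 0 and 0x10000000.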

-- 
John David Anglin  dave.anglin@bell.net



end of thread, other threads:[~2023-02-16 20:36 UTC | newest]

Thread overview: 48+ messages
2023-02-12  9:47 io_uring failure on parisc (32-bit userspace and 64-bit kernel) Helge Deller
2023-02-12 13:16 ` Jens Axboe
2023-02-12 13:28   ` Helge Deller
2023-02-12 13:35     ` Jens Axboe
2023-02-12 14:00       ` Jens Axboe
2023-02-12 14:03       ` Helge Deller
2023-02-12 19:35         ` Helge Deller
2023-02-12 19:42           ` Jens Axboe
2023-02-12 20:01             ` Helge Deller
2023-02-12 21:48               ` Jens Axboe
2023-02-12 22:20                 ` Helge Deller
2023-02-12 22:31                   ` Helge Deller
2023-02-13 16:15                     ` Jens Axboe
2023-02-13 20:59                       ` Helge Deller
2023-02-13 21:05                         ` Jens Axboe
2023-02-13 22:05                           ` Helge Deller
2023-02-13 22:50                             ` John David Anglin
2023-02-14 23:09                               ` io_uring failure on parisc with VIPT caches Helge Deller
2023-02-14 23:29                                 ` Jens Axboe
2023-02-15  2:12                                   ` John David Anglin
2023-02-15 15:16                                     ` Jens Axboe
2023-02-15 15:52                                       ` Helge Deller
2023-02-15 15:56                                         ` Jens Axboe
2023-02-15 16:02                                           ` Helge Deller
2023-02-15 16:04                                             ` Jens Axboe
2023-02-15 21:40                                               ` Helge Deller
2023-02-15 23:04                                                 ` Jens Axboe
2023-02-15 16:38                                           ` John David Anglin
2023-02-15 17:01                                             ` Jens Axboe
2023-02-15 19:00                                               ` Jens Axboe
2023-02-15 19:16                                                 ` Jens Axboe
2023-02-15 20:27                                                   ` John David Anglin
2023-02-15 20:37                                                     ` Jens Axboe
2023-02-15 21:06                                                       ` John David Anglin
2023-02-15 21:38                                                         ` Jens Axboe
2023-02-15 21:39                                                         ` John David Anglin
2023-02-15 22:10                                                           ` John David Anglin
2023-02-15 23:02                                                             ` Jens Axboe
2023-02-15 23:43                                                               ` John David Anglin
2023-02-16  2:40                                                               ` John David Anglin
2023-02-16  2:50                                                                 ` Jens Axboe
2023-02-16  8:24                                                                   ` Helge Deller
2023-02-16 15:22                                                                     ` Jens Axboe
2023-02-16 20:35                                                                     ` John David Anglin
2023-02-15 23:03                                                           ` Jens Axboe
2023-02-15 19:20                                                 ` John David Anglin
2023-02-15 19:24                                                   ` Jens Axboe
2023-02-15 16:18                                         ` John David Anglin
