From: Miklos Szeredi <miklos@szeredi.hu>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: io_uring_prep_openat_direct() and link/drain
Date: Tue, 29 Mar 2022 20:31:33 +0200
Message-ID: <CAJfpegs=GcTuXcor-pbhaAxDKeS5XRy5rwTGXUcZM0BYYUK2LA@mail.gmail.com>
In-Reply-To: <115fc7d1-9b9c-712b-e75d-39b2041df437@kernel.dk>

On Tue, 29 Mar 2022 at 20:26, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 3/29/22 12:21 PM, Miklos Szeredi wrote:
> > On Tue, 29 Mar 2022 at 19:04, Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 3/29/22 10:08 AM, Jens Axboe wrote:
> >>> On 3/29/22 7:20 AM, Miklos Szeredi wrote:
> >>>> Hi,
> >>>>
> >>>> I'm trying to read multiple files with io_uring and getting stuck,
> >>>> because the link and drain flags don't seem to do what they are
> >>>> documented to do.
> >>>>
> >>>> Kernel is v5.17 and liburing is compiled from the git tree at
> >>>> 7a3a27b6a384 ("add tests for nonblocking accept sockets").
> >>>>
> >>>> Without those flags the attached example works some of the time, but
> >>>> that's probably accidental since ordering is not ensured.
> >>>>
> > >>>> Adding the drain or link flags makes it even worse (it fails in cases
> > >>>> where the unordered one didn't).
> >>>>
> >>>> What am I missing?
> >>>
> >>> I don't think you're missing anything, it looks like a bug. What you
> >>> want here is:
> >>>
> >>> prep_open_direct(sqe);
> >>> sqe->flags |= IOSQE_IO_LINK;
> >>> ...
> >>> prep_read(sqe);
> >
> > So with the merge below, this works. But if instead I do
> >
> > prep_open_direct(sqe);
> >  ...
> > prep_read(sqe);
> > sqe->flags |= IOSQE_IO_DRAIN;
> >
> > then it doesn't.  Shouldn't drain have a stronger ordering guarantee than link?
>
> I didn't test that, but I bet it's running into the same kind of issue
> wrt prep. Are you getting -EBADF? The drain will indeed ensure that
> _execution_ doesn't start until the previous requests have completed,
> but it's still prepared before.
>
> For your use case, IO_LINK is what you want and that must work.
>
> I'll check the drain case just in case, it may in fact work if you just
> edit the code base you're running now and remove these two lines from
> io_init_req():
>
> if (unlikely(!req->file)) {
> -        if (!ctx->submit_state.link.head)
> -                return -EBADF;
>         req->result = fd;
>         req->flags |= REQ_F_DEFERRED_FILE;
> }
>
> to not make it dependent on link.head. Probably not a bad idea in
> general, as the rest of the handlers have been audited for req->file
> usage in prep.

Nope, that results in the following Oops:

BUG: kernel NULL pointer dereference, address: 0000000000000044
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 1126 Comm: readfiles Not tainted 5.17.0-00065-g3287b182c9c3-dirty #623
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014
RIP: 0010:io_rw_init_file+0x15/0x170
Code: 00 6d 22 82 0f 95 c0 83 c0 02 c3 66 2e 0f 1f 84 00 00 00 00 00
0f 1f 44 00 00 41 55 41 54 55 53 4c 8b 2f 4c 8b 67 58 8b 6f 20 <41> 23
75 44 0f 84 28 01 00 00 48 89 fb f6 47 44 01 0f 84 08 01 00
RSP: 0018:ffffc9000108fba8 EFLAGS: 00010207
RAX: 0000000000000001 RBX: ffff888103ddd688 RCX: ffffc9000108fc18
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888103ddd600
RBP: 0000000000000000 R08: ffffc9000108fbd8 R09: 00007ffffffff000
R10: 0000000000020000 R11: 000056012e2ce2e0 R12: ffff88810276b800
R13: 0000000000000000 R14: 0000000000000000 R15: ffff888103ddd600
FS:  00007f9058d72580(0000) GS:ffff888237d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000044 CR3: 0000000100966004 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 io_read+0x65/0x4d0
 ? select_task_rq_fair+0x602/0xf20
 ? newidle_balance.constprop.0+0x2ff/0x3a0
 io_issue_sqe+0xd86/0x21a0
 ? __schedule+0x228/0x610
 ? timerqueue_del+0x2a/0x40
 io_req_task_submit+0x26/0x100
 tctx_task_work+0x172/0x4b0
 task_work_run+0x5c/0x90
 io_cqring_wait+0x48d/0x790
 ? io_eventfd_put+0x20/0x20
 __do_sys_io_uring_enter+0x28d/0x5e0
 ? __cond_resched+0x16/0x40
 ? task_work_run+0x61/0x90
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x56012d87159c
Code: c2 41 8b 54 24 04 8b bd cc 00 00 00 41 83 ca 10 f6 85 d0 00 00
00 01 4d 8b 44 24 10 44 0f 44 d0 45 8b 4c 24 0c 44 89 f0 0f 05 <41> 89
c3 85 c0 0f 88 4a ff ff ff 41 29 04 24 bf 01 00 00 00 48 85
RSP: 002b:00007ffc8db5c550 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000056012d87159c
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffc8db5c620 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffc8db5c580
R13: 00007ffc8db5c618 R14: 00000000000001aa R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: 0000000000000044
---[ end trace 0000000000000000 ]---
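
The prep-versus-execution distinction described in the quoted replies can be sketched as a toy model in plain C. Everything here is hypothetical (the `model_*` names, the four-slot table, the file id 42) and is not kernel or liburing code; it only assumes, as the thread describes for this 5.17-based tree, that a fixed-file read is prepared at submission time, before the linked or drained open has executed, and fails with -EBADF unless the file lookup is deferred to issue time.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the ordering problem; NOT kernel code. */
#define NSLOTS 4

struct file_model { int id; };

struct ring_model {
    struct file_model *slots[NSLOTS];  /* sparse fixed-file table, NULL = empty */
};

struct req_model {
    unsigned slot;            /* fixed-file index the read refers to */
    bool defer_lookup;        /* resolve the file at issue time, not prep time */
    struct file_model *file;  /* resolved file, NULL if not resolved yet */
};

/* Prep phase: runs for every SQE at submission, before any earlier
 * request in the same submission has executed. */
static int model_prep(struct ring_model *r, struct req_model *q)
{
    q->file = r->slots[q->slot];
    if (q->file == NULL && !q->defer_lookup)
        return -EBADF;  /* slot still empty: the open has not run yet */
    return 0;
}

/* The open installs a file into its slot when it executes. */
static void model_open(struct ring_model *r, unsigned slot, struct file_model *f)
{
    r->slots[slot] = f;
}

/* Issue phase: runs once link/drain ordering allows execution. */
static int model_issue(struct ring_model *r, struct req_model *q)
{
    if (q->file == NULL && q->defer_lookup)
        q->file = r->slots[q->slot];  /* late lookup, after the open ran */
    if (q->file == NULL)
        return -EBADF;
    return q->file->id;  /* stand-in for a successful read */
}

/* Hypothetical scenario: open into slot 1, then read from slot 1. */
static int model_open_then_read(bool defer_lookup)
{
    static struct file_model f = { .id = 42 };
    struct ring_model r = { { NULL } };
    struct req_model rd = { .slot = 1, .defer_lookup = defer_lookup, .file = NULL };

    /* Submission: the read is prepared before the open executes. */
    int ret = model_prep(&r, &rd);
    if (ret < 0)
        return ret;

    /* Execution: the open runs first (ordering enforced), then the read. */
    model_open(&r, 1, &f);
    return model_issue(&r, &rd);
}
```

In the model, `model_open_then_read(false)` returns -EBADF, matching the drain case, while `model_open_then_read(true)` returns 42, matching the patched link case. The Oops above appears consistent with a third combination the model leaves out: skipping the prep-time -EBADF check without a late lookup on the path that issues the read, so io_read() runs with a NULL file.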

Thread overview: 32+ messages
2022-03-29 13:20 io_uring_prep_openat_direct() and link/drain Miklos Szeredi
2022-03-29 16:08 ` Jens Axboe
2022-03-29 17:04   ` Jens Axboe
2022-03-29 18:21     ` Miklos Szeredi
2022-03-29 18:26       ` Jens Axboe
2022-03-29 18:31         ` Miklos Szeredi [this message]
2022-03-29 18:40           ` Jens Axboe
2022-03-29 19:30             ` Miklos Szeredi
2022-03-29 20:03               ` Jens Axboe
2022-03-30  8:18                 ` Miklos Szeredi
2022-03-30 12:35                   ` Jens Axboe
2022-03-30 12:43                     ` Miklos Szeredi
2022-03-30 12:48                       ` Jens Axboe
2022-03-30 12:51                         ` Miklos Szeredi
2022-03-30 14:58                           ` Miklos Szeredi
2022-03-30 15:05                             ` Jens Axboe
2022-03-30 15:12                               ` Miklos Szeredi
2022-03-30 15:17                                 ` Jens Axboe
2022-03-30 15:53                                   ` Jens Axboe
2022-03-30 17:49                                     ` Jens Axboe
2022-04-01  8:40                                       ` Miklos Szeredi
2022-04-01 15:36                                         ` Jens Axboe
2022-04-01 16:02                                           ` Miklos Szeredi
2022-04-01 16:21                                             ` Jens Axboe
2022-04-02  1:17                                               ` Jens Axboe
2022-04-05  7:45                                                 ` Miklos Szeredi
2022-04-05 14:44                                                   ` Jens Axboe
2022-04-21 12:31                                                     ` Miklos Szeredi
2022-04-21 12:34                                                       ` Jens Axboe
2022-04-21 12:39                                                         ` Miklos Szeredi
2022-04-21 12:41                                                           ` Jens Axboe
2022-04-21 13:10                                                             ` Miklos Szeredi