From: Jens Axboe <axboe@kernel.dk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: io-uring@vger.kernel.org
Subject: Re: io_uring_prep_openat_direct() and link/drain
Date: Tue, 29 Mar 2022 12:40:17 -0600
Message-ID: <89322bd1-5e6f-bcc6-7974-ffd22363a165@kernel.dk>
In-Reply-To: <CAJfpegs=GcTuXcor-pbhaAxDKeS5XRy5rwTGXUcZM0BYYUK2LA@mail.gmail.com>

On 3/29/22 12:31 PM, Miklos Szeredi wrote:
> On Tue, 29 Mar 2022 at 20:26, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 3/29/22 12:21 PM, Miklos Szeredi wrote:
>>> On Tue, 29 Mar 2022 at 19:04, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On 3/29/22 10:08 AM, Jens Axboe wrote:
>>>>> On 3/29/22 7:20 AM, Miklos Szeredi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to read multiple files with io_uring and getting stuck,
>>>>>> because the link and drain flags don't seem to do what they are
>>>>>> documented to do.
>>>>>>
>>>>>> Kernel is v5.17 and liburing is compiled from the git tree at
>>>>>> 7a3a27b6a384 ("add tests for nonblocking accept sockets").
>>>>>>
>>>>>> Without those flags the attached example works some of the time, but
>>>>>> that's probably accidental since ordering is not ensured.
>>>>>>
>>>>>> Adding the drain or link flags makes it even worse (it fails in cases
>>>>>> where the unordered one didn't).
>>>>>>
>>>>>> What am I missing?
>>>>>
>>>>> I don't think you're missing anything, it looks like a bug. What you
>>>>> want here is:
>>>>>
>>>>> prep_open_direct(sqe);
>>>>> sqe->flags |= IOSQE_IO_LINK;
>>>>> ...
>>>>> prep_read(sqe);
>>>
>>> So with the below merge this works. But if instead I do
>>>
>>> prep_open_direct(sqe);
>>>  ...
>>> prep_read(sqe);
>>> sqe->flags |= IOSQE_IO_DRAIN;
>>>
>>> then it doesn't. Shouldn't drain have a stronger ordering guarantee than link?
>>
>> I didn't test that, but I bet it's running into the same kind of issue
>> wrt prep. Are you getting -EBADF? The drain will indeed ensure that
>> _execution_ doesn't start until the previous requests have completed,
>> but it's still prepared before.
>>
>> For your use case, IO_LINK is what you want and that must work.
>>
>> I'll check the drain case just in case, it may in fact work if you just
>> edit the code base you're running now and remove these two lines from
>> io_init_req():
>>
>> if (unlikely(!req->file)) {
>> -        if (!ctx->submit_state.link.head)
>> -                return -EBADF;
>>         req->result = fd;
>>         req->flags |= REQ_F_DEFERRED_FILE;
>> }
>>
>> to not make it dependent on link.head. Probably not a bad idea in
>> general, as the rest of the handlers have been audited for req->file
>> usage in prep.
> 
> Nope, that results in the following Oops:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000044
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 3 PID: 1126 Comm: readfiles Not tainted 5.17.0-00065-g3287b182c9c3-dirty #623
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014
> RIP: 0010:io_rw_init_file+0x15/0x170
> Code: 00 6d 22 82 0f 95 c0 83 c0 02 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 41 54 55 53 4c 8b 2f 4c 8b 67 58 8b 6f 20 <41> 23 75 44 0f 84 28 01 00 00 48 89 fb f6 47 44 01 0f 84 08 01 00
> RSP: 0018:ffffc9000108fba8 EFLAGS: 00010207
> RAX: 0000000000000001 RBX: ffff888103ddd688 RCX: ffffc9000108fc18
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888103ddd600
> RBP: 0000000000000000 R08: ffffc9000108fbd8 R09: 00007ffffffff000
> R10: 0000000000020000 R11: 000056012e2ce2e0 R12: ffff88810276b800
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff888103ddd600
> FS:  00007f9058d72580(0000) GS:ffff888237d80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000044 CR3: 0000000100966004 CR4: 0000000000370ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  io_read+0x65/0x4d0
>  ? select_task_rq_fair+0x602/0xf20
>  ? newidle_balance.constprop.0+0x2ff/0x3a0
>  io_issue_sqe+0xd86/0x21a0
>  ? __schedule+0x228/0x610
>  ? timerqueue_del+0x2a/0x40
>  io_req_task_submit+0x26/0x100
>  tctx_task_work+0x172/0x4b0
>  task_work_run+0x5c/0x90
>  io_cqring_wait+0x48d/0x790
>  ? io_eventfd_put+0x20/0x20
>  __do_sys_io_uring_enter+0x28d/0x5e0
>  ? __cond_resched+0x16/0x40
>  ? task_work_run+0x61/0x90
>  do_syscall_64+0x3b/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae

Ah yes, that makes sense, since I only wired the prep file part up for
links. Forgot about that... Let me test; I'll see if it's feasible to do
for drain and send you an incremental.

-- 
Jens Axboe
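
The distinction drawn above between preparing a request and executing it
can be illustrated with a small toy model. This is only a sketch, not
kernel code: `toy_ring`, `toy_read_prep()`, `toy_read_issue()`,
`toy_submit_pair()` and the `resolve_mode` enum are hypothetical names
invented for illustration, and the fake handle 42 stands in for an opened
file. It assumes, as the thread describes, that a read resolves its
fixed-file slot either at prep time (before the preceding direct open has
run) or at issue time (the deferred behaviour wired up for links).

```c
#include <assert.h>
#include <errno.h>

#define NR_SLOTS 4

/* Toy stand-in for a sparse fixed-file table: 0 means "slot empty". */
struct toy_ring {
    int slots[NR_SLOTS];
};

/* Hypothetical switch between the two behaviours discussed in the thread. */
enum resolve_mode {
    RESOLVE_AT_PREP,   /* look the slot up when the request is prepared */
    RESOLVE_AT_ISSUE,  /* defer the lookup until the request is issued */
};

struct toy_read {
    int slot;  /* fixed-file index the read refers to */
    int file;  /* resolved handle, 0 while unresolved */
};

/* "Execute" a direct open: install a fake handle into the slot. */
static int toy_open_direct(struct toy_ring *ring, int slot, int handle)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    ring->slots[slot] = handle;
    return 0;
}

/* Prepare a read; in RESOLVE_AT_PREP mode an empty slot fails right here. */
static int toy_read_prep(struct toy_ring *ring, struct toy_read *rd,
                         int slot, enum resolve_mode mode)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    rd->slot = slot;
    rd->file = 0;
    if (mode == RESOLVE_AT_PREP) {
        rd->file = ring->slots[slot];
        if (!rd->file)
            return -EBADF;
    }
    return 0;
}

/* Issue the read; resolve the slot now if prep left it unresolved. */
static int toy_read_issue(struct toy_ring *ring, struct toy_read *rd)
{
    if (!rd->file)
        rd->file = ring->slots[rd->slot];
    return rd->file ? rd->file : -EBADF;
}

/*
 * Submit an open+read pair: the read is prepared at submission time,
 * then both requests are executed strictly in order.
 * Returns the handle the read ended up using, or a negative errno.
 */
static int toy_submit_pair(enum resolve_mode mode)
{
    struct toy_ring ring = { {0} };
    struct toy_read rd;
    int ret;

    /* Submission: the read is prepared before the open has run. */
    ret = toy_read_prep(&ring, &rd, 0, mode);
    if (ret < 0)
        return ret;

    /* Execution, in order: the open completes, then the read is issued. */
    ret = toy_open_direct(&ring, 0, 42);
    if (ret < 0)
        return ret;
    return toy_read_issue(&ring, &rd);
}
```

Under these assumptions, `toy_submit_pair(RESOLVE_AT_PREP)` returns -EBADF
because the slot is still empty when the read is prepared, while
`toy_submit_pair(RESOLVE_AT_ISSUE)` returns the installed handle. Ordering
execution alone, which is what drain provides, does not help in the first
case, since the failure happens before execution starts.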


