linux-block.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: linux-block@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
Date: Wed, 23 Oct 2019 08:11:29 -0600	[thread overview]
Message-ID: <3b97233b-5d05-5efc-4173-e3a1ef177cbc@kernel.dk> (raw)
In-Reply-To: <20191023120446.75oxdwom34nhe3l5@olga.proxmox.com>

On 10/23/19 6:04 AM, Wolfgang Bumiller wrote:
> On Thu, Oct 17, 2019 at 03:28:56PM -0600, Jens Axboe wrote:
>> This is in preparation for adding opcodes that need to modify files
>> in a process file table, either adding new ones or closing old ones.
>>
>> If an opcode needs this, it must set REQ_F_NEED_FILES in the request
>> structure. If work that needs to be punted to async context has this
>> set, it will grab a reference to the process file table. When the
>> work is completed, the reference is dropped again.
> 
> I think IORING_OP_SENDMSG and _RECVMSG need to set this flag due to
> SCM_RIGHTS control messages.
> Thought I'd reply here since I just ran into this issue: I was getting
> ever-increasing, wrong file descriptor numbers on pretty much every
> "other" async recvmsg() call I made via io_uring while receiving
> file descriptors from lxc for the seccomp-notify proxy. (I'm currently
> running an Ubuntu-based 5.3.1 kernel.)
> I ended up finding them in /proc - they show up in all kernel threads,
> e.g.:
> 
> root:/root # grep Name /proc/9/status
> Name:   mm_percpu_wq
> root:/root # ls -l /proc/9/fd
> total 0
> lr-x------ 1 root root 64 Oct 23 12:00 0 -> '/proc/512 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 1 -> /proc/512/mem
> lr-x------ 1 root root 64 Oct 23 12:00 10 -> '/proc/11782 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 11 -> /proc/11782/mem
> lr-x------ 1 root root 64 Oct 23 12:00 12 -> '/proc/12210 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 13 -> /proc/12210/mem
> lr-x------ 1 root root 64 Oct 23 12:00 14 -> '/proc/12298 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 15 -> /proc/12298/mem
> lr-x------ 1 root root 64 Oct 23 12:00 16 -> '/proc/13955 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 17 -> /proc/13955/mem
> lr-x------ 1 root root 64 Oct 23 12:00 18 -> '/proc/13989 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 19 -> /proc/13989/mem
> lr-x------ 1 root root 64 Oct 23 12:00 2 -> '/proc/584 (deleted)'
> lr-x------ 1 root root 64 Oct 23 12:00 20 -> '/proc/15502 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 21 -> /proc/15502/mem
> lr-x------ 1 root root 64 Oct 23 12:00 22 -> '/proc/15510 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 23 -> /proc/15510/mem
> lr-x------ 1 root root 64 Oct 23 12:00 24 -> '/proc/17833 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 25 -> /proc/17833/mem
> lr-x------ 1 root root 64 Oct 23 12:00 26 -> '/proc/17836 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 27 -> /proc/17836/mem
> lr-x------ 1 root root 64 Oct 23 12:00 28 -> '/proc/21929 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 29 -> /proc/21929/mem
> lrwx------ 1 root root 64 Oct 23 12:00 3 -> /proc/584/mem
> lr-x------ 1 root root 64 Oct 23 12:00 30 -> '/proc/22214 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 31 -> /proc/22214/mem
> lr-x------ 1 root root 64 Oct 23 12:00 32 -> '/proc/22283 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 33 -> /proc/22283/mem
> lr-x------ 1 root root 64 Oct 23 12:00 34 -> '/proc/29795 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 35 -> /proc/29795/mem
> lr-x------ 1 root root 64 Oct 23 12:00 36 -> '/proc/30124 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 37 -> /proc/30124/mem
> lr-x------ 1 root root 64 Oct 23 12:00 38 -> '/proc/31016 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 39 -> /proc/31016/mem
> lr-x------ 1 root root 64 Oct 23 12:00 4 -> '/proc/1632 (deleted)'
> lr-x------ 1 root root 64 Oct 23 12:00 40 -> '/proc/4137 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 41 -> /proc/4137/mem
> lrwx------ 1 root root 64 Oct 23 12:00 5 -> /proc/1632/mem
> lr-x------ 1 root root 64 Oct 23 12:00 6 -> '/proc/3655 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 7 -> /proc/3655/mem
> lr-x------ 1 root root 64 Oct 23 12:00 8 -> '/proc/7075 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 9 -> /proc/7075/mem
> root:/root #
> 
> Those are the fds I expected to receive, and I get fd numbers
> consistently increasing along with them.
> lxc sends the syscall-executing process' pidfd and its 'mem' fd via a
> socket, but instead of reaching the receiver, they end up there...
> 
> I suspect that an async sendmsg() call could potentially end up
> accessing those instead of the ones from the sender process, but I
> haven't tested it...
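
For reference, the SCM_RIGHTS fd passing being described can be sketched in plain userspace C; all names here are local to this illustrative demo, which only shows the mechanism, not the lxc or io_uring code paths:

```c
/*
 * Minimal sketch of SCM_RIGHTS fd passing over a Unix socketpair.
 * The key point for the discussion above: recvmsg() installs the
 * passed fd into the file table of whichever task performs the call.
 */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int send_fd(int sock, int fd)
{
	char data = 'x';                      /* must carry at least one byte */
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	memset(cbuf, 0, sizeof(cbuf));
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type  = SCM_RIGHTS;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_fd(int sock)
{
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };
	if (recvmsg(sock, &msg, 0) != 1)      /* the new fd is created here, */
		return -1;                    /* in the caller's file table  */
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	int fd;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

int run_demo(void)
{
	int sv[2];
	assert(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0);
	int orig = open("/dev/null", O_RDONLY);
	assert(orig >= 0);
	assert(send_fd(sv[0], orig) == 0);
	int received = recv_fd(sv[1]);
	/* a fresh descriptor, distinct from the one that was sent */
	assert(received >= 0 && received != orig);
	assert(fcntl(received, F_GETFD) != -1);
	close(orig); close(received); close(sv[0]); close(sv[1]);
	return 0;
}
```

When the recvmsg() side runs in an async worker instead of the submitting task, the fresh descriptor is installed in the worker's file table, matching the /proc listing above.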

Might "just" be a case of the sendmsg() being stuck; we can't currently
cancel work, so if it never completes, the ring won't go away.

I'm actually working on a small workqueue replacement for io_uring which
will allow us to cancel things like that. It's a requirement for accept()
as well, and also for basic read/write and send/recv on sockets. So far
we've been geared toward storage IO operations that complete in a finite
amount of time...

But yes, I hope that with that, and the flush trick that Jann suggested,
we can make this 100% reliable for any type of operation.
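
To illustrate why the submitter's files table matters here, a hedged demo (all names local to this sketch, no io_uring involved): file tables are per-task, so the fd a recvmsg() with SCM_RIGHTS creates belongs to the task that ran recvmsg(). A forked child stands in for the async worker:

```c
/*
 * A forked child receives a passed fd; the parent then verifies that
 * the same fd number is not open in its own table -- the same effect
 * as the passed fds materializing in a worker thread instead of the
 * submitting process.
 */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int run_fork_demo(void)
{
	int sv[2];
	assert(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0);
	int orig = open("/dev/null", O_RDONLY);   /* fd we will pass */
	assert(orig >= 0);

	pid_t pid = fork();
	assert(pid >= 0);

	if (pid == 0) {                           /* child: receive the fd */
		char b;
		struct iovec iov = { .iov_base = &b, .iov_len = 1 };
		char cbuf[CMSG_SPACE(sizeof(int))];
		struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
				      .msg_control = cbuf,
				      .msg_controllen = sizeof(cbuf) };
		if (recvmsg(sv[1], &msg, 0) != 1)
			_exit(1);
		struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
		if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
			_exit(1);
		int fd;
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
		/* tell the parent which fd number we got */
		if (write(sv[1], &fd, sizeof(fd)) != sizeof(fd))
			_exit(1);
		_exit(fcntl(fd, F_GETFD) != -1 ? 0 : 1);
	}

	/* parent: pass orig to the child via SCM_RIGHTS */
	char b = 'x';
	struct iovec iov = { .iov_base = &b, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	memset(cbuf, 0, sizeof(cbuf));
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type  = SCM_RIGHTS;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &orig, sizeof(int));
	assert(sendmsg(sv[0], &msg, 0) == 1);

	int child_fd;
	assert(read(sv[0], &child_fd, sizeof(child_fd)) == (ssize_t)sizeof(child_fd));
	/* open in the child, but this fd number is not open here */
	assert(fcntl(child_fd, F_GETFD) == -1);

	int status;
	assert(waitpid(pid, &status, 0) == pid);
	assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
	close(orig); close(sv[0]); close(sv[1]);
	return 0;
}
```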

-- 
Jens Axboe



Thread overview: 28+ messages
2019-10-17 21:28 [PATCHSET] io_uring: add support for accept(4) Jens Axboe
2019-10-17 21:28 ` [PATCH 1/3] io_uring: add support for async work inheriting files table Jens Axboe
2019-10-18  2:41   ` Jann Horn
2019-10-18 14:01     ` Jens Axboe
2019-10-18 14:34       ` Jann Horn
2019-10-18 14:37         ` Jens Axboe
2019-10-18 14:40           ` Jann Horn
2019-10-18 14:43             ` Jens Axboe
2019-10-18 14:52               ` Jann Horn
2019-10-18 15:00                 ` Jens Axboe
2019-10-18 15:54                   ` Jens Axboe
2019-10-18 16:20                     ` Jann Horn
2019-10-18 16:36                       ` Jens Axboe
2019-10-18 17:05                         ` Jens Axboe
2019-10-18 18:06                           ` Jann Horn
2019-10-18 18:16                             ` Jens Axboe
2019-10-18 18:50                               ` Jann Horn
2019-10-24 19:41                                 ` Jens Axboe
2019-10-24 20:31                                   ` Jann Horn
2019-10-24 22:04                                     ` Jens Axboe
2019-10-24 22:09                                       ` Jens Axboe
2019-10-24 23:13                                       ` Jann Horn
2019-10-25  0:35                                         ` Jens Axboe
2019-10-25  0:52                                           ` Jens Axboe
2019-10-23 12:04   ` Wolfgang Bumiller
2019-10-23 14:11     ` Jens Axboe [this message]
2019-10-17 21:28 ` [PATCH 2/3] net: add __sys_accept4_file() helper Jens Axboe
2019-10-17 21:28 ` [PATCH 3/3] io_uring: add support for IORING_OP_ACCEPT Jens Axboe
