From: Jens Axboe <axboe@kernel.dk>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: linux-block@vger.kernel.org, davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
Date: Wed, 23 Oct 2019 08:11:29 -0600
Message-ID: <3b97233b-5d05-5efc-4173-e3a1ef177cbc@kernel.dk>
In-Reply-To: <20191023120446.75oxdwom34nhe3l5@olga.proxmox.com>

On 10/23/19 6:04 AM, Wolfgang Bumiller wrote:
> On Thu, Oct 17, 2019 at 03:28:56PM -0600, Jens Axboe wrote:
>> This is in preparation for adding opcodes that need to modify files
>> in a process file table, either adding new ones or closing old ones.
>>
>> If an opcode needs this, it must set REQ_F_NEED_FILES in the request
>> structure. If work that needs to be punted to async context has this
>> set, it will grab a reference to the process file table. When the
>> work is completed, the reference is dropped again.
> 
> I think IORING_OP_SENDMSG and _RECVMSG need to set this flag due to
> SCM_RIGHTS control messages.
> Thought I'd reply here since I just now ran into the issue that I was
> getting ever-increasing wrong file descriptor numbers on pretty much
> every "other" async recvmsg() call I did via io_uring while receiving
> file descriptors from lxc for the seccomp-notify proxy. (I'm currently
> running an Ubuntu-based 5.3.1 kernel.)
> I ended up finding them in /proc - they show up in all kernel threads,
> e.g.:
> 
> root:/root # grep Name /proc/9/status
> Name:   mm_percpu_wq
> root:/root # ls -l /proc/9/fd
> total 0
> lr-x------ 1 root root 64 Oct 23 12:00 0 -> '/proc/512 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 1 -> /proc/512/mem
> lr-x------ 1 root root 64 Oct 23 12:00 10 -> '/proc/11782 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 11 -> /proc/11782/mem
> lr-x------ 1 root root 64 Oct 23 12:00 12 -> '/proc/12210 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 13 -> /proc/12210/mem
> lr-x------ 1 root root 64 Oct 23 12:00 14 -> '/proc/12298 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 15 -> /proc/12298/mem
> lr-x------ 1 root root 64 Oct 23 12:00 16 -> '/proc/13955 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 17 -> /proc/13955/mem
> lr-x------ 1 root root 64 Oct 23 12:00 18 -> '/proc/13989 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 19 -> /proc/13989/mem
> lr-x------ 1 root root 64 Oct 23 12:00 2 -> '/proc/584 (deleted)'
> lr-x------ 1 root root 64 Oct 23 12:00 20 -> '/proc/15502 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 21 -> /proc/15502/mem
> lr-x------ 1 root root 64 Oct 23 12:00 22 -> '/proc/15510 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 23 -> /proc/15510/mem
> lr-x------ 1 root root 64 Oct 23 12:00 24 -> '/proc/17833 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 25 -> /proc/17833/mem
> lr-x------ 1 root root 64 Oct 23 12:00 26 -> '/proc/17836 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 27 -> /proc/17836/mem
> lr-x------ 1 root root 64 Oct 23 12:00 28 -> '/proc/21929 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 29 -> /proc/21929/mem
> lrwx------ 1 root root 64 Oct 23 12:00 3 -> /proc/584/mem
> lr-x------ 1 root root 64 Oct 23 12:00 30 -> '/proc/22214 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 31 -> /proc/22214/mem
> lr-x------ 1 root root 64 Oct 23 12:00 32 -> '/proc/22283 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 33 -> /proc/22283/mem
> lr-x------ 1 root root 64 Oct 23 12:00 34 -> '/proc/29795 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 35 -> /proc/29795/mem
> lr-x------ 1 root root 64 Oct 23 12:00 36 -> '/proc/30124 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 37 -> /proc/30124/mem
> lr-x------ 1 root root 64 Oct 23 12:00 38 -> '/proc/31016 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 39 -> /proc/31016/mem
> lr-x------ 1 root root 64 Oct 23 12:00 4 -> '/proc/1632 (deleted)'
> lr-x------ 1 root root 64 Oct 23 12:00 40 -> '/proc/4137 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 41 -> /proc/4137/mem
> lrwx------ 1 root root 64 Oct 23 12:00 5 -> /proc/1632/mem
> lr-x------ 1 root root 64 Oct 23 12:00 6 -> '/proc/3655 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 7 -> /proc/3655/mem
> lr-x------ 1 root root 64 Oct 23 12:00 8 -> '/proc/7075 (deleted)'
> lrwx------ 1 root root 64 Oct 23 12:00 9 -> /proc/7075/mem
> root:/root #
> 
> Those are the fds I expected to receive, and I get fd numbers
> consistently increasing with them.
> lxc sends the syscall-executing process' pidfd and its 'mem' fd via a
> socket, but instead of making it to the receiver, they end up there...
> 
> I suspect that an async sendmsg() call could potentially end up
> accessing those instead of the ones from the sender process, but I
> haven't tested it...

This might "just" be a case of the sendmsg() being stuck; we can't
currently cancel work. So if such requests never complete, the ring
won't go away.

I'm actually working on a small workqueue replacement for io_uring which
will allow us to cancel things like that. It's a requirement for accept()
as well, but also for basic read/write and send/recv on sockets. We're so
used to storage IO operations that complete in a finite amount of time...

But yes, I hope with that, and the flush trick that Jann suggested, that
we can make this 100% reliable for any type of operation.
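
For context, the SCM_RIGHTS fd passing that the quoted report relies on
can be sketched from userspace. This is only a minimal illustration in
Python, not code from the patch or from lxc; send_fd(), recv_fd() and
demo() are hypothetical helper names. What it shows is that the receiver
gets a new descriptor number installed in the file table of whichever
task executes recvmsg(), which is presumably why a recvmsg() run from an
async worker without the submitter's files table leaves the fds in the
wrong place.

```python
import array
import os
import socket


def send_fd(sock, fd, payload=b"x"):
    """Send one descriptor as an SCM_RIGHTS control message (hypothetical helper)."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the control message.
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock, bufsize=1):
    """Receive one descriptor sent via SCM_RIGHTS (hypothetical helper).

    The kernel installs the passed file in the file table of the task that
    runs recvmsg() and reports the new number in the control message.
    """
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial int before decoding.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if not fds:
        raise RuntimeError("no SCM_RIGHTS control message received")
    return data, fds[0]


def demo():
    """Pass the read end of a pipe across a socketpair within one process."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    new_fd = -1
    try:
        send_fd(left, r)
        _data, new_fd = recv_fd(right)
        os.write(w, b"hello")
        received = os.read(new_fd, 5)
        # Same underlying file, but a different slot in the file table.
        a, b = os.fstat(new_fd), os.fstat(r)
        same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        distinct = new_fd != r
        return received, same_file, distinct
    finally:
        for fd in (r, w, new_fd):
            if fd >= 0:
                os.close(fd)
        left.close()
        right.close()
```

Calling demo() sends and receives within a single process, so the new
descriptor lands in that process's own table; in the report above the
equivalent step evidently ran in a kernel thread's context instead.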

-- 
Jens Axboe



Thread overview: 28+ messages
2019-10-17 21:28 [PATCHSET] io_uring: add support for accept(4) Jens Axboe
2019-10-17 21:28 ` [PATCH 1/3] io_uring: add support for async work inheriting files table Jens Axboe
2019-10-18  2:41   ` Jann Horn
2019-10-18 14:01     ` Jens Axboe
2019-10-18 14:34       ` Jann Horn
2019-10-18 14:37         ` Jens Axboe
2019-10-18 14:40           ` Jann Horn
2019-10-18 14:43             ` Jens Axboe
2019-10-18 14:52               ` Jann Horn
2019-10-18 15:00                 ` Jens Axboe
2019-10-18 15:54                   ` Jens Axboe
2019-10-18 16:20                     ` Jann Horn
2019-10-18 16:36                       ` Jens Axboe
2019-10-18 17:05                         ` Jens Axboe
2019-10-18 18:06                           ` Jann Horn
2019-10-18 18:16                             ` Jens Axboe
2019-10-18 18:50                               ` Jann Horn
2019-10-24 19:41                                 ` Jens Axboe
2019-10-24 20:31                                   ` Jann Horn
2019-10-24 22:04                                     ` Jens Axboe
2019-10-24 22:09                                       ` Jens Axboe
2019-10-24 23:13                                       ` Jann Horn
2019-10-25  0:35                                         ` Jens Axboe
2019-10-25  0:52                                           ` Jens Axboe
2019-10-23 12:04   ` Wolfgang Bumiller
2019-10-23 14:11     ` Jens Axboe [this message]
2019-10-17 21:28 ` [PATCH 2/3] net: add __sys_accept4_file() helper Jens Axboe
2019-10-17 21:28 ` [PATCH 3/3] io_uring: add support for IORING_OP_ACCEPT Jens Axboe
