From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
To: Jens Axboe <axboe@kernel.dk>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	linux-api@vger.kernel.org
Cc: hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	jannh@google.com, viro@ZenIV.linux.org.uk
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Fri, 8 Feb 2019 14:02:28 +0000
Message-ID: <02e71636-5b63-41e6-0ffd-646f305011c9@gmail.com>
In-Reply-To: <2ac73020-6ab0-e351-3a1a-180d0f1f801b@kernel.dk>

On 08/02/2019 12:57, Jens Axboe wrote:
> On 2/8/19 5:17 AM, Alan Jenkins wrote:
>>> +static int io_sqe_files_scm(struct io_ring_ctx *ctx)
>>> +{
>>> +#if defined(CONFIG_NET)
>>> +	struct scm_fp_list *fpl = ctx->user_files;
>>> +	struct sk_buff *skb;
>>> +	int i;
>>> +
>>> +	skb =  __alloc_skb(0, GFP_KERNEL, 0, NUMA_NO_NODE);
>>> +	if (!skb)
>>> +		return -ENOMEM;
>>> +
>>> +	skb->sk = ctx->ring_sock->sk;
>>> +	skb->destructor = unix_destruct_scm;
>>> +
>>> +	fpl->user = get_uid(ctx->user);
>>> +	for (i = 0; i < fpl->count; i++) {
>>> +		get_file(fpl->fp[i]);
>>> +		unix_inflight(fpl->user, fpl->fp[i]);
>>> +		fput(fpl->fp[i]);
>>> +	}
>>> +
>>> +	UNIXCB(skb).fp = fpl;
>>> +	skb_queue_head(&ctx->ring_sock->sk->sk_receive_queue, skb);
>> This code sounds elegant if you know about the existence of unix_gc(),
>> but quite mysterious if you don't.  (E.g. why "inflight"?)  Could we
>> have a brief comment, to comfort mortal readers on their journey?
>>
>> /* A message on a unix socket can hold a reference to a file. This can
>> cause a reference cycle. So there is a garbage collector for unix
>> sockets, which we hook into here. */
> Yes that's a good idea, I've added a comment as to why we go through the
> trouble of doing this socket + skb dance.

Great, thanks.

>> I think this is bypassing too_many_unix_fds() though?  I understood that
>> was intended to bound kernel memory allocation, at least in principle.
> As the code stands above, it'll cap it at 253. I'm just now reworking it
> to NOT be limited to the SCM max fd count, but still impose a limit of
> 1024 on the number of registered files. This is important to cap the
> memory allocation attempt as well.

I saw you were limiting to SCM_MAX_FD per io_uring.  On the other hand,
there's no specific limit on the number of io_urings you can open (only
the standard limits on fds).  So this would let you allocate hundreds of
times more files than the previous limit, RLIMIT_NOFILE...

static inline bool too_many_unix_fds(struct task_struct *p)
{
	struct user_struct *user = current_user();

	if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE)))
		return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
	return false;
}

RLIMIT_NOFILE is technically per-task, but here it is capping
unix_inflight per-user.  So the way I look at this, the number of file
descriptors per user is bounded by NOFILE * NPROC.  Then
user->unix_inflight can have one additional process's worth (NOFILE) of
"inflight" files.  (Plus SCM_MAX_FD slop, because too_many_unix_fds() is
only called once per SCM_RIGHTS).

Because io_uring doesn't check too_many_unix_fds(), I think it will let
you have about 253 (or 1024) more processes' worth of open files.  That
could be big proportionally when RLIMIT_NPROC is low.
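
To make that arithmetic concrete, here is a rough userspace sketch (not
kernel code) of one reading of the estimate: a single process fills its
fd table with io_urings, each with a full registered file set.  The
helper names are hypothetical; 253 is SCM_MAX_FD and 1024 is the
proposed registration limit mentioned above.

```c
#include <stdint.h>

/* Illustrative model only; these helpers do not exist in the kernel. */

/*
 * Files pinned by registered file sets if one process fills its fd
 * table (nofile entries) with io_urings, each registering
 * files_per_ring files.
 */
static uint64_t ring_pinned_files(uint64_t nofile, uint64_t files_per_ring)
{
	return nofile * files_per_ring;
}

/* Express a file count in units of "one process's worth" (NOFILE). */
static uint64_t process_equivalents(uint64_t files, uint64_t nofile)
{
	return files / nofile;
}
```

With NOFILE = 1024, the 253-file cap works out to 253 processes' worth
of files, and the 1024-file cap to 1024 processes' worth.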

I don't know if it matters.  It may read like an oversight, though.

(If it does matter, it might be cleanest to change too_many_unix_fds()
to get rid of the "slop", since that may differ between af_unix and
io_uring: 253 vs. 1024 or whatever.  E.g. add a parameter for the
number of inflight files we want to add.)
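
A minimal userspace sketch of that idea, to show the shape of the check
rather than a real patch.  The struct, the function name, and the
`privileged` flag (standing in for the two capable() checks) are all
assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
struct user_model {
	unsigned long unix_inflight;	/* files currently counted as inflight */
};

/*
 * Model of a too_many_unix_fds() variant that takes the number of files
 * about to be added, so the check covers the would-be total instead of
 * only the current count.  That removes the per-caller "slop".
 */
static bool too_many_unix_fds_model(const struct user_model *user,
				    unsigned long nr_new,
				    unsigned long nofile_limit,
				    bool privileged)
{
	if (user->unix_inflight + nr_new > nofile_limit)
		return !privileged;
	return false;
}
```

The caller would pass its own batch size (up to SCM_MAX_FD for af_unix,
or the registered file count for io_uring), so neither side needs to
know the other's maximum.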

>> Also, this code relies on CONFIG_NET.  To handle the case where
>> CONFIG_NET is not enabled, don't you still need to forbid registering an
>> io_uring fd ?
> Good point, we do still need to reject the io_uring fd itself if
> CONFIG_UNIX is not enabled. Done.
