linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Wed, 30 Jan 2019 08:35:35 -0700	[thread overview]
Message-ID: <a54c68d0-7069-a72c-5604-37b0e1ed53bf@kernel.dk> (raw)
In-Reply-To: <CAG48ez2kLWUEambWesQQaeVn_u=PtjuMeDUc49yamhYspfd7Sg@mail.gmail.com>

On 1/29/19 6:29 PM, Jann Horn wrote:
> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@kernel.dk> wrote:
>> We normally have to fget/fput for each IO we do on a file. Even with
>> the batching we do, the cost of the atomic inc/dec of the file usage
>> count adds up.
>>
>> This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
>> for the io_uring_register(2) system call. The arguments passed in must
>> be an array of __s32 holding file descriptors, and nr_args should hold
>> the number of file descriptors the application wishes to pin for the
>> duration of the io_uring context (or until IORING_UNREGISTER_FILES is
>> called).
>>
>> When used, the application must set IOSQE_FIXED_FILE in the sqe->flags
>> member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd
>> to the index in the array passed in to IORING_REGISTER_FILES.
>>
>> Files are automatically unregistered when the io_uring context is
>> torn down. An application need only unregister if it wishes to
>> register a new set of fds.
> 
> Crazy idea:
> 
> Taking a step back, at a high level, basically this patch creates sort
> of the same difference that you get when you compare the following
> scenarios for normal multithreaded I/O in userspace:
> 
> ===========================================================
> ~/tests/fdget_perf$ cat fdget_perf.c
> #define _GNU_SOURCE
> #include <sys/wait.h>
> #include <sched.h>
> #include <unistd.h>
> #include <stdbool.h>
> #include <string.h>
> #include <err.h>
> #include <signal.h>
> #include <sys/eventfd.h>
> #include <stdio.h>
> 
> // two different physical processors on my machine
> #define CORE_A 0
> #define CORE_B 14
> 
> static void pin_to_core(int coreid) {
>   cpu_set_t set;
>   CPU_ZERO(&set);
>   CPU_SET(coreid, &set);
>   if (sched_setaffinity(0, sizeof(cpu_set_t), &set))
>     err(1, "sched_setaffinity");
> }
> 
> static int fd = -1;
> 
> static volatile int time_over = 0;
> static void alarm_handler(int sig) { time_over = 1; }
> static void run_stuff(void) {
>   unsigned long long iterations = 0;
>   if (signal(SIGALRM, alarm_handler) == SIG_ERR) err(1, "signal");
>   alarm(10);
>   while (1) {
>     uint64_t val;
>     read(fd, &val, sizeof(val));
>     if (time_over) {
>       printf("iterations = 0x%llx\n", iterations);
>       return;
>     }
>     iterations++;
>   }
> }
> 
> static int child_fn(void *dummy) {
>   pin_to_core(CORE_B);
>   run_stuff();
>   return 0;
> }
> 
> static char child_stack[1024*1024];
> 
> int main(int argc, char **argv) {
>   fd = eventfd(0, EFD_NONBLOCK);
>   if (fd == -1) err(1, "eventfd");
> 
>   if (argc != 2) errx(1, "bad usage");
>   int flags = SIGCHLD;
>   if (strcmp(argv[1], "shared") == 0) {
>     flags |= CLONE_FILES;
>   } else if (strcmp(argv[1], "cloned") == 0) {
>     /* nothing */
>   } else {
>     errx(1, "bad usage");
>   }
>   pid_t child = clone(child_fn, child_stack+sizeof(child_stack), flags, NULL);
>   if (child == -1) err(1, "clone");
> 
>   pin_to_core(CORE_A);
>   run_stuff();
>   int status;
>   if (wait(&status) != child) err(1, "wait");
>   return 0;
> }
> ~/tests/fdget_perf$ gcc -Wall -o fdget_perf fdget_perf.c
> ~/tests/fdget_perf$ ./fdget_perf shared
> iterations = 0x8d3010
> iterations = 0x92d894
> ~/tests/fdget_perf$ ./fdget_perf cloned
> iterations = 0xad3bbd
> iterations = 0xb08838
> ~/tests/fdget_perf$ ./fdget_perf shared
> iterations = 0x8cc340
> iterations = 0x8e4e64
> ~/tests/fdget_perf$ ./fdget_perf cloned
> iterations = 0xada5f3
> iterations = 0xb04b6f
> ===========================================================
> 
> This kinda makes me wonder whether this is really something that
> should be implemented specifically for the io_uring API, or whether it
> would make sense to somehow handle part of this in the generic VFS
> code and give the user the ability to prepare a new files_struct that
> can then be transferred to the worker thread, or something like
> that... I'm not sure whether there's a particularly clean way to do
> that though.
> 
> Or perhaps you could add a userspace API for marking file descriptor
> table entries as "has percpu refcounting" somehow, with one percpu
> refcount per files_struct and one bit per fd, allocated when percpu
> refcounting is activated for the files_struct the first time, or
> something like that...

There's undoubtedly a win by NOT sharing, obviously. Not sure how to do
this in a generalized fashion, cleanly, it's easier (and a better fit)
to do it for specific cases, like io_uring here. If others want to go
down that path, io_uring could always be adapted to use that
infrastructure.

-- 
Jens Axboe


  reply	other threads:[~2019-01-30 15:35 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-29 19:26 [PATCHSET v9] io_uring IO interface Jens Axboe
2019-01-29 19:26 ` [PATCH 01/18] fs: add an iopoll method to struct file_operations Jens Axboe
2019-01-29 19:26 ` [PATCH 02/18] block: wire up block device iopoll method Jens Axboe
2019-01-29 19:26 ` [PATCH 03/18] block: add bio_set_polled() helper Jens Axboe
2019-01-29 19:26 ` [PATCH 04/18] iomap: wire up the iopoll method Jens Axboe
2019-01-29 19:26 ` [PATCH 05/18] Add io_uring IO interface Jens Axboe
2019-01-29 19:26 ` [PATCH 06/18] io_uring: add fsync support Jens Axboe
2019-01-29 19:26 ` [PATCH 07/18] io_uring: support for IO polling Jens Axboe
2019-01-29 20:47   ` Jann Horn
2019-01-29 20:56     ` Jens Axboe
2019-01-29 21:10       ` Jann Horn
2019-01-29 21:33         ` Jens Axboe
2019-01-29 19:26 ` [PATCH 08/18] fs: add fget_many() and fput_many() Jens Axboe
2019-01-29 19:26 ` [PATCH 09/18] io_uring: use fget/fput_many() for file references Jens Axboe
2019-01-29 23:31   ` Jann Horn
2019-01-29 23:44     ` Jens Axboe
2019-01-30 15:33       ` Jens Axboe
2019-01-29 19:26 ` [PATCH 10/18] io_uring: batch io_kiocb allocation Jens Axboe
2019-01-29 19:26 ` [PATCH 11/18] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-01-29 19:26 ` [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-01-29 22:44   ` Jann Horn
2019-01-29 22:56     ` Jens Axboe
2019-01-29 23:03       ` Jann Horn
2019-01-29 23:06         ` Jens Axboe
2019-01-29 23:08           ` Jann Horn
2019-01-29 23:14             ` Jens Axboe
2019-01-29 23:42               ` Jann Horn
2019-01-29 23:51                 ` Jens Axboe
2019-01-29 19:26 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30  1:29   ` Jann Horn
2019-01-30 15:35     ` Jens Axboe [this message]
2019-02-04  2:56     ` Al Viro
2019-02-05  2:19       ` Jens Axboe
2019-02-05 17:57         ` Jens Axboe
2019-02-05 19:08           ` Jens Axboe
2019-02-06  0:27             ` Jens Axboe
2019-02-06  1:01               ` Al Viro
2019-02-06 17:56                 ` Jens Axboe
2019-02-07  4:05                   ` Al Viro
2019-02-07 16:14                     ` Jens Axboe
2019-02-07 16:30                       ` Al Viro
2019-02-07 16:35                         ` Jens Axboe
2019-02-07 16:51                         ` Al Viro
2019-02-06  0:56             ` Al Viro
2019-02-06 13:41               ` Jens Axboe
2019-02-07  4:00                 ` Al Viro
2019-02-07  9:22                   ` Miklos Szeredi
2019-02-07 13:31                     ` Al Viro
2019-02-07 14:20                       ` Miklos Szeredi
2019-02-07 15:20                         ` Al Viro
2019-02-07 15:27                           ` Miklos Szeredi
2019-02-07 16:26                             ` Al Viro
2019-02-07 19:08                               ` Miklos Szeredi
2019-02-07 18:45                   ` Jens Axboe
2019-02-07 18:58                     ` Jens Axboe
2019-02-11 15:55                     ` Jonathan Corbet
2019-02-11 17:35                       ` Al Viro
2019-02-11 20:33                         ` Jonathan Corbet
2019-01-29 19:26 ` [PATCH 14/18] io_uring: add submission polling Jens Axboe
2019-01-29 19:26 ` [PATCH 15/18] io_uring: add io_kiocb ref count Jens Axboe
2019-01-29 19:27 ` [PATCH 16/18] io_uring: add support for IORING_OP_POLL Jens Axboe
2019-01-29 19:27 ` [PATCH 17/18] io_uring: allow workqueue item to handle multiple buffered requests Jens Axboe
2019-01-29 19:27 ` [PATCH 18/18] io_uring: add io_uring_event cache hit information Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2019-02-07 19:55 [PATCHSET v12] io_uring IO interface Jens Axboe
2019-02-07 19:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-02-08 12:17   ` Alan Jenkins
2019-02-08 12:57     ` Jens Axboe
2019-02-08 14:02       ` Alan Jenkins
2019-02-08 15:13         ` Jens Axboe
2019-02-12 12:29           ` Alan Jenkins
2019-02-12 15:17             ` Jens Axboe
2019-02-12 17:21               ` Alan Jenkins
2019-02-12 17:33                 ` Jens Axboe
2019-02-12 20:23                   ` Alan Jenkins
2019-02-12 21:10                     ` Jens Axboe
2019-02-01 15:23 [PATCHSET v11] io_uring IO interface Jens Axboe
2019-02-01 15:24 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30 21:55 [PATCHSET v10] io_uring IO interface Jens Axboe
2019-01-30 21:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-28 21:35 [PATCHSET v8] io_uring IO interface Jens Axboe
2019-01-28 21:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-29 16:36   ` Jann Horn
2019-01-29 18:13     ` Jens Axboe
2019-01-23 15:35 [PATCHSET v7] io_uring IO interface Jens Axboe
2019-01-23 15:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a54c68d0-7069-a72c-5604-37b0e1ed53bf@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=avi@scylladb.com \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).