From: Avi Kivity <avi@scylladb.com>
To: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>,
	jack@suse.com, hch@infradead.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/8 v2] Non-blocking AIO
Date: Mon, 6 Mar 2017 20:17:43 +0200
Message-ID: <4cbafb12-a30e-bb57-da43-de7c47726c81@scylladb.com>
In-Reply-To: <7aabb6b4-df8d-8554-fbe3-90504887fb8e@kernel.dk>



On 03/06/2017 07:06 PM, Jens Axboe wrote:
> On 03/06/2017 09:59 AM, Avi Kivity wrote:
>>
>> On 03/06/2017 06:08 PM, Jens Axboe wrote:
>>> On 03/06/2017 08:59 AM, Avi Kivity wrote:
>>>> On 03/06/2017 05:38 PM, Jens Axboe wrote:
>>>>> On 03/06/2017 08:29 AM, Avi Kivity wrote:
>>>>>> On 03/06/2017 05:19 PM, Jens Axboe wrote:
>>>>>>> On 03/06/2017 01:25 AM, Jan Kara wrote:
>>>>>>>> On Sun 05-03-17 16:56:21, Avi Kivity wrote:
>>>>>>>>>> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
>>>>>>>>>> any of these conditions are met. This way userspace can push most
>>>>>>>>>> of the write()s to the kernel to the best of its ability to complete
>>>>>>>>>> and if it returns -EAGAIN, can defer it to another thread.
>>>>>>>>>>
>>>>>>>>> Is it not possible to push the iocb to a workqueue?  This will allow
>>>>>>>>> existing userspace to work with the new functionality, unchanged. Any
>>>>>>>>> userspace implementation would have to do the same thing, so it's not like
>>>>>>>>> we're saving anything by pushing it there.
>>>>>>>> That is not easy because until IO is fully submitted, you need some parts
>>>>>>>> of the context of the process which submits the IO (e.g. memory mappings,
>>>>>>>> but possibly also other credentials). So you would need to somehow transfer
>>>>>>>> this information to the workqueue.
>>>>>>> Outside of technical challenges, the API also needs to return EAGAIN or
>>>>>>> start blocking at some point. We can't expose a direct connection to
>>>>>>> queue work like that, and let any user potentially create millions of
>>>>>>> pending work items (and IOs).
>>>>>> You wouldn't expect more concurrent events than the maxevents parameter
>>>>>> that was supplied to io_setup syscall; it should have reserved any
>>>>>> resources needed.
>>>>> Doesn't matter what limit you apply, my point still stands - at some
>>>>> point you have to return EAGAIN, or block. Returning EAGAIN without
>>>>> the caller having flagged support for that change of behavior would
>>>>> be problematic.
>>>> Doesn't it already return EAGAIN (or some other error) if you exceed
>>>> maxevents?
>>> It's a setup thing. We check these limits when someone creates an IO
>>> context, and carve out the specified entries from our global pool. Then
>>> we free those "resources" when the io context is freed.
>>>
>>> Right now I can setup an IO context with 1000 entries on it, yet that
>>> number has NO bearing on when io_submit() would potentially block or
>>> return EAGAIN.
>>>
>>> We can have a huge gap on the intent signaled by io context setup, and
>>> the reality imposed by what actually happens on the IO submission side.
>> Isn't that a bug?  Shouldn't that 1001st incomplete io_submit() return
>> EAGAIN?
>>
>> Just tested it, and maxevents is not respected for this:
>>
>> io_setup(1, [0x7fc64537f000])           = 0
>> io_submit(0x7fc64537f000, 10, [{pread, fildes=3, buf=0x1eb4000,
>> nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096,
>> offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0},
>> {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread,
>> fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3,
>> buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000,
>> nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096,
>> offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0},
>> {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}]) = 10
>>
>> which is unexpected, to me.
> ioctx_alloc()
> {
>          [...]
>
>          /*
>           * We keep track of the number of available ringbuffer slots, to prevent
>           * overflow (reqs_available), and we also use percpu counters for this.
>           *
>           * So since up to half the slots might be on other cpu's percpu counters
>           * and unavailable, double nr_events so userspace sees what they
>           * expected: additionally, we move req_batch slots to/from percpu
>           * counters at a time, so make sure that isn't 0:
>           */
>          nr_events = max(nr_events, num_possible_cpus() * 4);
>          nr_events *= 2;
> }

On a 4-lcore desktop:

io_setup(1, [0x7fc210041000])           = 0
io_submit(0x7fc210041000, 10000, [big array]) = 126
io_submit(0x7fc210041000, 10000, [big array]) = -1 EAGAIN (Resource temporarily unavailable)

So the user should already expect EAGAIN from io_submit() due to
resource limits.  I'm sure the check could be tightened so that, if we do
have to use a workqueue, we respect the user's limit rather than some
inflated number.
