linux-block.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Roman Penyaev <rpenyaev@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
	linux-block@vger.kernel.org, hch@lst.de, jmoyer@redhat.com,
	avi@scylladb.com, linux-block-owner@vger.kernel.org
Subject: Re: [PATCH 05/17] Add io_uring IO interface
Date: Mon, 21 Jan 2019 09:23:44 -0700
Message-ID: <df5b04ea-1c7c-03e5-087e-d9e3763d6670@kernel.dk>
In-Reply-To: <eb1e623843cd26ced5d06deb7fdb7851@suse.de>

On 1/21/19 8:58 AM, Roman Penyaev wrote:
> On 2019-01-21 16:30, Jens Axboe wrote:
>> On 1/21/19 2:13 AM, Roman Penyaev wrote:
>>> On 2019-01-18 17:12, Jens Axboe wrote:
>>>
>>> [...]
>>>
>>>> +
>>>> +static int io_uring_create(unsigned entries, struct io_uring_params *p,
>>>> +			   bool compat)
>>>> +{
>>>> +	struct user_struct *user = NULL;
>>>> +	struct io_ring_ctx *ctx;
>>>> +	int ret;
>>>> +
>>>> +	if (entries > IORING_MAX_ENTRIES)
>>>> +		return -EINVAL;
>>>> +
>>>> +	/*
>>>> +	 * Use twice as many entries for the CQ ring. It's possible for the
>>>> +	 * application to drive a higher depth than the size of the SQ ring,
>>>> +	 * since the sqes are only used at submission time. This allows for
>>>> +	 * some flexibility in overcommitting a bit.
>>>> +	 */
>>>> +	p->sq_entries = roundup_pow_of_two(entries);
>>>> +	p->cq_entries = 2 * p->sq_entries;
>>>> +
>>>> +	if (!capable(CAP_IPC_LOCK)) {
>>>> +		user = get_uid(current_user());
>>>> +		ret = __io_account_mem(user, ring_pages(p->sq_entries,
>>>> +							p->cq_entries));
>>>> +		if (ret) {
>>>> +			free_uid(user);
>>>> +			return ret;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	ctx = io_ring_ctx_alloc(p);
>>>> +	if (!ctx)
>>>> +		return -ENOMEM;
>>>
>>> Hi Jens,
>>>
>>> It seems pages should be "unaccounted" back here and uid freed if path
>>> with "if (!capable(CAP_IPC_LOCK))" above was taken.
>>
>> Thanks, yes that is leaky. I'll fix that up.
>>
>>> But really, could someone please explain to me what is wrong with
>>> allocating all urings in mmap() without touching RLIMIT_MEMLOCK at
>>> all?  Then all memory will be accounted to the caller app, and if the
>>> app is greedy it will be killed by the OOM killer.  What am I missing?
>>
>> I don't really see what that'd change, whether we do it off the ->mmap()
>> or when we set up the io_uring instance with io_uring_setup(2). We need
>> this memory to be pinned, we can't fault on it.
> 
> Hm, I thought that for pinning there is a separate counter ->pinned_vm
> (introduced by bc3e53f682d9 ("mm: distinguish between mlocked and pinned
> pages")), which seems not to be wired up to anything. It is just a
> counter, used by a couple of drivers.

io_uring doesn't inc/dec either of those, but it probably should. As it
appears rather unused, probably not a big deal.

> Hmmm.. Frankly, now I am lost. You map these pages through
> remap_pfn_range(), so virtual user mapping won't fault, right?  And
> these pages you allocate with GFP_KERNEL, so they are already pinned.

Right, they will not fault. My point is that it sounded like you want
the application to allocate this memory in userspace, and then have the
kernel map it. I don't want to do that; that brings its own host of
issues with it (we used to do that). The mmap(2) of kernel memory is
much cleaner.

> So now I do not understand why this accounting is needed at all :)
> The only reason I had in mind is some kind of accounting, to filter out
> greedy and nasty apps.  If this is not the case, then I am lost.
> Could you please explain?

We need some kind of limit, to prevent a user from creating millions of
io_uring instances and pinning down everything. The old aio code realized
this after the fact, and added some silly sysctls to control this. I
want to avoid the same mess, and hence it makes more sense to tie into
some kind of limiting we already have, like RLIMIT_MEMLOCK. Since we're
using that rlimit, accounting the memory as locked is the right way to
go.
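
To make the accounting and the unwind discussed above concrete, here is
a minimal userspace sketch, not the kernel code. All names in it
(user_acct, account_mem, unaccount_mem, ring_ctx, ring_create,
ring_destroy) are hypothetical, and limit_pages is only a stand-in for
RLIMIT_MEMLOCK expressed in pages. It charges the ring pages against a
per-user limit before allocating, and gives the charge back if the
allocation fails, which is the step the quoted hunk is missing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-user locked-page counter. */
struct user_acct {
	unsigned long locked_pages;	/* pages currently charged */
	unsigned long limit_pages;	/* stand-in for RLIMIT_MEMLOCK, in pages */
};

/* Charge nr_pages to the user, or fail if that would exceed the limit. */
static int account_mem(struct user_acct *u, unsigned long nr_pages)
{
	if (nr_pages > u->limit_pages - u->locked_pages)
		return -ENOMEM;
	u->locked_pages += nr_pages;
	return 0;
}

/* Give back a charge previously taken with account_mem(). */
static void unaccount_mem(struct user_acct *u, unsigned long nr_pages)
{
	assert(u->locked_pages >= nr_pages);
	u->locked_pages -= nr_pages;
}

/* Hypothetical ring context; it only remembers how much was charged. */
struct ring_ctx {
	unsigned long nr_pages;
};

/*
 * Account first, then allocate. fail_alloc simulates the context
 * allocation returning NULL so the error path can be exercised.
 */
static int ring_create(struct user_acct *u, unsigned long nr_pages,
		       bool fail_alloc, struct ring_ctx **out)
{
	struct ring_ctx *ctx;
	int ret;

	ret = account_mem(u, nr_pages);
	if (ret)
		return ret;

	ctx = fail_alloc ? NULL : malloc(sizeof(*ctx));
	if (!ctx) {
		/* Undo the charge, otherwise it leaks on this path. */
		unaccount_mem(u, nr_pages);
		return -ENOMEM;
	}

	ctx->nr_pages = nr_pages;
	*out = ctx;
	return 0;
}

/* Tear down a ring and release its charge. */
static void ring_destroy(struct user_acct *u, struct ring_ctx *ctx)
{
	unaccount_mem(u, ctx->nr_pages);
	free(ctx);
}
```

With a limit of 16 pages, a ring_create() for 8 pages that fails in the
allocation step leaves locked_pages at 0, and a later request that would
push the total past the limit is refused without changing the counter.
That is the property the limit is there to provide: a user cannot pin
more than the rlimit allows, no matter how many instances get created.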

-- 
Jens Axboe
