From: Jens Axboe <axboe@kernel.dk>
To: David Hildenbrand <david@redhat.com>,
	Andrew Dona-Couch <andrew@donacou.ch>,
	Andrew Morton <akpm@linux-foundation.org>,
	Drew DeVault <sir@cmpwn.com>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	io_uring Mailing List <io-uring@vger.kernel.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH] Increase default MLOCK_LIMIT to 8 MiB
Date: Mon, 22 Nov 2021 13:44:11 -0700
Message-ID: <3adc55d3-f383-efa9-7319-740fc6ab5d7a@kernel.dk>
In-Reply-To: <5f998bb7-7b5d-9253-2337-b1d9ea59c796@redhat.com>

On 11/22/21 1:08 PM, David Hildenbrand wrote:
> On 22.11.21 20:53, Jens Axboe wrote:
>> On 11/22/21 11:26 AM, David Hildenbrand wrote:
>>> On 22.11.21 18:55, Andrew Dona-Couch wrote:
>>>> Forgive me for jumping into an already overburdened thread. But can
>>>> someone pushing back on this clearly explain the issue with applying
>>>> this patch?
>>>
>>> It will allow unprivileged users to easily and even "accidentally"
>>> allocate more unmovable memory than they should in some environments.
>>> Such limits exist for a reason. And there are ways for admins/distros
>>> to tweak these limits if they know what they are doing.
>>
>> But that's entirely the point: the cases where this change is needed
>> are already screwed by a distro, and the user is the administrator. This
>> is _exactly_ the case where things should just work out of the box. If
>> you're managing farms of servers, yeah, you have competent administration
>> and you can be expected to tweak settings to get the best experience and
>> performance, but the kernel should provide a sane default. 64K isn't a
>> sane default.
> 
> 0.1% of RAM isn't either.

No default is perfect, but 0.1% will solve 99% of the problem, and most
likely 100% of it for the important case, which is where you want things
to Just Work on your distro without doing any administration. If you're
aiming for perfection, it doesn't exist.
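
As a rough, untested sketch of what that default could look like (the
helper name here is made up, only max() and totalram_pages() are real):

        /* Hypothetical default: 0.1% of RAM, never below the old 64K floor. */
        static unsigned long default_memlock_bytes(void)
        {
                unsigned long ram = totalram_pages() << PAGE_SHIFT;

                return max(ram / 1000, 64UL * 1024);
        }

That's roughly 16MB on a 16GB box, and the theoretical 32MB box just
keeps the old 64K.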

>>> This is not a step in the right direction. This is all just trying to
>>> hide the fact that we're exposing FOLL_LONGTERM usage to random
>>> unprivileged users.
>>>
>>> Maybe we could instead try getting rid of FOLL_LONGTERM usage and the
>>> memlock limit in io_uring altogether, for example, by using mmu
>>> notifiers. But I'm no expert on the io_uring code.
>>
>> You can't use mmu notifiers without impacting the fast path. This isn't
>> just about io_uring; there are other users of memlock right now (like
>> bpf), which just makes it even worse.
> 
> 1) Do we have a performance evaluation? Did someone try it and come up
> with a conclusion on how bad it would be?

I honestly don't remember the details; I took a look at it about a year
ago for some unrelated reason. These days it just pertains to registered
buffers, so it's less of an issue than back then, when it dealt with the
rings as well. Hence it might be feasible, and I'm certainly not against
anyone looking into it. It's easy enough to review and test for
performance concerns.
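
For reference, this is roughly the shape it would take with the interval
notifier API. Pure sketch, the reg_buf naming is made up and none of it
is tested; only the mmu_interval_* calls are the real API:

        struct reg_buf {
                struct mmu_interval_notifier mni;
                /* pinned pages and other per-buffer state */
        };

        static bool reg_buf_invalidate(struct mmu_interval_notifier *mni,
                                       const struct mmu_notifier_range *range,
                                       unsigned long cur_seq)
        {
                /* Bump the sequence so fast-path readers retry, then
                 * drop the pages covered by @range. */
                mmu_interval_set_seq(mni, cur_seq);
                return true;
        }

        static const struct mmu_interval_notifier_ops reg_buf_mni_ops = {
                .invalidate = reg_buf_invalidate,
        };

The catch is that every fast-path access then has to be bracketed with
mmu_interval_read_begin() / mmu_interval_read_retry(), and that seq check
is exactly the overhead I was referring to above.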

> 2) Could we provide an mmu notifier variant for ordinary users that's
> just good enough, but maybe not as fast as what we have today? And limit
> FOLL_LONGTERM to special, privileged users?

If it's not as fast, then it's most likely not good enough though...

> 3) Just because there are other memlock users is not an excuse. For
> example, VFIO/VDPA have to use it for a reason, because there is no way
> to avoid FOLL_LONGTERM there.

It's not an excuse; the statement merely means that the problem is
_worse_ because there are other memlock users.

>>
>> We should just make this 0.1% of RAM (ie max(0.1% of RAM, 64KB), so
>> the old 64KB stays as a floor) or something like what was suggested, if
>> that will help move things forward. IMHO the 32MB machine is mostly a
>> theoretical case, but whatever.
> 
> 1) I'm deeply concerned about large ZONE_MOVABLE and MIGRATE_CMA ranges,
> where FOLL_LONGTERM cannot be used, as that memory is not available for
> long-term pinning.
> 
> 2) With 0.1% of RAM, it's sufficient to start 1000 processes to break
> any system completely and deeply mess up the MM. Oh my.

We're talking per-user limits here. But if you want to talk hyperbole,
then 64K multiplied by some large enough number of users will also allow
everything to be pinned, potentially.
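
To be clear on "per-user": the pinned pages get charged to the user's
locked_vm, so those 1000 processes share one budget rather than each
getting their own. Something along the lines of what io_uring does today,
paraphrased from memory rather than quoted verbatim:

        static int account_pinned(struct user_struct *user,
                                  unsigned long nr_pages)
        {
                unsigned long page_limit, cur_pages, new_pages;

                /* Don't allow more pages than the memlock rlimit covers */
                page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
                do {
                        cur_pages = atomic_long_read(&user->locked_vm);
                        new_pages = cur_pages + nr_pages;
                        if (new_pages > page_limit)
                                return -ENOMEM;
                } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages,
                                             new_pages) != cur_pages);
                return 0;
        }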

-- 
Jens Axboe

