From: Yasunori Goto <y-goto@fujitsu.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-rdma@vger.kernel.org, Doug Ledford <dledford@redhat.com>
Subject: Re: [PATCH] RDMA/core: EPERM should be returned when # of pined pages is over ulimit
Date: Fri, 20 Aug 2021 17:45:54 +0900
Message-ID: <e3cb3dee-9c32-8024-1396-8dfd975a7b23@fujitsu.com>
In-Reply-To: <f784a0c6-27b7-5e30-b3ba-e1f4ebe95399@fujitsu.com>
On 2021/08/20 9:36, Yasunori Goto wrote:
>
>
> On 2021/08/20 8:10, Jason Gunthorpe wrote:
>> On Wed, Aug 18, 2021 at 05:27:02PM +0900, Yasunori Goto wrote:
>>> Hello,
>>>
>>> When I started using SoftRoCE, I was very confused by an
>>> ENOMEM error even though I had provided enough memory.
>>>
>>> I think EPERM would help users resolve the error better than
>>> ENOMEM here in ib_umem_get() when the # of pinned pages is over ulimit.
>>> This is not a "not enough memory" problem, because the driver could
>>> pin that many pages; the count is just larger than the ulimit
>>> value.
>>>
>>> The hard limit of "max locked memory" can be changed in limits.conf.
>>> In addition, the check also tests CAP_IPC_LOCK, so it is indeed a
>>> permission check.
>>> So I propose the following patch.
>>>
>>> If there is a reason why ENOMEM is used here, please let me know.
>>> Otherwise, I would be glad to see this merged.
>>>
>>> Thanks.
>>>
>>>
>>> ---
>>> When the # of pinned pages is larger than the ulimit of "max locked
>>> memory" and the caller lacks CAP_IPC_LOCK, the current ib_umem_get()
>>> returns ENOMEM.
>>> But this does not mean "not enough memory", because the driver could
>>> have pinned enough pages.
>>> This is just a capability error. Even if a normal user's # of pinned
>>> pages is limited, the system administrator can grant permission
>>> by changing the hard limit of this ulimit value.
>>> To give the user correct information, ib_umem_get()
>>> should return EPERM instead of ENOMEM here.
>>
>> I'm not convinced, can you find other places checking the ulimit and
>> list what codes they return?
>
> Hmm, OK.
>
> I'll investigate it.
After investigating, I found the following.
- Most code in the kernel and drivers returns ENOMEM.
- The only exception I could find is perf_mmap() in kernel/events/core.c,
  which returns EPERM.
----
static int perf_mmap(struct file *file, struct vm_area_struct *vma)
{
        :
        :
        lock_limit = rlimit(RLIMIT_MEMLOCK);
        lock_limit >>= PAGE_SHIFT;
        locked = atomic64_read(&vma->vm_mm->pinned_vm) + extra;
        if ((locked > lock_limit) && perf_is_paranoid() &&
                !capable(CAP_IPC_LOCK)) {
                ret = -EPERM;    <----!!!
                goto unlock;
        }
----
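The check in the excerpt above is a byte-to-page conversion followed by a
comparison. Here is a minimal user-space sketch of that arithmetic. It is not
kernel code: memlock_check() and its parameters are hypothetical names, the
privileged flag stands in for capable(CAP_IPC_LOCK), and the
perf_is_paranoid() condition is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the RLIMIT_MEMLOCK check (not kernel code).
 *   limit_bytes: soft limit in bytes, as rlimit(RLIMIT_MEMLOCK) would report it
 *   page_shift:  log2 of the page size (12 for 4 KiB pages)
 *   privileged:  stands in for capable(CAP_IPC_LOCK)
 *   err_code:    errno the caller chooses to report (ENOMEM or EPERM)
 * Returns 0 if pinning is allowed, -err_code otherwise.
 */
static int memlock_check(uint64_t pinned_pages, uint64_t extra_pages,
                         uint64_t limit_bytes, unsigned int page_shift,
                         bool privileged, int err_code)
{
    uint64_t lock_limit = limit_bytes >> page_shift; /* bytes -> pages */
    uint64_t locked = pinned_pages + extra_pages;

    if (locked > lock_limit && !privileged)
        return -err_code;
    return 0;
}
```

With a 64 KiB limit and 4 KiB pages the limit is 16 pages, so 10 pinned pages
plus 7 new ones would be refused for an unprivileged caller. The only
difference between the perf_mmap() behavior and the others is the value
passed as err_code.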
- The mlock(2) man page says the following, which seems to be why
ENOMEM is returned in so many places.
----
ENOMEM (Linux 2.6.9 and later) the caller had a nonzero RLIMIT_MEMLOCK
soft resource limit, but tried to lock more memory than the limit
permitted. This limit is not enforced if the process is
privileged (CAP_IPC_LOCK).
---
- In addition, the POSIX specification(*) says the following about
mlock().
---
[ENOMEM]
Locking the pages mapped by the specified range would exceed an
implementation-defined limit on the amount of memory that the process
may lock.
----
(*) https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/
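The documented mlock() behavior can be tried from user space. The sketch
below is only an illustration: mlock_over_limit_errno() is a hypothetical
helper, and the result depends on the privileges and environment it runs in.
It lowers the soft limit to at most one page and then tries to lock four
pages. An unprivileged process on Linux would be expected to get ENOMEM, and
a process with CAP_IPC_LOCK would be expected to succeed.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict compiler modes */
#include <errno.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: lower the RLIMIT_MEMLOCK soft limit of the calling
 * process to at most one page, then try to mlock() four pages.
 * Returns 0 if the lock succeeded, the errno from mlock() if it failed,
 * or -1 if the setup itself failed. The lowered soft limit stays in effect
 * for the rest of the process.
 */
static int mlock_over_limit_errno(void)
{
    long page = sysconf(_SC_PAGESIZE);
    struct rlimit rl;
    size_t len;
    void *buf;
    int err = 0;

    if (page <= 0 || getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    /* The soft limit can be lowered freely but must not exceed the hard limit. */
    rl.rlim_cur = (rl.rlim_max < (rlim_t)page) ? rl.rlim_max : (rlim_t)page;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    len = 4 * (size_t)page;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    if (mlock(buf, len) != 0)
        err = errno;        /* remember why the lock was refused */
    else
        munlock(buf, len);

    munmap(buf, len);
    return err;
}
```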
So I have changed my mind: ib_umem_get() should return ENOMEM.
However, I would like to provide some information that makes the error
easier for users to understand. For example, sev_pin_memory() in
arch/x86/kvm/svm/sev.c prints an error message like the following.
---
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
        :
        :
        if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
                pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit);
                return ERR_PTR(-ENOMEM);
        }
---
I think that is better than nothing. What do you think?
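To make the idea concrete, here is a user-space sketch of "report the
numbers, then return -ENOMEM". It is not a proposed patch:
check_lock_limit() is a hypothetical name, the privileged flag stands in for
capable(CAP_IPC_LOCK), and the message goes to a caller-supplied buffer
rather than the kernel log.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space analogue of "log, then return -ENOMEM".
 * On failure, msg holds a text naming both the requested page count and
 * the limit; on success, msg is set to the empty string.
 */
static int check_lock_limit(unsigned long locked, unsigned long lock_limit,
                            bool privileged, char *msg, size_t msg_len)
{
    if (locked > lock_limit && !privileged) {
        snprintf(msg, msg_len,
                 "%lu locked pages exceed the lock limit of %lu.",
                 locked, lock_limit);
        return -ENOMEM;
    }
    if (msg_len > 0)
        msg[0] = '\0';
    return 0;
}
```

The error code stays ENOMEM as mlock(2) documents, and the message tells the
user that the lock limit, not the amount of free memory, was the cause.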
Thanks,
-- 
Yasunori Goto