linux-rdma.vger.kernel.org archive mirror
From: Yasunori Goto <y-goto@fujitsu.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-rdma@vger.kernel.org, Doug Ledford <dledford@redhat.com>
Subject: Re: [PATCH] RDMA/core: EPERM should be returned when # of pined pages is over ulimit
Date: Fri, 20 Aug 2021 17:45:54 +0900
Message-ID: <e3cb3dee-9c32-8024-1396-8dfd975a7b23@fujitsu.com>
In-Reply-To: <f784a0c6-27b7-5e30-b3ba-e1f4ebe95399@fujitsu.com>



On 2021/08/20 9:36, Yasunori Goto wrote:
> 
> 
> On 2021/08/20 8:10, Jason Gunthorpe wrote:
>> On Wed, Aug 18, 2021 at 05:27:02PM +0900, Yasunori Goto wrote:
>>> Hello,
>>>
>>> When I started to use SoftRoCE, I was very confused by the
>>> ENOMEM error, which appeared even though I had given it enough memory.
>>>
>>> I think EPERM is more suitable than ENOMEM here in ib_umem_get()
>>> when the # of pinned pages is over the ulimit, because it helps users
>>> resolve the error. This is not a "memory is not enough" problem: the
>>> driver could pin that many pages, but the amount is larger than the
>>> ulimit value.
>>>
>>> The hard limit of "max locked memory" can be changed in limits.conf.
>>> In addition, this check also tests CAP_IPC_LOCK, so it is indeed a
>>> permission check.
>>> So I propose the following patch.
>>>
>>> If there is a reason why ENOMEM is used here, please let me know.
>>> Otherwise, I would be glad if this is merged.
>>>
>>> Thanks.
>>>
>>>
>>> ---
>>> When the # of pinned pages is larger than the ulimit of "max locked
>>> memory" without CAP_IPC_LOCK, the current ib_umem_get() returns ENOMEM.
>>> But it does not mean "not enough memory", because the driver could
>>> have pinned that many pages.
>>> This is just a capability error. Even if a normal user's # of pinned
>>> pages is limited, the system administrator can give permission
>>> by changing the hard limit of this ulimit value.
>>> To give the user correct information, ib_umem_get()
>>> should return EPERM instead of ENOMEM here.
>>
>> I'm not convinced, can you find other places checking the ulimit and
>> list what codes they return?
> 
> Hmm, OK.
> 
> I'll investigate it.

After investigating, I found the following.

- Most code in the kernel and drivers returns ENOMEM.
- The only exception I could find is perf_mmap() in kernel/events/core.c.
   It returns EPERM.

----
static int perf_mmap(struct file *file, struct vm_area_struct *vma)
{
    :
    :
         lock_limit = rlimit(RLIMIT_MEMLOCK);
         lock_limit >>= PAGE_SHIFT;
         locked = atomic64_read(&vma->vm_mm->pinned_vm) + extra;

         if ((locked > lock_limit) && perf_is_paranoid() &&
                 !capable(CAP_IPC_LOCK)) {
                 ret = -EPERM;   /* <-- EPERM, not ENOMEM */
                 goto unlock;
         }
----

- The mlock(2) man page says the following. This seems to be the reason
   why ENOMEM is returned in many places.
----
ENOMEM (Linux 2.6.9 and later) the caller had a nonzero RLIMIT_MEMLOCK
        soft resource limit, but tried to lock more memory than the limit
        permitted. This limit is not enforced if the process is
        privileged (CAP_IPC_LOCK).
----

- In addition, the POSIX specification(*) says the following for
   mlock().
----
[ENOMEM]
Locking the pages mapped by the specified range would exceed an
implementation-defined limit on the amount of memory that the process
may lock.
----
(*) https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/

So I have changed my mind: ib_umem_get() should keep returning ENOMEM.
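
To make the comparison concrete, the check that these call sites share can be modeled in userspace roughly as below. This is only a minimal sketch, not kernel code: memlock_check() and its parameters are hypothetical names, the capability test is reduced to a boolean, and overflow of the page sum is not handled.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the RLIMIT_MEMLOCK check.
 * limit_bytes stands in for rlimit(RLIMIT_MEMLOCK), page_shift for
 * PAGE_SHIFT, and has_cap_ipc_lock for capable(CAP_IPC_LOCK).
 */
static int memlock_check(unsigned long already_pinned,
                         unsigned long new_pages,
                         unsigned long limit_bytes,
                         unsigned int page_shift,
                         bool has_cap_ipc_lock)
{
    /* The limit is given in bytes; convert it to pages. */
    unsigned long lock_limit = limit_bytes >> page_shift;
    unsigned long locked = already_pinned + new_pages;

    /* Over the limit and not privileged: ENOMEM, as mlock(2) documents. */
    if (locked > lock_limit && !has_cap_ipc_lock)
        return -ENOMEM;

    return 0;
}
```

With a 64 KiB limit and 4 KiB pages (page_shift of 12), 16 pages pass and 17 pages fail unless the caller is treated as privileged.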

However, I would like to provide some information that makes the error
easier for users to understand. For example, sev_pin_memory() in
arch/x86/kvm/svm/sev.c prints an error message like the following.

---
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
    :
    :
         if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
                 pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit);
                 return ERR_PTR(-ENOMEM);
         }
---

I think it is better than nothing. What do you think?
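
For reference, the information such a message would carry can also be gathered from userspace. The following is only a sketch under my own assumptions: memlock_limit_pages() and format_limit_msg() are hypothetical helper names, the message text is borrowed from the SEV example, and none of this is proposed kernel code.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: store the soft RLIMIT_MEMLOCK in pages.
 * Returns 0 on success, 1 if the limit is unlimited, -1 on error.
 */
static int memlock_limit_pages(unsigned long *pages)
{
    struct rlimit rl;
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0 || getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    *pages = (unsigned long)(rl.rlim_cur / (rlim_t)page_size);
    return 0;
}

/* Hypothetical helper: build a message in the style of the SEV one. */
static int format_limit_msg(char *buf, size_t len,
                            unsigned long locked, unsigned long lock_limit)
{
    return snprintf(buf, len,
                    "%lu locked pages exceed the lock limit of %lu.",
                    locked, lock_limit);
}
```

A user who sees ENOMEM could then compare the number of pages being registered against the value from memlock_limit_pages() before suspecting a real memory shortage.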

Thanks,
-- -
Yasunori Goto

