From: Dmitry Kadashev <dkadashev@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>, Josef <josef.grieb@gmail.com>,
	Norman Maurer <norman.maurer@googlemail.com>,
	io-uring <io-uring@vger.kernel.org>
Subject: Re: "Cannot allocate memory" on ring creation (not RLIMIT_MEMLOCK)
Date: Wed, 23 Dec 2020 18:48:46 +0700
Message-ID: <CAOKbgA4eihm=MyiVZSG03cxjks6=yw5eTr-dCBXmhQWmkK4YEg@mail.gmail.com>
In-Reply-To: <CAOKbgA70CtfmM7-yFRcGTzJdgoF41MQt7mLC7L_s8jcnrtkB=Q@mail.gmail.com>

On Wed, Dec 23, 2020 at 4:38 PM Dmitry Kadashev <dkadashev@gmail.com> wrote:
>
> On Wed, Dec 23, 2020 at 3:39 PM Dmitry Kadashev <dkadashev@gmail.com> wrote:
> >
> > On Tue, Dec 22, 2020 at 11:37 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> > >
> > > On 22/12/2020 11:04, Dmitry Kadashev wrote:
> > > > On Tue, Dec 22, 2020 at 11:11 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> > > [...]
> > > >>> What about smaller rings? Can you check what SQ size io_uring can allocate?
> > > >>> That can be a different program, e.g. modify liburing/test/nop a bit.
> > > > Unfortunately, I rebooted the box I used for these tests yesterday, so I can't
> > > > try this there. Also, I have not yet been able to come up with an isolated
> > > > reproducer for this.
> > > >
> > > > The good news is I've found a relatively easy way to provoke this on a test VM
> > > > using our software. Our app runs with "admin" user perms (plus some
> > > > capabilities) and bumps RLIMIT_MEMLOCK to infinity on start. I've also created
> > > > a user called 'ioutest' to run the ring size check as a different user.
> > > >
> > > > I've modified the test program slightly to show the number of rings
> > > > successfully created on each iteration and the actual error message (added to
> > > > debug a problem I was having with the test itself, but I've kept it in since).
> > > > Here is the output, with the probing loop itself sketched right after it:
> > > >
> > > > # sudo -u admin bash -c 'ulimit -a' | grep locked
> > > > max locked memory       (kbytes, -l) 1024
> > > >
> > > > # sudo -u ioutest bash -c 'ulimit -a' | grep locked
> > > > max locked memory       (kbytes, -l) 1024
> > > >
> > > > # sudo -u admin ./iou-test1
> > > > Failed after 0 rings with 1024 size: Cannot allocate memory
> > > > Failed after 0 rings with 512 size: Cannot allocate memory
> > > > Failed after 0 rings with 256 size: Cannot allocate memory
> > > > Failed after 0 rings with 128 size: Cannot allocate memory
> > > > Failed after 0 rings with 64 size: Cannot allocate memory
> > > > Failed after 0 rings with 32 size: Cannot allocate memory
> > > > Failed after 0 rings with 16 size: Cannot allocate memory
> > > > Failed after 0 rings with 8 size: Cannot allocate memory
> > > > Failed after 0 rings with 4 size: Cannot allocate memory
> > > > Failed after 0 rings with 2 size: Cannot allocate memory
> > > > can't allocate 1
> > > >
> > > > # sudo -u ioutest ./iou-test1
> > > > max size 1024
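> > > >
> > > > For reference, the probing loop in my modified test looks roughly like
> > > > this (a sketch only - details differ in the real program, and NR_RINGS
> > > > here is made up for illustration):
> > > >
> > > > #include <stdio.h>
> > > > #include <string.h>
> > > > #include <liburing.h>
> > > >
> > > > #define NR_RINGS 16	/* hypothetical; the real test uses another count */
> > > >
> > > > int main(void)
> > > > {
> > > > 	struct io_uring rings[NR_RINGS];
> > > > 	int size, i, created, ret = 0;
> > > >
> > > > 	/* walk the SQ size down from 1024, halving it each time */
> > > > 	for (size = 1024; size >= 2; size /= 2) {
> > > > 		for (i = 0; i < NR_RINGS; i++) {
> > > > 			ret = io_uring_queue_init(size, &rings[i], 0);
> > > > 			if (ret < 0)
> > > > 				break;
> > > > 		}
> > > > 		created = i;
> > > > 		/* tear down whatever was created at this size */
> > > > 		while (i > 0)
> > > > 			io_uring_queue_exit(&rings[--i]);
> > > > 		if (ret == 0) {
> > > > 			printf("max size %d\n", size);
> > > > 			return 0;
> > > > 		}
> > > > 		printf("Failed after %d rings with %d size: %s\n",
> > > > 		       created, size, strerror(-ret));
> > > > 	}
> > > > 	printf("can't allocate 1\n");
> > > > 	return 1;
> > > > }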
> > >
> > > Then we screw that specific user. Interestingly, if it has CAP_IPC_LOCK
> > > capability we don't even account locked memory.
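> > >
> > > For context, the gate in 5.9's io_uring_create() is roughly the following
> > > (paraphrased from memory, not verbatim):
> > >
> > > 	limit_mem = !capable(CAP_IPC_LOCK);
> > > 	if (limit_mem)
> > > 		ret = __io_account_mem(user,
> > > 				ring_pages(p->sq_entries, p->cq_entries));
> > >
> > > i.e. with CAP_IPC_LOCK the ring pages are never charged against
> > > RLIMIT_MEMLOCK at all.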
> >
> > We do have some capabilities, but not CAP_IPC_LOCK. Ours are:
> >
> > CAP_NET_ADMIN, CAP_NET_BIND_SERVICE, CAP_SYS_RESOURCE, CAP_KILL,
> > CAP_DAC_READ_SEARCH.
> >
> > The last one was needed for integration with a third-party component that we
> > no longer really use, so we can try building without it, but that would take
> > some time, mostly because I'm not sure how quickly I'd be able to provoke the
> > issue again.
> >
> > > btw, do you use registered buffers?
> >
> > No, we use neither registered buffers nor registered files (nor anything
> > else).
> >
> > Also, I just tried the test program on a real box (this time with one instance
> > of our program still running - I can repeat the check with it dead, but I
> > expect the results to be pretty much the same, at least after a few more
> > restarts). This box runs 5.9.5.
> >
> > # sudo -u admin bash -c 'ulimit -l'
> > 1024
> >
> > # sudo -u admin ./iou-test1
> > Failed after 0 rings with 1024 size: Cannot allocate memory
> > Failed after 0 rings with 512 size: Cannot allocate memory
> > Failed after 0 rings with 256 size: Cannot allocate memory
> > Failed after 0 rings with 128 size: Cannot allocate memory
> > Failed after 0 rings with 64 size: Cannot allocate memory
> > Failed after 0 rings with 32 size: Cannot allocate memory
> > Failed after 0 rings with 16 size: Cannot allocate memory
> > Failed after 0 rings with 8 size: Cannot allocate memory
> > Failed after 0 rings with 4 size: Cannot allocate memory
> > Failed after 0 rings with 2 size: Cannot allocate memory
> > can't allocate 1
> >
> > # sudo -u dmitry bash -c 'ulimit -l'
> > 1024
> >
> > # sudo -u dmitry ./iou-test1
> > max size 1024
>
> Please ignore the results from the real box above (5.9.5). The memlock limit
> interfered with the test, since our app was running in the background and still
> had a few rings open (most failed to be created, but not all). I'll try to get
> the box into the fully stuck state and repeat the test with the app dead.

I've experimented a bit more with the live 5.9 boxes that were showing signs of
the problem, and I'm no longer sure they stay stuck until reboot.

I'm pretty sure that is the case on 5.6, but a bug has probably been fixed
since then - in fact, the number of seemingly relevant fixes in 5.8 is the
reason we tried 5.9 in the first place.

And on 5.9 we might indeed be seeing fragmentation issues. I shouldn't have
been mixing my kernel versions :) Also, I did not realize that a ring with
size=1024 requires 16 contiguous pages. We will experiment and observe a bit
more; meanwhile, let's consider the case closed. If the issue surfaces again
I'll update this thread.
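
(For the record, the arithmetic behind those 16 pages, assuming 4K pages: an
SQE is 64 bytes, so a 1024-entry SQ needs 1024 * 64 = 64KB for the SQE array
alone, and the kernel allocates that as a single physically contiguous chunk -
an order-4 allocation, exactly the kind that starts failing once memory gets
fragmented.)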

Thanks a *lot*, Pavel, for helping to debug this issue.

And sorry for the false alarm / noise everyone.

-- 
Dmitry Kadashev
