All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mina Almasry <almasrymina@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Hugh Dickins <hughd@google.com>, Shuah Khan <shuah@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Greg Thelen <gthelen@google.com>,
	Dave Chinner <david@fromorbit.com>, Roman Gushchin <guro@fb.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v4 0/4] Deterministic charging of shared memory
Date: Fri, 19 Nov 2021 21:27:34 -0800	[thread overview]
Message-ID: <CAHS8izM-DpKs1=-F04-8Qp55VFYC57zERpKkFBk4_QNsOmnOoQ@mail.gmail.com> (raw)
In-Reply-To: <YZiBH6GxlkFFuyqa@casper.infradead.org>

On Fri, Nov 19, 2021 at 9:01 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 19, 2021 at 08:50:06PM -0800, Mina Almasry wrote:
> > 1. One complication to address is the behavior when the target memcg
> > hits its memory.max limit because of remote charging. In this case the
> > oom-killer will be invoked, but the oom-killer may not find anything
> > to kill in the target memcg being charged. Thera are a number of considerations
> > in this case:
> >
> > 1. It's not great to kill the allocating process since the allocating process
> >    is not running in the memcg under oom, and killing it will not free memory
> >    in the memcg under oom.
> > 2. Pagefaults may hit the memcg limit, and we need to handle the pagefault
> >    somehow. If not, the process will forever loop the pagefault in the upstream
> >    kernel.
> >
> > In this case, I propose simply failing the remote charge and returning an ENOSPC
> > to the caller. This will cause will cause the process executing the remote
> > charge to get an ENOSPC in non-pagefault paths, and get a SIGBUS on the pagefault
> > path.  This will be documented behavior of remote charging, and this feature is
> > opt-in. Users can:
> > - Not opt-into the feature if they want.
> > - Opt-into the feature and accept the risk of received ENOSPC or SIGBUS and
> >   abort if they desire.
> > - Gracefully handle any resulting ENOSPC or SIGBUS errors and continue their
> >   operation without executing the remote charge if possible.
>
> Why is ENOSPC the right error instead of ENOMEM?

Returning ENOMEM from mem_cgroup_charge_mapping() will cause the
application to get ENOMEM from non-pagefault paths (which is perfectly
fine), and get stuck in a loop trying to resolve the pagefault in the
pagefault path (less fine). The logic is here:
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1432

ENOMEM gets bubbled up here as VM_FAULT_OOM and on remote charges the
behavior I see is that the kernel loops the pagefault forever until
memory is freed in the remote memcg, and it may never will.

ENOSPC gets bubbled up here as a VM_FAULT_SIGBUS and and sends a
SIGBUS to the allocating process. The conjecture here is that it's
preferred to send a SIGBUS to the allocating process rather than have
it be stuck in a loop trying to resolve a pagefault.

  reply	other threads:[~2021-11-20  5:27 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-20  4:50 [PATCH v4 0/4] Deterministic charging of shared memory Mina Almasry
2021-11-20  4:50 ` Mina Almasry
2021-11-20  4:50 ` [PATCH v4 1/4] mm: support deterministic memory charging of filesystems Mina Almasry
2021-11-20  4:50   ` Mina Almasry
2021-11-20  7:53   ` Shakeel Butt
2021-11-20  4:50 ` [PATCH v4 2/4] mm/oom: handle remote ooms Mina Almasry
2021-11-20  5:07   ` Matthew Wilcox
2021-11-20  5:07     ` Matthew Wilcox
2021-11-20  5:31     ` Mina Almasry
2021-11-20  7:58   ` Shakeel Butt
2021-11-20  7:58     ` Shakeel Butt
2021-11-20  4:50 ` [PATCH v4 3/4] mm, shmem: add filesystem memcg= option documentation Mina Almasry
2021-11-20  4:50 ` [PATCH v4 4/4] mm, shmem, selftests: add tmpfs memcg= mount option tests Mina Almasry
2021-11-20  5:01 ` [PATCH v4 0/4] Deterministic charging of shared memory Matthew Wilcox
2021-11-20  5:27   ` Mina Almasry [this message]
2021-11-22 19:04 ` Johannes Weiner
2021-11-22 22:09   ` Mina Almasry
2021-11-22 23:09   ` Roman Gushchin
2021-11-23 19:26     ` Mina Almasry
2021-11-23 20:21     ` Johannes Weiner
2021-11-23 21:19       ` Mina Almasry
2021-11-23 22:49         ` Roman Gushchin
2021-11-24 17:27   ` Michal Hocko
2021-11-29  6:00   ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHS8izM-DpKs1=-F04-8Qp55VFYC57zERpKkFBk4_QNsOmnOoQ@mail.gmail.com' \
    --to=almasrymina@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@fromorbit.com \
    --cc=gthelen@google.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tytso@mit.edu \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.