From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Shakeel Butt <shakeelb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org, netdev@vger.kernel.org,
kernel-team@fb.com
Subject: [PATCH] mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
Date: Tue, 22 Oct 2019 19:37:08 -0400
Message-ID: <20191022233708.365764-1-hannes@cmpxchg.org>
While upgrading from 4.16 to 5.2, we noticed these allocation errors
in the log of the new kernel:
[ 8642.253395] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[ 8642.269170] cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
[ 8642.293009] node 0: slabs: 5, objs: 170, free: 0
slab_out_of_memory+1
___slab_alloc+969
__slab_alloc+14
kmem_cache_alloc+346
inet_twsk_alloc+60
tcp_time_wait+46
tcp_fin+206
tcp_data_queue+2034
tcp_rcv_state_process+784
tcp_v6_do_rcv+405
__release_sock+118
tcp_close+385
inet_release+46
__sock_release+55
sock_close+17
__fput+170
task_work_run+127
exit_to_usermode_loop+191
do_syscall_64+212
entry_SYSCALL_64_after_hwframe+68
accompanied by an increase in machines going completely radio silent
under memory pressure.
One thing that changed since 4.16 is e699e2c6a654 ("net, mm: account
sock objects to kmemcg"), which made these slab caches subject to
cgroup memory accounting and control.
The problem with that is that cgroups, unlike the page allocator, do
not maintain dedicated atomic reserves. As a cgroup's usage hovers at
its limit, atomic allocations - such as those made during network rx - can
fail consistently for extended periods of time. The kernel is not able
to operate under these conditions.
We don't want to revert the culprit patch, because it indeed tracks a
potentially substantial amount of memory used by a cgroup.
We also don't want to implement dedicated atomic reserves for cgroups.
There is no point in keeping a fixed margin of unused bytes in the
cgroup's memory budget to accommodate a consumer that is impossible to
predict - we'd be wasting memory and get into configuration headaches,
not unlike what we have going with min_free_kbytes. We do this for
physical memory because we have to, but cgroups are an accounting game.
Instead, account these privileged allocations to the cgroup, but let
them bypass the configured limit if they have to. This way, we get the
benefits of accounting the consumed memory and have it exert pressure
on the rest of the cgroup, but like with the page allocator, we shift
the burden of reclaiming on behalf of atomic allocations onto the
regular allocations that can block.
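To illustrate the intended behavior outside the kernel, here is a minimal
userspace sketch. It is not the memcg implementation: sketch_cgroup,
sketch_try_charge(), SKETCH_GFP_ATOMIC and the atomic_bypass switch are
hypothetical names made up for this example, and reclaim is left out
entirely. It only models a usage counter with a limit, and shows an atomic
charge failing at the limit without the bypass and being accounted above the
limit with it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an "atomic" request flag; not the kernel's. */
#define SKETCH_GFP_ATOMIC 0x1u

/* Hypothetical model of a cgroup's memory counter, in pages. */
struct sketch_cgroup {
	unsigned long usage;	/* pages currently charged */
	unsigned long limit;	/* configured limit */
};

/*
 * Charge nr_pages to cg. Returns 0 on success, -1 on failure.
 * atomic_bypass selects the behavior with (true) or without (false)
 * the bypass described in this patch.
 */
static int sketch_try_charge(struct sketch_cgroup *cg, unsigned int gfp,
			     unsigned long nr_pages, bool atomic_bypass)
{
	/* Fast path: the charge fits under the limit. */
	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}

	/*
	 * Over the limit. A blocking caller would try reclaim here
	 * (omitted in this sketch). An atomic caller cannot, so with
	 * the bypass it is charged anyway and the excess stays visible
	 * in cg->usage.
	 */
	if (atomic_bypass && (gfp & SKETCH_GFP_ATOMIC)) {
		cg->usage += nr_pages;
		return 0;
	}

	return -1;
}

/* Usage example: returns the final usage after the scenario below. */
static unsigned long sketch_demo(void)
{
	struct sketch_cgroup cg = { .usage = 100, .limit = 100 };

	/* Without the bypass, an atomic charge at the limit fails. */
	assert(sketch_try_charge(&cg, SKETCH_GFP_ATOMIC, 1, false) == -1);
	assert(cg.usage == 100);

	/* With the bypass, it succeeds and usage exceeds the limit. */
	assert(sketch_try_charge(&cg, SKETCH_GFP_ATOMIC, 1, true) == 0);

	/* A regular charge now fails until reclaim brings usage down. */
	assert(sketch_try_charge(&cg, 0, 1, true) == -1);

	return cg.usage;
}
```

In this model the overage remains in the usage counter, so subsequent
non-atomic charges keep failing until usage drops back under the limit. That
is where a blocking caller would reclaim on behalf of the atomic one.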
Cc: stable@kernel.org # 4.18+
Fixes: e699e2c6a654 ("net, mm: account sock objects to kmemcg")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8090b4c99ac7..c7e3e758c165 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2528,6 +2528,15 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 	}
 
+	/*
+	 * Memcg doesn't have a dedicated reserve for atomic
+	 * allocations. But like the global atomic pool, we need to
+	 * put the burden of reclaim on regular allocation requests
+	 * and let these go through as privileged allocations.
+	 */
+	if (gfp_mask & __GFP_ATOMIC)
+		goto force;
+
 	/*
 	 * Unlike in global OOM situations, memcg is not in a physical
 	 * memory shortage. Allow dying and OOM-killed tasks to
--
2.23.0