linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] netfilter: account ebt_table_info to kmemcg
@ 2018-12-29  1:55 Shakeel Butt
  2018-12-29  7:33 ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2018-12-29  1:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Michal Hocko, Andrew Morton
  Cc: linux-mm, netfilter-devel, coreteam, bridge, linux-kernel,
	Shakeel Butt, syzbot+7713f3aa67be76b1552c

The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
memory is already accounted to kmemcg. Do the same for ebtables. The
syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
whole system from a restricted memcg, a potential DoS.

Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 net/bridge/netfilter/ebtables.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 491828713e0b..5e55cef0cec3 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
 	tmp.name[sizeof(tmp.name) - 1] = 0;
 
 	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
-	newinfo = vmalloc(sizeof(*newinfo) + countersize);
+	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
+			    PAGE_KERNEL);
 	if (!newinfo)
 		return -ENOMEM;
 
 	if (countersize)
 		memset(newinfo->counters, 0, countersize);
 
-	newinfo->entries = vmalloc(tmp.entries_size);
+	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
+				     PAGE_KERNEL);
 	if (!newinfo->entries) {
 		ret = -ENOMEM;
 		goto free_newinfo;
-- 
2.20.1.415.g653613c723-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29  1:55 [PATCH] netfilter: account ebt_table_info to kmemcg Shakeel Butt
@ 2018-12-29  7:33 ` Michal Hocko
  2018-12-29  9:52   ` Florian Westphal
  2018-12-29  9:52   ` Kirill Tkhai
  0 siblings, 2 replies; 14+ messages in thread
From: Michal Hocko @ 2018-12-29  7:33 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Pablo Neira Ayuso, Florian Westphal, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, linux-mm,
	netfilter-devel, coreteam, bridge, linux-kernel,
	syzbot+7713f3aa67be76b1552c

On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.

What is the lifetime of these objects? Are they bound to any process?

> Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  net/bridge/netfilter/ebtables.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index 491828713e0b..5e55cef0cec3 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>  	tmp.name[sizeof(tmp.name) - 1] = 0;
>  
>  	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
> +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> +			    PAGE_KERNEL);
>  	if (!newinfo)
>  		return -ENOMEM;
>  
>  	if (countersize)
>  		memset(newinfo->counters, 0, countersize);
>  
> -	newinfo->entries = vmalloc(tmp.entries_size);
> +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> +				     PAGE_KERNEL);
>  	if (!newinfo->entries) {
>  		ret = -ENOMEM;
>  		goto free_newinfo;
> -- 
> 2.20.1.415.g653613c723-goog
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29  7:33 ` Michal Hocko
@ 2018-12-29  9:52   ` Florian Westphal
  2018-12-29 10:06     ` Michal Hocko
  2018-12-29  9:52   ` Kirill Tkhai
  1 sibling, 1 reply; 14+ messages in thread
From: Florian Westphal @ 2018-12-29  9:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Shakeel Butt, Pablo Neira Ayuso, Florian Westphal,
	Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
	Andrew Morton, linux-mm, netfilter-devel, coreteam, bridge,
	linux-kernel, syzbot+7713f3aa67be76b1552c

Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > memory is already accounted to kmemcg. Do the same for ebtables. The
> > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > whole system from a restricted memcg, a potential DoS.
> 
> What is the lifetime of these objects? Are they bound to any process?

No, they are not.
They are free'd only when userspace requests it or the netns is
destroyed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29  7:33 ` Michal Hocko
  2018-12-29  9:52   ` Florian Westphal
@ 2018-12-29  9:52   ` Kirill Tkhai
  2018-12-29 19:39     ` Shakeel Butt
  1 sibling, 1 reply; 14+ messages in thread
From: Kirill Tkhai @ 2018-12-29  9:52 UTC (permalink / raw)
  To: Michal Hocko, Shakeel Butt
  Cc: Pablo Neira Ayuso, Florian Westphal, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, linux-mm,
	netfilter-devel, coreteam, bridge, linux-kernel,
	syzbot+7713f3aa67be76b1552c

Hi, Michal!

On 29.12.2018 10:33, Michal Hocko wrote:
> On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
>> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
>> memory is already accounted to kmemcg. Do the same for ebtables. The
>> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
>> whole system from a restricted memcg, a potential DoS.
> 
> What is the lifetime of these objects? Are they bound to any process?

These are list of ebtables rules, which may be displayed with $ebtables-save command.
In case of we do not account them, a low priority container may eat all the memory
and OOM killer in berserk mode will kill all the processes on machine. They are not bound
to any process, but they are bound to network namespace.

OOM killer does not analyze such the memory cgroup-related allocations, since it
is task-aware only. Maybe we should do it namespace-aware too...

Kirill

>> Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
>> Signed-off-by: Shakeel Butt <shakeelb@google.com>
>> ---
>>  net/bridge/netfilter/ebtables.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
>> index 491828713e0b..5e55cef0cec3 100644
>> --- a/net/bridge/netfilter/ebtables.c
>> +++ b/net/bridge/netfilter/ebtables.c
>> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>>  	tmp.name[sizeof(tmp.name) - 1] = 0;
>>  
>>  	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
>> -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
>> +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
>> +			    PAGE_KERNEL);
>>  	if (!newinfo)
>>  		return -ENOMEM;
>>  
>>  	if (countersize)
>>  		memset(newinfo->counters, 0, countersize);
>>  
>> -	newinfo->entries = vmalloc(tmp.entries_size);
>> +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
>> +				     PAGE_KERNEL);
>>  	if (!newinfo->entries) {
>>  		ret = -ENOMEM;
>>  		goto free_newinfo;
>> -- 
>> 2.20.1.415.g653613c723-goog
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29  9:52   ` Florian Westphal
@ 2018-12-29 10:06     ` Michal Hocko
  2018-12-29 19:34       ` Shakeel Butt
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-12-29 10:06 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Shakeel Butt, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
	Nikolay Aleksandrov, Andrew Morton, linux-mm, netfilter-devel,
	coreteam, bridge, linux-kernel, syzbot+7713f3aa67be76b1552c

On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
> > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > whole system from a restricted memcg, a potential DoS.
> > 
> > What is the lifetime of these objects? Are they bound to any process?
> 
> No, they are not.
> They are free'd only when userspace requests it or the netns is
> destroyed.

Then this is problematic, because the oom killer is not able to
guarantee the hard limit and so the excessive memory consumption cannot
be really contained. As a result the memcg will be basically useless
until somebody tears down the charged objects by other means. The memcg
oom killer will surely kill all the existing tasks in the cgroup and
this could somehow reduce the problem. Maybe this is sufficient for
some usecases but that should be properly analyzed and described in the
changelog.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29 10:06     ` Michal Hocko
@ 2018-12-29 19:34       ` Shakeel Butt
  2018-12-30  7:45         ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2018-12-29 19:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > Michal Hocko <mhocko@kernel.org> wrote:
> > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > whole system from a restricted memcg, a potential DoS.
> > >
> > > What is the lifetime of these objects? Are they bound to any process?
> >
> > No, they are not.
> > They are free'd only when userspace requests it or the netns is
> > destroyed.
>
> Then this is problematic, because the oom killer is not able to
> guarantee the hard limit and so the excessive memory consumption cannot
> be really contained. As a result the memcg will be basically useless
> until somebody tears down the charged objects by other means. The memcg
> oom killer will surely kill all the existing tasks in the cgroup and
> this could somehow reduce the problem. Maybe this is sufficient for
> some usecases but that should be properly analyzed and described in the
> changelog.
>

Can you explain why you think the memcg hard limit will not be
enforced? From what I understand, the memcg oom-killer will kill the
allocating processes as you have mentioned. We do force charging for
very limited conditions but here the memcg oom-killer will take care
of

Anyways, the kernel is already charging the memory for
[ip,ip6,arp]_tables and this patch adds the charging for ebtables.
Without this patch, as Kirill has described and shown by syzbot, a low
priority memcg can force system OOM.

Shakeel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29  9:52   ` Kirill Tkhai
@ 2018-12-29 19:39     ` Shakeel Butt
  0 siblings, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2018-12-29 19:39 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Michal Hocko, Pablo Neira Ayuso, Florian Westphal,
	Jozsef Kadlecsik, Roopa Prabhu, Nikolay Aleksandrov,
	Andrew Morton, Linux MM, netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

Hi Kirill,

On Sat, Dec 29, 2018 at 1:52 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> Hi, Michal!
>
> On 29.12.2018 10:33, Michal Hocko wrote:
> > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> >> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> >> memory is already accounted to kmemcg. Do the same for ebtables. The
> >> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> >> whole system from a restricted memcg, a potential DoS.
> >
> > What is the lifetime of these objects? Are they bound to any process?
>
> These are list of ebtables rules, which may be displayed with $ebtables-save command.
> In case of we do not account them, a low priority container may eat all the memory
> and OOM killer in berserk mode will kill all the processes on machine. They are not bound
> to any process, but they are bound to network namespace.
>
> OOM killer does not analyze such the memory cgroup-related allocations, since it
> is task-aware only. Maybe we should do it namespace-aware too...

This is a good idea. I am already brainstorming on a somewhat similar
idea to make shmem/tmpfs files oom-killable. I will share once I have
something more concrete and will think on namespace angle too.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-29 19:34       ` Shakeel Butt
@ 2018-12-30  7:45         ` Michal Hocko
  2018-12-30  8:00           ` Michal Hocko
  2018-12-31  4:00           ` Shakeel Butt
  0 siblings, 2 replies; 14+ messages in thread
From: Michal Hocko @ 2018-12-30  7:45 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > whole system from a restricted memcg, a potential DoS.
> > > >
> > > > What is the lifetime of these objects? Are they bound to any process?
> > >
> > > No, they are not.
> > > They are free'd only when userspace requests it or the netns is
> > > destroyed.
> >
> > Then this is problematic, because the oom killer is not able to
> > guarantee the hard limit and so the excessive memory consumption cannot
> > be really contained. As a result the memcg will be basically useless
> > until somebody tears down the charged objects by other means. The memcg
> > oom killer will surely kill all the existing tasks in the cgroup and
> > this could somehow reduce the problem. Maybe this is sufficient for
> > some usecases but that should be properly analyzed and described in the
> > changelog.
> >
> 
> Can you explain why you think the memcg hard limit will not be
> enforced? From what I understand, the memcg oom-killer will kill the
> allocating processes as you have mentioned. We do force charging for
> very limited conditions but here the memcg oom-killer will take care
> of

I was talking about the force charge part. Depending on a specific
allocation and its life time this can gradually get us over hard limit
without any bound theoretically.

> Anyways, the kernel is already charging the memory for
> [ip,ip6,arp]_tables and this patch adds the charging for ebtables.
> Without this patch, as Kirill has described and shown by syzbot, a low
> priority memcg can force system OOM.

I am not opposing the patch per-se. I would just like the changelog to
be more descriptive about the life time and consequences.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-30  7:45         ` Michal Hocko
@ 2018-12-30  8:00           ` Michal Hocko
  2018-12-31  3:59             ` Shakeel Butt
  2018-12-31  4:00           ` Shakeel Butt
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-12-30  8:00 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > > whole system from a restricted memcg, a potential DoS.
> > > > >
> > > > > What is the lifetime of these objects? Are they bound to any process?
> > > >
> > > > No, they are not.
> > > > They are free'd only when userspace requests it or the netns is
> > > > destroyed.
> > >
> > > Then this is problematic, because the oom killer is not able to
> > > guarantee the hard limit and so the excessive memory consumption cannot
> > > be really contained. As a result the memcg will be basically useless
> > > until somebody tears down the charged objects by other means. The memcg
> > > oom killer will surely kill all the existing tasks in the cgroup and
> > > this could somehow reduce the problem. Maybe this is sufficient for
> > > some usecases but that should be properly analyzed and described in the
> > > changelog.
> > >
> > 
> > Can you explain why you think the memcg hard limit will not be
> > enforced? From what I understand, the memcg oom-killer will kill the
> > allocating processes as you have mentioned. We do force charging for
> > very limited conditions but here the memcg oom-killer will take care
> > of
> 
> I was talking about the force charge part. Depending on a specific
> allocation and its life time this can gradually get us over hard limit
> without any bound theoretically.

Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
the current task is killed"") there is no way to bail out from the
vmalloc allocation loop so if the request is really large then the memcg
oom will not help. Is that a problem here?

Maybe it is time to revisit fatal_signal_pending check.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-30  8:00           ` Michal Hocko
@ 2018-12-31  3:59             ` Shakeel Butt
  2018-12-31 10:11               ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2018-12-31  3:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> > On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > > > whole system from a restricted memcg, a potential DoS.
> > > > > >
> > > > > > What is the lifetime of these objects? Are they bound to any process?
> > > > >
> > > > > No, they are not.
> > > > > They are free'd only when userspace requests it or the netns is
> > > > > destroyed.
> > > >
> > > > Then this is problematic, because the oom killer is not able to
> > > > guarantee the hard limit and so the excessive memory consumption cannot
> > > > be really contained. As a result the memcg will be basically useless
> > > > until somebody tears down the charged objects by other means. The memcg
> > > > oom killer will surely kill all the existing tasks in the cgroup and
> > > > this could somehow reduce the problem. Maybe this is sufficient for
> > > > some usecases but that should be properly analyzed and described in the
> > > > changelog.
> > > >
> > >
> > > Can you explain why you think the memcg hard limit will not be
> > > enforced? From what I understand, the memcg oom-killer will kill the
> > > allocating processes as you have mentioned. We do force charging for
> > > very limited conditions but here the memcg oom-killer will take care
> > > of
> >
> > I was talking about the force charge part. Depending on a specific
> > allocation and its life time this can gradually get us over hard limit
> > without any bound theoretically.
>
> Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> the current task is killed"") there is no way to bail out from the
> vmalloc allocation loop so if the request is really large then the memcg
> oom will not help. Is that a problem here?
>

Yes, I think it will be an issue here.

> Maybe it is time to revisit fatal_signal_pending check.

Yes, we will need something to handle the memcg OOM. I will think more
on that front or if you have any ideas, please do propose.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-30  7:45         ` Michal Hocko
  2018-12-30  8:00           ` Michal Hocko
@ 2018-12-31  4:00           ` Shakeel Butt
  1 sibling, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2018-12-31  4:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Sat, Dec 29, 2018 at 11:45 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > > whole system from a restricted memcg, a potential DoS.
> > > > >
> > > > > What is the lifetime of these objects? Are they bound to any process?
> > > >
> > > > No, they are not.
> > > > They are free'd only when userspace requests it or the netns is
> > > > destroyed.
> > >
> > > Then this is problematic, because the oom killer is not able to
> > > guarantee the hard limit and so the excessive memory consumption cannot
> > > be really contained. As a result the memcg will be basically useless
> > > until somebody tears down the charged objects by other means. The memcg
> > > oom killer will surely kill all the existing tasks in the cgroup and
> > > this could somehow reduce the problem. Maybe this is sufficient for
> > > some usecases but that should be properly analyzed and described in the
> > > changelog.
> > >
> >
> > Can you explain why you think the memcg hard limit will not be
> > enforced? From what I understand, the memcg oom-killer will kill the
> > allocating processes as you have mentioned. We do force charging for
> > very limited conditions but here the memcg oom-killer will take care
> > of
>
> I was talking about the force charge part. Depending on a specific
> allocation and its life time this can gradually get us over hard limit
> without any bound theoretically.
>
> > Anyways, the kernel is already charging the memory for
> > [ip,ip6,arp]_tables and this patch adds the charging for ebtables.
> > Without this patch, as Kirill has described and shown by syzbot, a low
> > priority memcg can force system OOM.
>
> I am not opposing the patch per-se. I would just like the changelog to
> be more descriptive about the life time and consequences.
> --

I will resend the patch with more detailed change log.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-31  3:59             ` Shakeel Butt
@ 2018-12-31 10:11               ` Michal Hocko
  2019-01-03 20:52                 ` Shakeel Butt
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-12-31 10:11 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Sun 30-12-18 19:59:53, Shakeel Butt wrote:
> On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > > > > whole system from a restricted memcg, a potential DoS.
> > > > > > >
> > > > > > > What is the lifetime of these objects? Are they bound to any process?
> > > > > >
> > > > > > No, they are not.
> > > > > > They are free'd only when userspace requests it or the netns is
> > > > > > destroyed.
> > > > >
> > > > > Then this is problematic, because the oom killer is not able to
> > > > > guarantee the hard limit and so the excessive memory consumption cannot
> > > > > be really contained. As a result the memcg will be basically useless
> > > > > until somebody tears down the charged objects by other means. The memcg
> > > > > oom killer will surely kill all the existing tasks in the cgroup and
> > > > > this could somehow reduce the problem. Maybe this is sufficient for
> > > > > some usecases but that should be properly analyzed and described in the
> > > > > changelog.
> > > > >
> > > >
> > > > Can you explain why you think the memcg hard limit will not be
> > > > enforced? From what I understand, the memcg oom-killer will kill the
> > > > allocating processes as you have mentioned. We do force charging for
> > > > very limited conditions but here the memcg oom-killer will take care
> > > > of
> > >
> > > I was talking about the force charge part. Depending on a specific
> > > allocation and its life time this can gradually get us over hard limit
> > > without any bound theoretically.
> >
> > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> > the current task is killed"") there is no way to bail out from the
> > vmalloc allocation loop so if the request is really large then the memcg
> > oom will not help. Is that a problem here?
> >
> 
> Yes, I think it will be an issue here.
> 
> > Maybe it is time to revisit fatal_signal_pending check.
> 
> Yes, we will need something to handle the memcg OOM. I will think more
> on that front or if you have any ideas, please do propose.

I can see three options here:
	- do not force charge on memcg oom or introduce a limited charge
	  overflow (reserves basically).
	- revert the revert and reintroduce the fatal_signal_pending
	  check into vmalloc
	- be more specific and check tsk_is_oom_victim in vmalloc and
	  fail

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2018-12-31 10:11               ` Michal Hocko
@ 2019-01-03 20:52                 ` Shakeel Butt
  2019-01-04 13:21                   ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2019-01-03 20:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 30-12-18 19:59:53, Shakeel Butt wrote:
> > On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> > > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > >
> > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > > > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > > > > > whole system from a restricted memcg, a potential DoS.
> > > > > > > >
> > > > > > > > What is the lifetime of these objects? Are they bound to any process?
> > > > > > >
> > > > > > > No, they are not.
> > > > > > > They are free'd only when userspace requests it or the netns is
> > > > > > > destroyed.
> > > > > >
> > > > > > Then this is problematic, because the oom killer is not able to
> > > > > > guarantee the hard limit and so the excessive memory consumption cannot
> > > > > > be really contained. As a result the memcg will be basically useless
> > > > > > until somebody tears down the charged objects by other means. The memcg
> > > > > > oom killer will surely kill all the existing tasks in the cgroup and
> > > > > > this could somehow reduce the problem. Maybe this is sufficient for
> > > > > > some usecases but that should be properly analyzed and described in the
> > > > > > changelog.
> > > > > >
> > > > >
> > > > > Can you explain why you think the memcg hard limit will not be
> > > > > enforced? From what I understand, the memcg oom-killer will kill the
> > > > > allocating processes as you have mentioned. We do force charging for
> > > > > very limited conditions but here the memcg oom-killer will take care
> > > > > of
> > > >
> > > > I was talking about the force charge part. Depending on a specific
> > > > allocation and its life time this can gradually get us over hard limit
> > > > without any bound theoretically.
> > >
> > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> > > the current task is killed"") there is no way to bail out from the
> > > vmalloc allocation loop so if the request is really large then the memcg
> > > oom will not help. Is that a problem here?
> > >
> >
> > Yes, I think it will be an issue here.
> >
> > > Maybe it is time to revisit fatal_signal_pending check.
> >
> > Yes, we will need something to handle the memcg OOM. I will think more
> > on that front or if you have any ideas, please do propose.
>
> I can see three options here:
>         - do not force charge on memcg oom or introduce a limited charge
>           overflow (reserves basically).
>         - revert the revert and reintroduce the fatal_signal_pending
>           check into vmalloc
>         - be more specific and check tsk_is_oom_victim in vmalloc and
>           fail
>

I think for the long term solution we might need something similar to
memcg oom reserves (1) but for quick fix I think we can do the
combination of (2) and (3).

Shakeel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfilter: account ebt_table_info to kmemcg
  2019-01-03 20:52                 ` Shakeel Butt
@ 2019-01-04 13:21                   ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2019-01-04 13:21 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Roopa Prabhu, Nikolay Aleksandrov, Andrew Morton, Linux MM,
	netfilter-devel, coreteam, bridge, LKML,
	syzbot+7713f3aa67be76b1552c

On Thu 03-01-19 12:52:54, Shakeel Butt wrote:
> On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sun 30-12-18 19:59:53, Shakeel Butt wrote:
> > > On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> > > > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > > > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > >
> > > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > > > > > Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > > > > > > > > > memory is already accounted to kmemcg. Do the same for ebtables. The
> > > > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > > > > > > > > > whole system from a restricted memcg, a potential DoS.
> > > > > > > > >
> > > > > > > > > What is the lifetime of these objects? Are they bound to any process?
> > > > > > > >
> > > > > > > > No, they are not.
> > > > > > > > They are free'd only when userspace requests it or the netns is
> > > > > > > > destroyed.
> > > > > > >
> > > > > > > Then this is problematic, because the oom killer is not able to
> > > > > > > guarantee the hard limit and so the excessive memory consumption cannot
> > > > > > > be really contained. As a result the memcg will be basically useless
> > > > > > > until somebody tears down the charged objects by other means. The memcg
> > > > > > > oom killer will surely kill all the existing tasks in the cgroup and
> > > > > > > this could somehow reduce the problem. Maybe this is sufficient for
> > > > > > > some usecases but that should be properly analyzed and described in the
> > > > > > > changelog.
> > > > > > >
> > > > > >
> > > > > > Can you explain why you think the memcg hard limit will not be
> > > > > > enforced? From what I understand, the memcg oom-killer will kill the
> > > > > > allocating processes as you have mentioned. We do force charging for
> > > > > > very limited conditions but here the memcg oom-killer will take care
> > > > > > of
> > > > >
> > > > > I was talking about the force charge part. Depending on a specific
> > > > > allocation and its life time this can gradually get us over hard limit
> > > > > without any bound theoretically.
> > > >
> > > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> > > > the current task is killed"") there is no way to bail out from the
> > > > vmalloc allocation loop so if the request is really large then the memcg
> > > > oom will not help. Is that a problem here?
> > > >
> > >
> > > Yes, I think it will be an issue here.
> > >
> > > > Maybe it is time to revisit fatal_signal_pending check.
> > >
> > > Yes, we will need something to handle the memcg OOM. I will think more
> > > on that front or if you have any ideas, please do propose.
> >
> > I can see three options here:
> >         - do not force charge on memcg oom or introduce a limited charge
> >           overflow (reserves basically).
> >         - revert the revert and reintroduce the fatal_signal_pending
> >           check into vmalloc
> >         - be more specific and check tsk_is_oom_victim in vmalloc and
> >           fail
> >
> 
> I think for the long term solution we might need something similar to
> memcg oom reserves (1) but for quick fix I think we can do the
> combination of (2) and (3).

Johannes argued that fatal_signal_pending is too general check for
vmalloc. I would argue that we already break out of some operations on
fatal signals. tsk_is_oom_victim is more subtle but much more targeted
on the other hand.

I do not have any strong preference to be honest but I agree that some
limited reserves would be the best solution long term. I just do not
have any idea how to scale those reserves to be meaningful.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-01-04 13:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-29  1:55 [PATCH] netfilter: account ebt_table_info to kmemcg Shakeel Butt
2018-12-29  7:33 ` Michal Hocko
2018-12-29  9:52   ` Florian Westphal
2018-12-29 10:06     ` Michal Hocko
2018-12-29 19:34       ` Shakeel Butt
2018-12-30  7:45         ` Michal Hocko
2018-12-30  8:00           ` Michal Hocko
2018-12-31  3:59             ` Shakeel Butt
2018-12-31 10:11               ` Michal Hocko
2019-01-03 20:52                 ` Shakeel Butt
2019-01-04 13:21                   ` Michal Hocko
2018-12-31  4:00           ` Shakeel Butt
2018-12-29  9:52   ` Kirill Tkhai
2018-12-29 19:39     ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).