netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: kernel panic: Out of memory and no killable processes... (2)
       [not found] <001a1144b0caee2e8c0563d9de0a@google.com>
@ 2018-01-29  0:20 ` Tetsuo Handa
  2018-01-29  7:23   ` [netfilter-core] " Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Tetsuo Handa @ 2018-01-29  0:20 UTC (permalink / raw)
  To: davem, netfilter-devel, coreteam, netdev
  Cc: aarcange, akpm, guro, kirill.shutemov, linux-kernel, linux-mm,
	mhocko, mingo, rientjes, syzkaller-bugs, yang.s

syzbot wrote:
> syzbot hit the following crash on net-next commit
> 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +0000)
> Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'
> 
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+8630e35fc7287b392aac@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for  
> details.
> If you forward the report, please keep this part and the footer.
> 
> [ 3685]     0  3685    17821        1   184320        0             0 sshd
> [ 3692]     0  3692     4376        0    32768        0             0  
> syzkaller025682
> [ 3695]     0  3695     4376        0    36864        0             0  
> syzkaller025682
> Kernel panic - not syncing: Out of memory and no killable processes...
> 

This sounds like too huge vmalloc() request where size is controlled by userspace.

----------
[   27.738855] syzkaller025682 invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null), order=0, oom_score_adj=0
[   27.754960] syzkaller025682 cpuset=/ mems_allowed=0
[   27.760386] CPU: 0 PID: 3689 Comm: syzkaller025682 Not tainted 4.15.0-rc9+ #212
[   27.767820] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   27.777166] Call Trace:
[   27.779739]  dump_stack+0x194/0x257
[   27.793194]  dump_header+0x28c/0xe1e
[   27.888997]  oom_kill_process+0x8b5/0x14a0
[   27.981648]  out_of_memory+0x86d/0x1220
[   28.003684]  __alloc_pages_slowpath+0x1d1b/0x2d00
[   28.054140]  __alloc_pages_nodemask+0x9fb/0xd80
[   28.090590]  alloc_pages_current+0xb6/0x1e0
[   28.094927]  __vmalloc_node_range+0x409/0x650
[   28.103837]  __vmalloc_node_flags_caller+0x50/0x60
[   28.113166]  kvmalloc_node+0x82/0xd0
[   28.116869]  xt_alloc_table_info+0x64/0xe0
[   28.121097]  do_ip6t_set_ctl+0x29b/0x5f0
[   28.139158]  nf_setsockopt+0x67/0xc0
[   28.142862]  ipv6_setsockopt+0x115/0x150
[   28.146912]  udpv6_setsockopt+0x45/0x80
[   28.150867]  sock_common_setsockopt+0x95/0xd0
[   28.155359]  SyS_setsockopt+0x189/0x360
[   28.177379]  entry_SYSCALL_64_fastpath+0x29/0xa0
----------

struct xt_table_info *xt_alloc_table_info(unsigned int size)
{
(...snipped...)
	info = kvmalloc(sz, GFP_KERNEL);
(...snipped...)
}

static int
do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
{
        int ret;

        if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
                return -EPERM;

        switch (cmd) {
        case IP6T_SO_SET_REPLACE:
                ret = do_replace(sock_net(sk), user, len);
                break;

        case IP6T_SO_SET_ADD_COUNTERS:
                ret = do_add_counters(sock_net(sk), user, len, 0);
                break;

        default:
                ret = -EINVAL;
        }

        return ret;
}

vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
off when the current task is killed") but then became unkillable by commit
b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
killed""). Therefore, we can't handle this problem from MM side.
Please consider adding some limit from networking side.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29  0:20 ` kernel panic: Out of memory and no killable processes... (2) Tetsuo Handa
@ 2018-01-29  7:23   ` Florian Westphal
  2018-01-29  8:26     ` Kirill A. Shutemov
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2018-01-29  7:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: davem, netfilter-devel, coreteam, netdev, aarcange, yang.s,
	mhocko, syzkaller-bugs, linux-kernel, mingo, linux-mm, rientjes,
	akpm, guro, kirill.shutemov

Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> syzbot wrote:
> > syzbot hit the following crash on net-next commit
> > 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +0000)
> > Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'
> > 
> > C reproducer is attached.
> > syzkaller reproducer is attached.
> > Raw console output is attached.
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached.
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+8630e35fc7287b392aac@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for  
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > [ 3685]     0  3685    17821        1   184320        0             0 sshd
> > [ 3692]     0  3692     4376        0    32768        0             0  
> > syzkaller025682
> > [ 3695]     0  3695     4376        0    36864        0             0  
> > syzkaller025682
> > Kernel panic - not syncing: Out of memory and no killable processes...
> > 
> 
> This sounds like too huge vmalloc() request where size is controlled by userspace.

Right.

Before eacd86ca3b036e55e172b7279f101cef4a6ff3a4
this used 

              info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,

would it help to re-add that?

> vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> off when the current task is killed") but then became unkillable by commit
> b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> killed""). Therefore, we can't handle this problem from MM side.
> Please consider adding some limit from networking side.

I don't know what "some limit" would be.  I would prefer if there was
a way to supress OOM Killer in first place so we can just -ENOMEM user.

AFAIU NOWARN|NORETRY does that, so I would propose to just read-add it?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29  7:23   ` [netfilter-core] " Florian Westphal
@ 2018-01-29  8:26     ` Kirill A. Shutemov
  2018-01-29 16:57       ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill A. Shutemov @ 2018-01-29  8:26 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Tetsuo Handa, davem, netfilter-devel, coreteam, netdev, aarcange,
	yang.s, mhocko, syzkaller-bugs, linux-kernel, mingo, linux-mm,
	rientjes, akpm, guro, kirill.shutemov

On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > off when the current task is killed") but then became unkillable by commit
> > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > killed""). Therefore, we can't handle this problem from MM side.
> > Please consider adding some limit from networking side.
> 
> I don't know what "some limit" would be.  I would prefer if there was
> a way to supress OOM Killer in first place so we can just -ENOMEM user.

Just supressing OOM kill is a bad idea. We still leave a way to allocate
arbitrary large buffer in kernel.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29  8:26     ` Kirill A. Shutemov
@ 2018-01-29 16:57       ` Florian Westphal
  2018-01-29 18:24         ` Jan Engelhardt
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Florian Westphal @ 2018-01-29 16:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Florian Westphal, Tetsuo Handa, davem, netfilter-devel, coreteam,
	netdev, aarcange, yang.s, mhocko, syzkaller-bugs, linux-kernel,
	mingo, linux-mm, rientjes, akpm, guro, kirill.shutemov

Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > off when the current task is killed") but then became unkillable by commit
> > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > killed""). Therefore, we can't handle this problem from MM side.
> > > Please consider adding some limit from networking side.
> > 
> > I don't know what "some limit" would be.  I would prefer if there was
> > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> 
> Just supressing OOM kill is a bad idea. We still leave a way to allocate
> arbitrary large buffer in kernel.

Isn't that what we do everywhere in network stack?

I think we should try to allocate whatever amount of memory is needed
for the given xtables ruleset, given that is what admin requested us to do.

I also would not know what limit is sane -- I've seen setups with as much
as 100k iptables rules, and that was 5 years ago.

And even if we add a "Xk rules" limit, it might be too much for
low-memory systems, or not enough for whatever other use case there
might be.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29 16:57       ` Florian Westphal
@ 2018-01-29 18:24         ` Jan Engelhardt
  2018-01-29 18:25         ` Michal Hocko
  2018-01-29 18:28         ` Kirill A. Shutemov
  2 siblings, 0 replies; 23+ messages in thread
From: Jan Engelhardt @ 2018-01-29 18:24 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kirill A. Shutemov, Tetsuo Handa, davem, netfilter-devel,
	coreteam, netdev, aarcange, yang.s, mhocko, syzkaller-bugs,
	linux-kernel, mingo, linux-mm, rientjes, akpm, guro,
	kirill.shutemov


On Monday 2018-01-29 17:57, Florian Westphal wrote:
>> > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
>> > > off when the current task is killed") but then became unkillable by commit
>> > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
>> > > killed""). Therefore, we can't handle this problem from MM side.
>> > > Please consider adding some limit from networking side.
>> > 
>> > I don't know what "some limit" would be.  I would prefer if there was
>> > a way to supress OOM Killer in first place so we can just -ENOMEM user.
>> 
>> Just supressing OOM kill is a bad idea. We still leave a way to allocate
>> arbitrary large buffer in kernel.

At the very least, mm could limit that kind of "arbitrary". If the machine has
x GB (swap included) and the admin tries to make the kernel allocate space for
an x GB ruleset, no way is it going to be satisfiable _even with OOM_.

>I think we should try to allocate whatever amount of memory is needed
>for the given xtables ruleset, given that is what admin requested us to do.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29 16:57       ` Florian Westphal
  2018-01-29 18:24         ` Jan Engelhardt
@ 2018-01-29 18:25         ` Michal Hocko
  2018-01-29 18:28         ` Kirill A. Shutemov
  2 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2018-01-29 18:25 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kirill A. Shutemov, Tetsuo Handa, davem, netfilter-devel,
	coreteam, netdev, aarcange, yang.s, syzkaller-bugs, linux-kernel,
	mingo, linux-mm, rientjes, akpm, guro, kirill.shutemov

On Mon 29-01-18 17:57:22, Florian Westphal wrote:
> Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > > off when the current task is killed") but then became unkillable by commit
> > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > > killed""). Therefore, we can't handle this problem from MM side.
> > > > Please consider adding some limit from networking side.
> > > 
> > > I don't know what "some limit" would be.  I would prefer if there was
> > > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> > 
> > Just supressing OOM kill is a bad idea. We still leave a way to allocate
> > arbitrary large buffer in kernel.
> 
> Isn't that what we do everywhere in network stack?
> 
> I think we should try to allocate whatever amount of memory is needed
> for the given xtables ruleset, given that is what admin requested us to do.

If this is a root only thing then __GFP_NORETRY sounds like the most
straightforward way to go.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29 16:57       ` Florian Westphal
  2018-01-29 18:24         ` Jan Engelhardt
  2018-01-29 18:25         ` Michal Hocko
@ 2018-01-29 18:28         ` Kirill A. Shutemov
  2018-01-29 22:35           ` Florian Westphal
  2 siblings, 1 reply; 23+ messages in thread
From: Kirill A. Shutemov @ 2018-01-29 18:28 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Tetsuo Handa, davem, netfilter-devel, coreteam, netdev, aarcange,
	yang.s, mhocko, syzkaller-bugs, linux-kernel, mingo, linux-mm,
	rientjes, akpm, guro, kirill.shutemov

On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote:
> Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > > off when the current task is killed") but then became unkillable by commit
> > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > > killed""). Therefore, we can't handle this problem from MM side.
> > > > Please consider adding some limit from networking side.
> > > 
> > > I don't know what "some limit" would be.  I would prefer if there was
> > > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> > 
> > Just supressing OOM kill is a bad idea. We still leave a way to allocate
> > arbitrary large buffer in kernel.
> 
> Isn't that what we do everywhere in network stack?
> 
> I think we should try to allocate whatever amount of memory is needed
> for the given xtables ruleset, given that is what admin requested us to do.

Is it correct that "admin" in this case is root in random container?
I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET?

This can be fun.

> I also would not know what limit is sane -- I've seen setups with as much
> as 100k iptables rules, and that was 5 years ago.
> 
> And even if we add a "Xk rules" limit, it might be too much for
> low-memory systems, or not enough for whatever other use case there
> might be.

I hate what I'm saying, but I guess we need some tunable here.
Not sure what exactly.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29 18:28         ` Kirill A. Shutemov
@ 2018-01-29 22:35           ` Florian Westphal
  2018-01-30  7:52             ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2018-01-29 22:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Florian Westphal, Tetsuo Handa, davem, netfilter-devel, coreteam,
	netdev, aarcange, yang.s, mhocko, syzkaller-bugs, linux-kernel,
	mingo, linux-mm, rientjes, akpm, guro, kirill.shutemov

Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote:
> > Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote:
> > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back
> > > > > off when the current task is killed") but then became unkillable by commit
> > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is
> > > > > killed""). Therefore, we can't handle this problem from MM side.
> > > > > Please consider adding some limit from networking side.
> > > > 
> > > > I don't know what "some limit" would be.  I would prefer if there was
> > > > a way to supress OOM Killer in first place so we can just -ENOMEM user.
> > > 
> > > Just supressing OOM kill is a bad idea. We still leave a way to allocate
> > > arbitrary large buffer in kernel.
> > 
> > Isn't that what we do everywhere in network stack?
> > 
> > I think we should try to allocate whatever amount of memory is needed
> > for the given xtables ruleset, given that is what admin requested us to do.
> 
> Is it correct that "admin" in this case is root in random container?

Yes.

> I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET?

Yes.

> This can be fun.

Do we prevent "admin in random container" to insert 2m ipv6 routes
(alternatively: ipsec tunnels, interfaces etc etc)?

> > I also would not know what limit is sane -- I've seen setups with as much
> > as 100k iptables rules, and that was 5 years ago.
> > 
> > And even if we add a "Xk rules" limit, it might be too much for
> > low-memory systems, or not enough for whatever other use case there
> > might be.
> 
> I hate what I'm saying, but I guess we need some tunable here.
> Not sure what exactly.

Would memcg help?

(I don't buy the "run untrusted binaries on linux is safe" thing, so
 I would not know).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-29 22:35           ` Florian Westphal
@ 2018-01-30  7:52             ` Michal Hocko
  2018-01-30  8:11               ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-01-30  7:52 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kirill A. Shutemov, Tetsuo Handa, davem, netfilter-devel,
	coreteam, netdev, aarcange, yang.s, syzkaller-bugs, linux-kernel,
	mingo, linux-mm, rientjes, akpm, guro, kirill.shutemov

On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> Kirill A. Shutemov <kirill@shutemov.name> wrote:
[...]
> > I hate what I'm saying, but I guess we need some tunable here.
> > Not sure what exactly.
> 
> Would memcg help?

That really depends. I would have to check whether vmalloc path obeys
__GFP_ACCOUNT (I suspect it does except for page tables allocations but
that shouldn't be a big deal). But then the other potential problem is
the life time of the xt_table_info (or other potentially large) data
structures. Are they bound to any process life time. Because if they are
not then the OOM killer will not help. The OOM panic earlier in this
thread suggests it doesn't because the test case managed to eat all the
available memory and killed all the eligible tasks which didn't help.

So in some sense the memcg would help to stop the excessive allocation,
but it wouldn't resolve it other than kill all tasks in the affected
memcg/container. Whether this is sufficient or not, I dunno. It sounds
quite suboptimal to me. But it is true this would be less tricky then
adding a global knob...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30  7:52             ` Michal Hocko
@ 2018-01-30  8:11               ` Florian Westphal
  2018-01-30  8:28                 ` Kirill A. Shutemov
  2018-01-30  9:51                 ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: Florian Westphal @ 2018-01-30  8:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Westphal, Kirill A. Shutemov, Tetsuo Handa, davem,
	netfilter-devel, coreteam, netdev, aarcange, yang.s,
	syzkaller-bugs, linux-kernel, mingo, linux-mm, rientjes, akpm,
	guro, kirill.shutemov

Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > Kirill A. Shutemov <kirill@shutemov.name> wrote:
> [...]
> > > I hate what I'm saying, but I guess we need some tunable here.
> > > Not sure what exactly.
> > 
> > Would memcg help?
> 
> That really depends. I would have to check whether vmalloc path obeys
> __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> that shouldn't be a big deal). But then the other potential problem is
> the life time of the xt_table_info (or other potentially large) data
> structures. Are they bound to any process life time.

No.

> Because if they are
> not then the OOM killer will not help. The OOM panic earlier in this
> thread suggests it doesn't because the test case managed to eat all the
> available memory and killed all the eligible tasks which didn't help.

Yes, which is why we do not want any OOM killer invocation in first
place...

> So in some sense the memcg would help to stop the excessive allocation,
> but it wouldn't resolve it other than kill all tasks in the affected
> memcg/container. Whether this is sufficient or not, I dunno. It sounds
> quite suboptimal to me. But it is true this would be less tricky then
> adding a global knob...

Global knob doesn't really help at all, I can add multiple large
iptables rulesets (so we would have to account), and we have same issue
in virtually all of networking, so we need limits for interface count,
tunnel count, ipsec policies/SAs, nftables, tc, etc etc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30  8:11               ` Florian Westphal
@ 2018-01-30  8:28                 ` Kirill A. Shutemov
  2018-01-30  9:02                   ` Dmitry Vyukov
  2018-01-30  9:51                 ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Kirill A. Shutemov @ 2018-01-30  8:28 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Michal Hocko, Tetsuo Handa, davem, netfilter-devel, coreteam,
	netdev, aarcange, yang.s, syzkaller-bugs, linux-kernel, mingo,
	linux-mm, rientjes, akpm, guro, kirill.shutemov

On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > > Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > [...]
> > > > I hate what I'm saying, but I guess we need some tunable here.
> > > > Not sure what exactly.
> > > 
> > > Would memcg help?
> > 
> > That really depends. I would have to check whether vmalloc path obeys
> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > that shouldn't be a big deal). But then the other potential problem is
> > the life time of the xt_table_info (or other potentially large) data
> > structures. Are they bound to any process life time.
> 
> No.

Well, IIUC they bound to net namespace life time, so killing all
proccesses in the namespace would help to get memory back. :)

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30  8:28                 ` Kirill A. Shutemov
@ 2018-01-30  9:02                   ` Dmitry Vyukov
  2018-01-30  9:57                     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Vyukov @ 2018-01-30  9:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Florian Westphal, Michal Hocko, Tetsuo Handa, David Miller,
	netfilter-devel, coreteam, netdev, Andrea Arcangeli, Yang Shi,
	syzkaller-bugs, LKML, Ingo Molnar, Linux-MM, David Rientjes,
	Andrew Morton, guro, Kirill A. Shutemov

On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
>> Michal Hocko <mhocko@kernel.org> wrote:
>> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
>> > > Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> > [...]
>> > > > I hate what I'm saying, but I guess we need some tunable here.
>> > > > Not sure what exactly.
>> > >
>> > > Would memcg help?
>> >
>> > That really depends. I would have to check whether vmalloc path obeys
>> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
>> > that shouldn't be a big deal). But then the other potential problem is
>> > the life time of the xt_table_info (or other potentially large) data
>> > structures. Are they bound to any process life time.
>>
>> No.
>
> Well, IIUC they bound to net namespace life time, so killing all
> proccesses in the namespace would help to get memory back. :)

... unless the namespace is mounted into file system.

Let's start with NOWARN as that's what kernel generally uses for
allocations with user-controllable size. ENOMEM is roughly as
informative as the WARNING message in this case.

I think we also need to consider setting up memory cgroup for
syzkaller test processes (we do RLIMIT_AS, but that's weak).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30  8:11               ` Florian Westphal
  2018-01-30  8:28                 ` Kirill A. Shutemov
@ 2018-01-30  9:51                 ` Michal Hocko
  1 sibling, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2018-01-30  9:51 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kirill A. Shutemov, Tetsuo Handa, davem, netfilter-devel,
	coreteam, netdev, aarcange, yang.s, syzkaller-bugs, linux-kernel,
	mingo, linux-mm, rientjes, akpm, guro, kirill.shutemov

On Tue 30-01-18 09:11:27, Florian Westphal wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > > Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > [...]
> > > > I hate what I'm saying, but I guess we need some tunable here.
> > > > Not sure what exactly.
> > > 
> > > Would memcg help?
> > 
> > That really depends. I would have to check whether vmalloc path obeys
> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > that shouldn't be a big deal). But then the other potential problem is
> > the life time of the xt_table_info (or other potentially large) data
> > structures. Are they bound to any process life time.
> 
> No.
> 
> > Because if they are
> > not then the OOM killer will not help. The OOM panic earlier in this
> > thread suggests it doesn't because the test case managed to eat all the
> > available memory and killed all the eligible tasks which didn't help.
> 
> Yes, which is why we do not want any OOM killer invocation in first
> place...

The problem is that as soon as you eat that memory and ask for more
until you fail with ENOMEM then the OOM is simply unavoidable.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30  9:02                   ` Dmitry Vyukov
@ 2018-01-30  9:57                     ` Michal Hocko
  2018-01-30 14:01                       ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-01-30  9:57 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller,
	netfilter-devel, coreteam, netdev, Andrea Arcangeli, Yang Shi,
	syzkaller-bugs, LKML, Ingo Molnar, Linux-MM, David Rientjes,
	Andrew Morton, guro, Kirill A. Shutemov

On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote:
> On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> >> Michal Hocko <mhocko@kernel.org> wrote:
> >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> >> > > Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >> > [...]
> >> > > > I hate what I'm saying, but I guess we need some tunable here.
> >> > > > Not sure what exactly.
> >> > >
> >> > > Would memcg help?
> >> >
> >> > That really depends. I would have to check whether vmalloc path obeys
> >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> >> > that shouldn't be a big deal). But then the other potential problem is
> >> > the life time of the xt_table_info (or other potentially large) data
> >> > structures. Are they bound to any process life time.
> >>
> >> No.
> >
> > Well, IIUC they bound to net namespace life time, so killing all
> > proccesses in the namespace would help to get memory back. :)
> 
> ... unless the namespace is mounted into file system.
> 
> Let's start with NOWARN as that's what kernel generally uses for
> allocations with user-controllable size. ENOMEM is roughly as
> informative as the WARNING message in this case.

You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc
right now. More specifically kvmalloc doesn't guanratee that the request
will not trigger the OOM killer (like regular __GFP_NORETRY). This is
because of internal vmalloc restrictions. If you are however OK to
simply bail out in most cases then __GFP_NORETRY should work reasonably
fine.

> I think we also need to consider setting up memory cgroup for
> syzkaller test processes (we do RLIMIT_AS, but that's weak).

Well, this is not about syzkaller, it merely pointed out a potential
DoS... And that has to be addressed somehow.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30  9:57                     ` Michal Hocko
@ 2018-01-30 14:01                       ` Michal Hocko
  2018-01-30 14:01                         ` Florian Westphal
  2018-01-30 19:27                         ` Andrew Morton
  0 siblings, 2 replies; 23+ messages in thread
From: Michal Hocko @ 2018-01-30 14:01 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Kirill A. Shutemov, Florian Westphal, Tetsuo Handa, David Miller,
	netfilter-devel, coreteam, netdev, Andrea Arcangeli, Yang Shi,
	syzkaller-bugs, LKML, Ingo Molnar, Linux-MM, David Rientjes,
	Andrew Morton, guro, Kirill A. Shutemov

On Tue 30-01-18 10:57:39, Michal Hocko wrote:
> On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote:
> > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
> > <kirill@shutemov.name> wrote:
> > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> > >> Michal Hocko <mhocko@kernel.org> wrote:
> > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > >> > > Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > >> > [...]
> > >> > > > I hate what I'm saying, but I guess we need some tunable here.
> > >> > > > Not sure what exactly.
> > >> > >
> > >> > > Would memcg help?
> > >> >
> > >> > That really depends. I would have to check whether vmalloc path obeys
> > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > >> > that shouldn't be a big deal). But then the other potential problem is
> > >> > the life time of the xt_table_info (or other potentially large) data
> > >> > structures. Are they bound to any process life time.
> > >>
> > >> No.
> > >
> > > Well, IIUC they bound to net namespace life time, so killing all
> > > proccesses in the namespace would help to get memory back. :)
> > 
> > ... unless the namespace is mounted into file system.
> > 
> > Let's start with NOWARN as that's what kernel generally uses for
> > allocations with user-controllable size. ENOMEM is roughly as
> > informative as the WARNING message in this case.
> 
> You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc
> right now. More specifically kvmalloc doesn't guanratee that the request
> will not trigger the OOM killer (like regular __GFP_NORETRY). This is
> because of internal vmalloc restrictions. If you are however OK to
> simply bail out in most cases then __GFP_NORETRY should work reasonably
> fine.
> 
> > I think we also need to consider setting up memory cgroup for
> > syzkaller test processes (we do RLIMIT_AS, but that's weak).
> 
> Well, this is not about syzkaller, it merely pointed out a potential
> DoS... And that has to be addressed somehow.

So how about this?
---
>From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 30 Jan 2018 14:51:07 +0100
Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

syzbot has noticed that xt_alloc_table_info can allocate a lot of
memory. This is an admin only interface but an admin in a namespace
is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use
kvmalloc() in xt_alloc_table_info()") has changed the opencoded
kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on
the way because vmalloc has simply never fully supported __GFP_NORETRY
semantic. This is still the case because e.g. page tables backing the
vmalloc area are hardcoded GFP_KERNEL.

Revert back to __GFP_NORETRY as a poors man defence against excessively
large allocation request here. We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request
in most cases.

Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 net/netfilter/x_tables.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d8571f414208..a5f5c29bcbdc 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	info = kvmalloc(sz, GFP_KERNEL);
+	/*
+	 * __GFP_NORETRY is not fully supported by kvmalloc but it should
+	 * work reasonably well if sz is too large and bail out rather
+	 * than shoot all processes down before realizing there is nothing
+	 * more to reclaim.
+	 */
+	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
 	if (!info)
 		return NULL;
 
-- 
2.15.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30 14:01                       ` Michal Hocko
@ 2018-01-30 14:01                         ` Florian Westphal
  2018-01-30 14:39                           ` Michal Hocko
  2018-01-30 19:27                         ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2018-01-30 14:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal,
	Tetsuo Handa, David Miller, netfilter-devel, coreteam, netdev,
	Andrea Arcangeli, Yang Shi, syzkaller-bugs, LKML, Ingo Molnar,
	Linux-MM, David Rientjes, Andrew Morton, guro,
	Kirill A. Shutemov

> From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 30 Jan 2018 14:51:07 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30 14:01                         ` Florian Westphal
@ 2018-01-30 14:39                           ` Michal Hocko
  2018-02-02 11:41                             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-01-30 14:39 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Dmitry Vyukov, Kirill A. Shutemov, Tetsuo Handa, David Miller,
	netfilter-devel, coreteam, netdev, Andrea Arcangeli, Yang Shi,
	syzkaller-bugs, LKML, Ingo Molnar, Linux-MM, David Rientjes,
	Andrew Morton, guro, Kirill A. Shutemov

On Tue 30-01-18 15:01:11, Florian Westphal wrote:
> > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 30 Jan 2018 14:51:07 +0100
> > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive
> 
> Acked-by: Florian Westphal <fw@strlen.de>

Thanks! How should we route this change? Andrew, David?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30 14:01                       ` Michal Hocko
  2018-01-30 14:01                         ` Florian Westphal
@ 2018-01-30 19:27                         ` Andrew Morton
  2018-01-31  8:19                           ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2018-01-30 19:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal,
	Tetsuo Handa, David Miller, netfilter-devel, coreteam, netdev,
	Andrea Arcangeli, Yang Shi, syzkaller-bugs, LKML, Ingo Molnar,
	Linux-MM, David Rientjes, guro, Kirill A. Shutemov

On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> > Well, this is not about syzkaller, it merely pointed out a potential
> > DoS... And that has to be addressed somehow.
> 
> So how about this?
> ---

argh ;)

> >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 30 Jan 2018 14:51:07 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive
> 
> syzbot has noticed that xt_alloc_table_info can allocate a lot of
> memory. This is an admin only interface but an admin in a namespace
> is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use
> kvmalloc() in xt_alloc_table_info()") has changed the opencoded
> kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on
> the way because vmalloc has simply never fully supported __GFP_NORETRY
> semantic. This is still the case because e.g. page tables backing the
> vmalloc area are hardcoded GFP_KERNEL.
> 
> Revert back to __GFP_NORETRY as a poors man defence against excessively
> large allocation request here. We will not rule out the OOM killer
> completely but __GFP_NORETRY should at least stop the large request
> in most cases.
> 
> Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  net/netfilter/x_tables.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index d8571f414208..a5f5c29bcbdc 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
>  	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
>  		return NULL;

offtopic: preceding comment here is "prevent them from hitting BUG() in
vmalloc.c".  I suspect this is ancient code and vmalloc sure as heck
shouldn't go BUG with this input.  And it should be using `sz' ;)

So I suspect and hope that this code can be removed.  If not, let's fix
vmalloc!

> -	info = kvmalloc(sz, GFP_KERNEL);
> +	/*
> +	 * __GFP_NORETRY is not fully supported by kvmalloc but it should
> +	 * work reasonably well if sz is too large and bail out rather
> +	 * than shoot all processes down before realizing there is nothing
> +	 * more to reclaim.
> +	 */
> +	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>  	if (!info)
>  		return NULL;

checkpatch sayeth

networking block comments don't use an empty /* line, use /* Comment...

So I'll do that and shall scoot the patch Davewards.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30 19:27                         ` Andrew Morton
@ 2018-01-31  8:19                           ` Michal Hocko
  2018-02-07 17:44                             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-01-31  8:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dmitry Vyukov, Kirill A. Shutemov, Florian Westphal,
	Tetsuo Handa, David Miller, netfilter-devel, coreteam, netdev,
	Andrea Arcangeli, Yang Shi, syzkaller-bugs, LKML, Ingo Molnar,
	Linux-MM, David Rientjes, guro, Kirill A. Shutemov

On Tue 30-01-18 11:27:45, Andrew Morton wrote:
> On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > > Well, this is not about syzkaller, it merely pointed out a potential
> > > DoS... And that has to be addressed somehow.
> > 
> > So how about this?
> > ---
> 
> argh ;)

doh, those hardwired moves...

> > >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 30 Jan 2018 14:51:07 +0100
> > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive
> > 
> > syzbot has noticed that xt_alloc_table_info can allocate a lot of
> > memory. This is an admin only interface but an admin in a namespace
> > is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use
> > kvmalloc() in xt_alloc_table_info()") has changed the opencoded
> > kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on
> > the way because vmalloc has simply never fully supported __GFP_NORETRY
> > semantic. This is still the case because e.g. page tables backing the
> > vmalloc area are hardcoded GFP_KERNEL.
> > 
> > Revert back to __GFP_NORETRY as a poors man defence against excessively
> > large allocation request here. We will not rule out the OOM killer
> > completely but __GFP_NORETRY should at least stop the large request
> > in most cases.
> > 
> > Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  net/netfilter/x_tables.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index d8571f414208..a5f5c29bcbdc 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> >  	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
> >  		return NULL;
> 
> offtopic: preceding comment here is "prevent them from hitting BUG() in
> vmalloc.c".  I suspect this is ancient code and vmalloc sure as heck
> shouldn't go BUG with this input.  And it should be using `sz' ;)

Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range.
My excavation tools pointed me to "VM: Rework vmalloc code to support mapping of arbitray pages"
by Christoph back in 2002. So yes, we can safely remove it finally. Se
below.


>From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 31 Jan 2018 09:16:56 +0100
Subject: [PATCH] net/netfilter/x_tables.c: remove size check

Back in 2002 vmalloc used to BUG on too large sizes. We are much better
behaved these days and vmalloc simply returns NULL for those. Remove
the check as it simply not needed and the comment even misleading.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 net/netfilter/x_tables.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index b55ec5aa51a6..48a6ff620493 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if (sz < sizeof(*info))
 		return NULL;
 
-	/* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
-	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
-		return NULL;
-
 	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
 	 * work reasonably well if sz is too large and bail out rather
 	 * than shoot all processes down before realizing there is nothing
-- 
2.15.1

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-30 14:39                           ` Michal Hocko
@ 2018-02-02 11:41                             ` Pablo Neira Ayuso
  0 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2018-02-02 11:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Westphal, Dmitry Vyukov, Kirill A. Shutemov,
	Tetsuo Handa, David Miller, netfilter-devel, coreteam, netdev,
	Andrea Arcangeli, Yang Shi, syzkaller-bugs, LKML, Ingo Molnar,
	Linux-MM, David Rientjes, Andrew Morton, guro,
	Kirill A. Shutemov

On Tue, Jan 30, 2018 at 03:39:58PM +0100, Michal Hocko wrote:
> On Tue 30-01-18 15:01:11, Florian Westphal wrote:
> > > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Tue, 30 Jan 2018 14:51:07 +0100
> > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive
> > 
> > Acked-by: Florian Westphal <fw@strlen.de>
> 
> Thanks! How should we route this change? Andrew, David?

I'll place this in the nf.git tree if everyone is happy with it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-01-31  8:19                           ` Michal Hocko
@ 2018-02-07 17:44                             ` Pablo Neira Ayuso
  2018-02-07 19:06                               ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2018-02-07 17:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Andrea Arcangeli, Yang Shi, Tetsuo Handa, netdev,
	guro, LKML, Ingo Molnar, syzkaller-bugs, Linux-MM, coreteam,
	netfilter-devel, Kirill A. Shutemov, David Rientjes,
	Kirill A. Shutemov, David Miller, Dmitry Vyukov

Hi,

On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote:
[...]
> Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range.
> My excavation tools pointed me to "VM: Rework vmalloc code to support mapping of arbitray pages"
> by Christoph back in 2002. So yes, we can safely remove it finally. Se
> below.
> 
> 
> From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 31 Jan 2018 09:16:56 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: remove size check
> 
> Back in 2002 vmalloc used to BUG on too large sizes. We are much better
> behaved these days and vmalloc simply returns NULL for those. Remove
> the check as it simply not needed and the comment even misleading.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  net/netfilter/x_tables.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index b55ec5aa51a6..48a6ff620493 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
>  	if (sz < sizeof(*info))
>  		return NULL;
>  
> -	/* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
> -	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
> -		return NULL;
> -
>  	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
>  	 * work reasonably well if sz is too large and bail out rather
>  	 * than shoot all processes down before realizing there is nothing

Patchwork didn't catch this patch for some reason, would you mind to
resend?

Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-02-07 17:44                             ` Pablo Neira Ayuso
@ 2018-02-07 19:06                               ` Andrew Morton
  2018-02-07 19:15                                 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2018-02-07 19:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Michal Hocko, Andrea Arcangeli, Yang Shi, Tetsuo Handa, netdev,
	guro, LKML, Ingo Molnar, syzkaller-bugs, Linux-MM, coreteam,
	netfilter-devel, Kirill A. Shutemov, David Rientjes,
	Kirill A. Shutemov, David Miller, Dmitry Vyukov

On Wed, 7 Feb 2018 18:44:39 +0100 Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> Hi,
> 
> On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote:
> [...]
> > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range.
> > My excavation tools pointed me to "VM: Rework vmalloc code to support mapping of arbitray pages"
> > by Christoph back in 2002. So yes, we can safely remove it finally. Se
> > below.
> > 
> > 
> > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Wed, 31 Jan 2018 09:16:56 +0100
> > Subject: [PATCH] net/netfilter/x_tables.c: remove size check
> > 
> > Back in 2002 vmalloc used to BUG on too large sizes. We are much better
> > behaved these days and vmalloc simply returns NULL for those. Remove
> > the check as it simply not needed and the comment even misleading.
> > 
> > Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  net/netfilter/x_tables.c | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index b55ec5aa51a6..48a6ff620493 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> >  	if (sz < sizeof(*info))
> >  		return NULL;
> >  
> > -	/* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
> > -	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
> > -		return NULL;
> > -
> >  	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
> >  	 * work reasonably well if sz is too large and bail out rather
> >  	 * than shoot all processes down before realizing there is nothing
> 
> Patchwork didn't catch this patch for some reason, would you mind to
> resend?

From: Michal Hocko <mhocko@suse.com>
Subject: net/netfilter/x_tables.c: remove size check

Back in 2002 vmalloc used to BUG on too large sizes.  We are much better
behaved these days and vmalloc simply returns NULL for those.  Remove the
check as it simply not needed and the comment is even misleading.

Link: http://lkml.kernel.org/r/20180131081916.GO21609@dhcp22.suse.cz
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 net/netfilter/x_tables.c |    4 ----
 1 file changed, 4 deletions(-)

diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check net/netfilter/x_tables.c
--- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check
+++ a/net/netfilter/x_tables.c
@@ -1004,10 +1004,6 @@ struct xt_table_info *xt_alloc_table_inf
 	if (sz < sizeof(*info))
 		return NULL;
 
-	/* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
-	if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
-		return NULL;
-
 	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
 	 * work reasonably well if sz is too large and bail out rather
 	 * than shoot all processes down before realizing there is nothing
_

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
  2018-02-07 19:06                               ` Andrew Morton
@ 2018-02-07 19:15                                 ` Pablo Neira Ayuso
  0 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2018-02-07 19:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Andrea Arcangeli, Yang Shi, Tetsuo Handa, netdev,
	guro, LKML, Ingo Molnar, syzkaller-bugs, Linux-MM, coreteam,
	netfilter-devel, Kirill A. Shutemov, David Rientjes,
	Kirill A. Shutemov, David Miller, Dmitry Vyukov

On Wed, Feb 07, 2018 at 11:06:42AM -0800, Andrew Morton wrote:
> From: Michal Hocko <mhocko@suse.com>
> Subject: net/netfilter/x_tables.c: remove size check
> 
> Back in 2002 vmalloc used to BUG on too large sizes.  We are much better
> behaved these days and vmalloc simply returns NULL for those.  Remove the
> check as it simply not needed and the comment is even misleading.

Applied, thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-02-07 19:15 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <001a1144b0caee2e8c0563d9de0a@google.com>
2018-01-29  0:20 ` kernel panic: Out of memory and no killable processes... (2) Tetsuo Handa
2018-01-29  7:23   ` [netfilter-core] " Florian Westphal
2018-01-29  8:26     ` Kirill A. Shutemov
2018-01-29 16:57       ` Florian Westphal
2018-01-29 18:24         ` Jan Engelhardt
2018-01-29 18:25         ` Michal Hocko
2018-01-29 18:28         ` Kirill A. Shutemov
2018-01-29 22:35           ` Florian Westphal
2018-01-30  7:52             ` Michal Hocko
2018-01-30  8:11               ` Florian Westphal
2018-01-30  8:28                 ` Kirill A. Shutemov
2018-01-30  9:02                   ` Dmitry Vyukov
2018-01-30  9:57                     ` Michal Hocko
2018-01-30 14:01                       ` Michal Hocko
2018-01-30 14:01                         ` Florian Westphal
2018-01-30 14:39                           ` Michal Hocko
2018-02-02 11:41                             ` Pablo Neira Ayuso
2018-01-30 19:27                         ` Andrew Morton
2018-01-31  8:19                           ` Michal Hocko
2018-02-07 17:44                             ` Pablo Neira Ayuso
2018-02-07 19:06                               ` Andrew Morton
2018-02-07 19:15                                 ` Pablo Neira Ayuso
2018-01-30  9:51                 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).