From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_PASS, URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39C82C43387 for ; Fri, 14 Dec 2018 16:53:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D911F2070C for ; Fri, 14 Dec 2018 16:53:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1544806388; bh=VCbsPU3xIkQ7zqLDTviSp2pSx0jSjyVEWUyDdu2e1wo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=xAcKV4wGuWijMpZoJ2Ur6nc2mjLuii5Cms4aq9/rzqIgv4w8ihw4AUM+GhTk+jKQc BwYrURhgxtpN/m/uqcTZmAFh2NKvtdo2XOFB+r2tAGj1wH15XK1za1D2XyRcb3FEUC M8YqaAW3OXqg6xPDhK29nOwaB0Q4piBxUMxKkxh4= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729968AbeLNQxG (ORCPT ); Fri, 14 Dec 2018 11:53:06 -0500 Received: from mx2.suse.de ([195.135.220.15]:54504 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1729409AbeLNQxF (ORCPT ); Fri, 14 Dec 2018 11:53:05 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 22CA5ADD8; Fri, 14 Dec 2018 16:53:02 +0000 (UTC) Date: Fri, 14 Dec 2018 17:52:58 +0100 From: Michal Hocko To: Dmitry Vyukov Cc: Pablo Neira Ayuso , Jozsef Kadlecsik , Florian Westphal , David Miller , NetFilter , netdev , syzbot , Andrew Morton , LKML , Paul McKenney , Tetsuo Handa , rafael.j.wysocki@intel.com, syzkaller-bugs , vkuznets@redhat.com, Linux-MM Subject: Re: general protection fault in watchdog Message-ID: <20181214165258.GI5343@dhcp22.suse.cz> References: <0000000000004ea80b057cfae21e@google.com> <20181214132836.GE5343@dhcp22.suse.cz> <20181214135419.GG5343@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 14-12-18 15:31:44, Dmitry Vyukov wrote: > On Fri, Dec 14, 2018 at 2:54 PM Michal Hocko wrote: > > > > On Fri 14-12-18 14:42:33, Dmitry Vyukov wrote: > > > On Fri, Dec 14, 2018 at 2:28 PM Michal Hocko wrote: > > > > > > > > On Fri 14-12-18 14:11:05, Dmitry Vyukov wrote: > > > > > On Fri, Dec 14, 2018 at 1:51 PM syzbot > > > > > wrote: > > > > > > > > > > > > Hello, > > > > > > > > > > > > syzbot found the following crash on: > > > > > > > > > > > > HEAD commit: f5d582777bcb Merge branch 'for-linus' of git://git.kernel... > > > > > > git tree: upstream > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16aca143400000 > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23 > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7713f3aa67be76b1552c > > > > > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1131381b400000 > > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13bae593400000 > > > > > > > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > > > > > Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com > > > > > > > > > > +linux-mm for memcg question > > > > > > > > > > What the repro does is effectively just > > > > > setsockopt(EBT_SO_SET_ENTRIES). This eats all machine memory and > > > > > causes OOMs. Somehow it also caused the GPF in watchdog when it > > > > > iterates over task list, perhaps some scheduler code leaves a dangling > > > > > pointer on OOM failures. > > > > > > > > > > But what bothers me is a different thing. syzkaller test processes are > > > > > sandboxed with a restrictive memcg which should prevent them from > > > > > eating all memory. do_replace_finish calls vmalloc, which uses > > > > > GFP_KERNEL, which does not include GFP_ACCOUNT (GFP_KERNEL_ACCOUNT > > > > > does). And page alloc seems to change memory against memcg iff > > > > > GFP_ACCOUNT is provided. > > > > > Am I missing something or vmalloc is indeed not accounted (DoS)? I see > > > > > some explicit uses of GFP_KERNEL_ACCOUNT, e.g. the one below, but they > > > > > seem to be very sparse. > > > > > > > > > > static void *seq_buf_alloc(unsigned long size) > > > > > { > > > > > return kvmalloc(size, GFP_KERNEL_ACCOUNT); > > > > > } > > > > > > > > > > Now looking at the code I also don't see how kmalloc(GFP_KERNEL) is > > > > > accounted... Which makes me think I am still missing something. > > > > > > > > You are not missing anything. We do not account all allocations and you > > > > have to explicitly opt-in by __GFP_ACCOUNT. This is a deliberate > > > > decision. If the allocation is directly controlable by an untrusted user > > > > and the memory is associated with a process life time then this looks > > > > like a good usecase for __GFP_ACCOUNT. If an allocation outlives a > > > > process then there the flag should be considered with a great care > > > > because oom killer is not able to resolve the memcg pressure and so the > > > > limit enforcement is not effective. > > > > > > Interesting. > > > I understand that namespaces, memcg's and processes (maybe even > > > threads) can have arbitrary overlapping. But I naively thought that in > > > canonical hierarchical cases it should all somehow work. > > > Question 1: is there some other, stricter sandboxing mechanism? > > > > I do not think so > > > > > We try > > > to sandbox syzkaller processes with everything available , because > > > these OOMs usually leads either to dead machines or hang/stall false > > > positives, which are nasty. > > > > Which is a useful test on its own. If you are able to trigger the global > > OOM from a restricted environment then you have a good candidate to > > consider a new __GFP_ACCOUNT user. > > > > > Question 2: this is a simple DoS vector, right? If I put a container > > > into a 1MB memcg, it can still eat arbitrary amount of non-pagable > > > kernel memory? > > > > As I've said. If there is a direct vector to allocated an unbounded > > amount of memory from the userspace (trusted users aside) then yes this > > sounds like a DoS to me. > > +netfilter maintainers for this easy DoS vector > short story: vmalloc in do_replace_finish allows unbounded memory > allocation not accounted to memcg Haven't we discussed that recently? > Looks pretty unbounded: > > [ 763.451796] syz-executor681: vmalloc: allocation failure, allocated > 1027440640 of 1879052288 bytes, mode:0x6000c0(GFP_KERNEL), > nodemask=(null) > > But how does this play with what you said about memory outliving > process? Netfilter tables can definitely outlive the process, they are > attached to net ns. Not very well, but it is not hopeless either. The charged memory wouldn't go away with oom victims so it will stay there until somebody destroys those objects. On the other hand any process within that memcg would be killed so a DoS should be somehow contained. > Also, am I reading this correctly that potentially thousands of kernel > memory allocations need to be converted to ACCOUNT? I mean for small > ones we maybe care less, but they _should_ be accounted. They can also > build up, or just simply allow small repeated allocations. > > GFP_KERNEL Referenced in 11070 files: > https://elixir.bootlin.com/linux/latest/ident/GFP_KERNEL > > GFP_KERNEL_ACCOUNT Referenced in 19 files: > https://elixir.bootlin.com/linux/latest/ident/GFP_KERNEL_ACCOUNT Yes, we do not care about all kernel objects. Most of them are contained by some high-level objects already. The purpose of the kmem accounting is to limit those objects that can grow really large and are reclaimable in some way. -- Michal Hocko SUSE Labs