From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ot0-f200.google.com (mail-ot0-f200.google.com [74.125.82.200])
    by kanga.kvack.org (Postfix) with ESMTP id 773F56B028B
    for ; Wed, 4 Jul 2018 11:46:11 -0400 (EDT)
Received: by mail-ot0-f200.google.com with SMTP id a1-v6so3658110oti.8
    for ; Wed, 04 Jul 2018 08:46:11 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
    by mx.google.com with SMTPS id d7-v6sor2762335oia.45.2018.07.04.08.46.09
    for (Google Transport Security);
    Wed, 04 Jul 2018 08:46:09 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20180704075018.GE22503@dhcp22.suse.cz>
References: <20180704075018.GE22503@dhcp22.suse.cz>
From: Petros Angelatos
Date: Wed, 4 Jul 2018 18:45:48 +0300
Message-ID:
Subject: Re: Memory cgroup invokes OOM killer when there are a lot of dirty pages
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, Hugh Dickins, Tejun Heo, lstoakes@gmail.com

> I assume dd just tried to fault a code page in and that failed due to
> the hard limit and unreclaimable memory. The reason why the memcg v1
> oom throttling heuristic hasn't kicked in is that there are no pages
> under writeback. This would match symptoms of the bug fixed by
> 1c610d5f93c7 ("mm/vmscan: wake up flushers for legacy cgroups too") in
> 4.16 but there might be more. You should have that fix already so there
> must be something more in the game. You've said that you are using blkio
> cgroup, right? What is the configuration? I strongly suspect that none
> of the writeback has started because of the throttling.

I'm only using a memory cgroup with no blkio restrictions so I'm not
sure why writeback hasn't started. Another thing I noticed is that
it's a lot harder to reproduce when the same amount of data is written
in a single file versus many smaller files. That's why my original
example code writes 500 files with 1MB of data.

Your mention of writeback gave me the idea to call sync_file_range()
with SYNC_FILE_RANGE_WRITE after writing each file, to schedule
writeback manually, and surprisingly that fixed the problem. Is that
an indication of a kernel bug where writeback isn't triggered in time?

Also, you mentioned that the page fault is probably due to a code page.
Would another remedy be to lock the whole executable and its dynamic
libraries in memory with mlock() before starting the IO operations?

-- 
Petros Angelatos
CTO & Founder, Resin.io
BA81 DC1C D900 9B24 2F88 6FDD 4404 DDEE 92BF 1079
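
For reference, here is a minimal sketch of the two ideas mentioned in
this message: calling sync_file_range() with SYNC_FILE_RANGE_WRITE
after each file is written, and locking the process image in memory
before the IO starts. It is not the original reproducer. The helper
names, the 64 KiB buffer and the fill byte are assumptions made for
illustration, and mlockall(MCL_CURRENT) is used as one possible way to
cover the executable and the libraries that are already mapped.
Whether either step is the right fix is exactly the open question
above.

```c
#define _GNU_SOURCE /* exposes sync_file_range() and SYNC_FILE_RANGE_WRITE */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `size` bytes of filler to `path`, then ask
 * the kernel to start writeback for the whole file without waiting for
 * it to complete. Returns 0 on success, -1 on any error.
 */
int write_file_and_start_writeback(const char *path, size_t size)
{
    char buf[64 * 1024]; /* assumed chunk size, not taken from the thread */
    size_t left = size;
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));
    while (left > 0) {
        size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);

        if (n <= 0) { /* treat short-circuit and errors alike in this sketch */
            close(fd);
            return -1;
        }
        left -= (size_t)n;
    }

    /*
     * offset = 0 and nbytes = 0 select everything from the start of the
     * file to its end. SYNC_FILE_RANGE_WRITE only initiates writeback of
     * dirty pages; it does not wait for it and gives no durability
     * guarantee.
     */
    rc = sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/*
 * Hypothetical helper: lock every page currently mapped into the
 * process (program text, shared libraries, heap, stack). Returns the
 * mlockall() result; it can fail if RLIMIT_MEMLOCK is too low.
 */
int lock_current_mappings(void)
{
    return mlockall(MCL_CURRENT);
}
```

A caller would invoke lock_current_mappings() once at startup and then
call write_file_and_start_writeback() in a loop, once per output file.
Note that locked pages are still charged to the memory cgroup, so the
limit has to leave room for them.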