From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f197.google.com (mail-io0-f197.google.com [209.85.223.197]) by kanga.kvack.org (Postfix) with ESMTP id 4404B6B0003 for ; Tue, 13 Feb 2018 06:58:58 -0500 (EST) Received: by mail-io0-f197.google.com with SMTP id z6so17047502iob.3 for ; Tue, 13 Feb 2018 03:58:58 -0800 (PST) Received: from www262.sakura.ne.jp (www262.sakura.ne.jp. [2001:e42:101:1:202:181:97:72]) by mx.google.com with ESMTPS id y80si715746iod.4.2018.02.13.03.58.55 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 13 Feb 2018 03:58:56 -0800 (PST) Subject: Re: [PATCH] mm,oom: Don't call schedule_timeout_killable() with oom_lock held. From: Tetsuo Handa References: <1516628782-3524-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp> <20180123083806.GF1526@dhcp22.suse.cz> <201801232107.HJB48975.OHJFFOOLFQMVSt@I-love.SAKURA.ne.jp> <20180123124245.GK1526@dhcp22.suse.cz> <201801242228.FAD52671.SFFLQMOVOFHOtJ@I-love.SAKURA.ne.jp> In-Reply-To: <201801242228.FAD52671.SFFLQMOVOFHOtJ@I-love.SAKURA.ne.jp> Message-Id: <201802132058.HAG51540.QFtSLOJFOOFVMH@I-love.SAKURA.ne.jp> Date: Tue, 13 Feb 2018 20:58:38 +0900 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: mhocko@kernel.org Cc: akpm@linux-foundation.org, linux-mm@kvack.org, rientjes@google.com, hannes@cmpxchg.org, guro@fb.com, tj@kernel.org, vdavydov.dev@gmail.com, torvalds@linux-foundation.org Michal, if you are busy, can I test my version until you get time? Tetsuo Handa wrote: > Michal Hocko wrote: > > On Tue 23-01-18 21:07:03, Tetsuo Handa wrote: > > > Michal Hocko wrote: > > > > To be completely host, I am not in love with this > > > > schedule_timeout_uninterruptible(1). It is an ugly hack. It used to be > > > > much more important in the past when the oom victim test was too > > > > fragile. I strongly suspect that it is not needed this days so rather > > > > than moving the sleep around I would try to remove it altogether. > > > > > > But this schedule_timeout_uninterruptible(1) serves as a guaranteed > > > sleep for PF_WQ_WORKER threads > > > ( http://lkml.kernel.org/r/20170830064019.mfihbeu3mm5ygcrb@dhcp22.suse.cz ). > > > > > > > If we are under memory pressure, __zone_watermark_ok() can return false. > > > > If __zone_watermark_ok() == false, when is schedule_timeout_*() called explicitly? > > > > > > If all zones fail with the watermark check then we should hit the oom > > > path and sleep there. We do not do so for all cases though. > > > > > > Thus, you cannot simply remove it. > > > > Then I would rather make should_reclaim_retry more robust. > > I'm OK with that if we can remove schedule_timeout_*() with oom_lock held. > > > > > > > Also, your changelog silently skips over some important details. The > > > > system must be really overloaded when a short sleep can take minutes. > > > > > > Yes, the system was really overloaded, for I was testing below reproducer > > > on a x86_32 kernel. > > [...] > > > > I would trongly suspect that such an overloaded system doesn't need > > > > a short sleep to hold the oom lock for too long. All you need is to be > > > > preempted. So this patch doesn't really solve any _real_ problem. > > > > > > Preemption makes the OOM situation much worse. The only way to avoid all OOM > > > lockups caused by lack of CPU resource is to replace mutex_trylock(&oom_lock) > > > in __alloc_pages_may_oom() with mutex_lock(&oom_lock) (or similar) in order to > > > guarantee that all threads waiting for the OOM killer/reaper to make forward > > > progress shall give enough CPU resource. > > > > And how exactly does that help when the page allocator gets preempted by > > somebody not doing any allocation? > > The page allocator is not responsible for wasting CPU resource for something > other than memory allocation request. Wasting CPU resource due to unable to > allow the OOM killer/reaper to make forward progress is the page allocator's > bug. > > There are currently ways to artificially choke the OOM killer/reaper (e.g. let > a SCHED_IDLE priority thread which is allowed to run on only one specific CPU > invoke the OOM killer). To mitigate it, offloading the OOM killer to a dedicated > kernel thread (like the OOM reaper) which has reasonable scheduling priority and > is allowed to run on any idle CPU will help. But such enhancement is out of scope > for this patch. This patch is intended for avoid sleeping for minutes at > schedule_timeout_killable(1) with oom_lock held which can be reproduced without > using SCHED_IDLE priority and/or runnable CPU restrictions. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org