Date: Thu, 19 Mar 2020 08:09:11 +0100
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Tetsuo Handa, Vlastimil Babka, Robert Kolchmeyer, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v3] mm, oom: prevent soft lockup on memcg oom for UP systems
Message-ID: <20200319070911.GU21362@dhcp22.suse.cz>

On Wed 18-03-20 15:03:52, David Rientjes wrote:
> When a process is oom killed as a result of memcg limits and the victim
> is waiting to exit, nothing ends up actually yielding the processor back
> to the victim on UP systems with preemption disabled. Instead, the
> charging process simply loops in memcg reclaim and eventually triggers a
> soft lockup.
>
> For example, on a UP system with a memcg limited to 100MB, if three
> processes each charge 40MB of heap with swap disabled, one of the charging
> processes can loop endlessly trying to charge memory, which starves the oom
> victim.

This can only happen if there is no reclaimable memory in the hierarchy,
which is a very specific condition. The only way I can see to get there is
a misconfigured system whose min protection prevents any reclaim.
Otherwise there are cond_resched() calls both in the slab shrinking code
(do_shrink_slab) and in the LRU shrinking code (shrink_lruvec). If I am
wrong and those are insufficient, then please be explicit about the
scenario. This is very important information to have in the changelog!
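The arithmetic behind the example is simple: three 40MB charges total 120MB, which exceeds the 100MB limit, and with swap disabled the anonymous heap cannot be reclaimed. A minimal sketch, assuming a toy accounting model in MB (`struct memcg_model` and `try_charge_model()` are hypothetical names, not memcg APIs), shows that the third charge only falls through to the OOM path when reclaim cannot free enough, which is the condition raised above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a memcg; all sizes are in MB. Not kernel code. */
struct memcg_model {
	long limit_mb;       /* hard limit, 100MB in the example */
	long usage_mb;       /* total charged, including reclaimable memory */
	long reclaimable_mb; /* part of usage that reclaim can free (e.g. page cache) */
};

/*
 * Try to charge nr_mb. If the limit would be exceeded, "reclaim" up to the
 * shortfall from reclaimable memory and retry once. Returns false when the
 * charge still does not fit, i.e. when the real code would go to memcg OOM.
 */
static bool try_charge_model(struct memcg_model *m, long nr_mb)
{
	long need, freed;

	if (m->usage_mb + nr_mb <= m->limit_mb) {
		m->usage_mb += nr_mb;
		return true;
	}

	/* Reclaim at most the shortfall, limited by what is reclaimable. */
	need = m->usage_mb + nr_mb - m->limit_mb;
	freed = need < m->reclaimable_mb ? need : m->reclaimable_mb;
	m->reclaimable_mb -= freed;
	m->usage_mb -= freed;

	if (m->usage_mb + nr_mb <= m->limit_mb) {
		m->usage_mb += nr_mb;
		return true;
	}
	return false; /* reclaim could not free enough: OOM territory */
}
```

With anonymous-only usage (`reclaimable_mb == 0`), the first two 40MB charges succeed and the third fails with usage stuck at 80MB. With 30MB of reclaimable cache charged up front, the second 40MB charge succeeds after reclaiming 10MB.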
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1576,6 +1576,12 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 */
>  	ret = should_force_charge() || out_of_memory(&oc);
>  	mutex_unlock(&oom_lock);
> +	/*
> +	 * Give a killed process a good chance to exit before trying to
> +	 * charge memory again.
> +	 */
> +	if (ret)
> +		schedule_timeout_killable(1);

Why are you making this conditional? Say that there is no victim to kill.
The charge path would simply bail out, and whether there is a scheduling
point at all would then depend on the call chain. Isn't it simply safer
to call schedule_timeout_killable() unconditionally at this stage?

>  	return ret;
>  }
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3861,6 +3861,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	}
>  out:
>  	mutex_unlock(&oom_lock);
> +	/*
> +	 * Give a killed process a good chance to exit before trying to
> +	 * allocate memory again.
> +	 */
> +	if (*did_some_progress)
> +		schedule_timeout_killable(1);

This doesn't make much sense either. Please remember that the primary
reason you are adding schedule_timeout_killable() to this path is to
reduce the priority inversion problem mentioned by Tetsuo. The page
allocator path does not lack regular scheduling points: compaction,
reclaim, should_reclaim_retry() etc. all have them.

>  	return page;
>  }

-- 
Michal Hocko
SUSE Labs
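The scheduling argument in the review can be made concrete with a minimal userspace sketch, assuming a single CPU with no preemption where the OOM victim can only run when the charging task voluntarily gives up the CPU. It is a toy model, not kernel code: `struct up_model`, `yield_cpu()` and `charge_loop()` are hypothetical names, `yield_cpu()` stands in for schedule_timeout_killable(1), and a bounded retry count stands in for the soft lockup. In the yielding variant the model yields after every failed attempt, whether or not a victim is pending, mirroring the suggestion to call schedule_timeout_killable() unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU, non-preemptive model; sizes in MB. Not kernel code. */
struct up_model {
	long limit_mb;       /* memcg limit */
	long usage_mb;       /* currently charged */
	bool victim_pending; /* an OOM victim is waiting for the CPU to exit */
	long victim_mb;      /* memory the victim uncharges when it exits */
};

/*
 * Stand-in for schedule_timeout_killable(1). In this model the victim can
 * only run (and exit) when the charging task gives up the CPU here.
 */
static void yield_cpu(struct up_model *m)
{
	if (m->victim_pending) {
		m->usage_mb -= m->victim_mb;
		m->victim_pending = false;
	}
}

/*
 * Retry a charge of nr_mb up to max_retries times, assuming reclaim never
 * frees anything. Returns the index of the successful attempt, or -1 if
 * every attempt fails (the model's stand-in for a soft lockup).
 */
static int charge_loop(struct up_model *m, long nr_mb, bool yield_after_oom,
		       int max_retries)
{
	for (int i = 0; i < max_retries; i++) {
		if (m->usage_mb + nr_mb <= m->limit_mb) {
			m->usage_mb += nr_mb;
			return i;
		}
		/* Reclaim failed and the OOM path ran; yield regardless of its result. */
		if (yield_after_oom)
			yield_cpu(m);
	}
	return -1;
}
```

With a 100MB limit, 80MB already charged and a pending 40MB victim, the non-yielding loop exhausts its retries without the victim ever running, while the yielding loop lets the victim exit and succeeds on its second attempt.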