Date: Wed, 19 May 2021 13:10:33 +0200
From: Michal Hocko
To: Aaron Tomlin
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Vlastimil Babka
Subject: Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress
In-Reply-To: <20210518140554.dwan66i4ttmzw4hj@ava.usersys.com>

On Tue 18-05-21 15:05:54, Aaron Tomlin wrote:
> Michal,
>
> On Fri 2021-03-26 16:36 +0100, Michal Hocko wrote:
> > OK, I kinda expected this would be not easily reproducible.
>
> Unfortunately, I'm still waiting for feedback on this.
>
> > We should be focusing on the compaction retry logic and see whether we
> > can have some "run away" scenarios there. Seeing so many retries without
> > compaction bailing out sounds like a bug in that retry logic.
>
> I suspect so.
>
> This is indeed a case of excessive reclaim/compaction retries (i.e. the
> last known value stored in the no_progress_loops variable was 31,611,688).
>
> What might be particularly unique about this situation is that a fatal
> signal was found pending. In this context, if I understand correctly, it
> does not make sense to retry compaction when the last known compact result
> was skipped and a fatal signal is pending.

OK, this might be an interesting lead.

> Looking at try_to_compact_pages(), indeed COMPACT_SKIPPED can be returned;
> albeit, not every zone, on the zone list, would be considered in the case
> a fatal signal is found to be pending. Yet, in should_compact_retry(),
> given the last known compaction result, each zone, on the zone list, can be
> considered/or checked (see compaction_zonelist_suitable()). If a zone e.g.
> was found to succeed then reclaim/compaction would be tried again
> (notwithstanding the above).

I believe Vlastimil would be a much better fit for looking into those
details, but it does smell like pending fatal signals can lead to an
unbounded retry.

Thanks!
-- 
Michal Hocko
SUSE Labs
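
The suspected interaction discussed above can be sketched as a toy userspace model. This is not kernel code: every toy_* function, struct task, and the check_signal flag are hypothetical names made up for illustration, and the behaviour they encode is only the assumption stated in the thread (compaction is skipped while a fatal signal is pending, yet the retry check still finds a suitable zone). The max_loops cap exists only so the model terminates.

```c
#include <stdbool.h>

/* Toy stand-ins; the enum names mirror the thread, the behaviour is assumed. */
enum compact_result { COMPACT_SKIPPED, COMPACT_SUCCESS };

/* Hypothetical task state: only the pending fatal signal matters here. */
struct task {
    bool fatal_signal_pending;
};

/* Assumption: compaction bails out as "skipped" while a fatal signal is pending. */
static enum compact_result toy_try_to_compact(const struct task *t)
{
    return t->fatal_signal_pending ? COMPACT_SKIPPED : COMPACT_SUCCESS;
}

/*
 * Assumption: the retry check walks every zone and asks for another round
 * when the last result was "skipped" and some zone looks suitable.
 * check_signal models a hypothetical extra test that refuses to retry
 * when a fatal signal is pending.
 */
static bool toy_should_retry(enum compact_result last, bool zone_suitable,
                             const struct task *t, bool check_signal)
{
    if (check_signal && t->fatal_signal_pending)
        return false;
    return last == COMPACT_SKIPPED && zone_suitable;
}

/* Counts retry rounds; max_loops caps the loop so the model always ends. */
static unsigned long toy_alloc_loops(const struct task *t, bool check_signal,
                                     unsigned long max_loops)
{
    unsigned long loops = 0;

    while (loops < max_loops) {
        enum compact_result r = toy_try_to_compact(t);

        if (r == COMPACT_SUCCESS)
            break;
        /* A suitable zone is assumed to exist on every round. */
        if (!toy_should_retry(r, true, t, check_signal))
            break;
        loops++;
    }
    return loops;
}
```

In this model, a task with a pending fatal signal and check_signal set to false runs until the cap is reached, which mirrors the very large no_progress_loops value reported above. With check_signal set to true the loop exits on the first round. Whether the real retry logic behaves this way is exactly the open question in the thread.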