Date: Fri, 26 Mar 2021 16:36:01 +0100
From: Michal Hocko
To: Aaron Tomlin
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Vlastimil Babka
Subject: Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress
References:
 <20210315165837.789593-1-atomlin@redhat.com>
 <20210319172901.cror2u53b7caws3a@ava.usersys.com>
 <20210325210159.r565fvfitoqeuykp@ava.usersys.com>
 <20210326112254.jy5jkiwtgj3pqkt2@ava.usersys.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210326112254.jy5jkiwtgj3pqkt2@ava.usersys.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 26-03-21 11:22:54, Aaron Tomlin wrote:
[...]
> > Both reclaim and compaction maintain their own retries counters as they
> > are targeting a different operation. Although the compaction really
> > depends on the reclaim to do some progress.
>
> Yes. Looking at should_compact_retry() if the last known compaction result
> was skipped i.e. suggesting there was not enough order-0 pages to support
> compaction, so assistance is needed from reclaim
> (see __compaction_suitable()).
>
> I noticed that the value of compaction_retries, compact_result and
> compact_priority was 0, COMPACT_SKIPPED and 1 i.e. COMPACT_PRIO_SYNC_LIGHT,
> respectively.
>
> > OK, this sound unexpected as it indicates that the reclaim is able to
> > make a forward progress but compaction doesn't want to give up and keeps
> > retrying. Are you able to reproduce this or could you find out which
> > specific condition keeps compaction retrying? I would expect that it is
> > one of the 3 conditions before the max_retries is checked.
>
> Unfortunately, I have been told it is not entirely reproducible.
> I suspect it is the following in should_compact_retry() - as I indicated
> above the last known value stored in compaction_retries was 0:
>
>
>         if (order > PAGE_ALLOC_COSTLY_ORDER)
>                 max_retries /= 4;
>         if (*compaction_retries <= max_retries) {
>                 ret = true;
>                 goto out;
>         }

OK, I kinda expected this would be not easily reproducible.
The reason I dislike your patch is that it adds yet another criterion for
oom while we already have 2, which doesn't make the resulting code any
easier to reason about. We should be focusing on the compaction retry
logic and see whether we can have some "run away" scenarios there. Seeing
so many retries without compaction bailing out sounds like a bug in that
retry logic. Vlastimil is much more familiar with that.
-- 
Michal Hocko
SUSE Labs
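
For illustration only, here is a minimal userspace sketch of the retry-limit
check quoted above, taken in isolation. It is not the kernel code: the helper
names retry_limit_allows() and count_retries() are hypothetical, the two
constants are values assumed for the sketch, and the earlier conditions in
should_compact_retry() are deliberately left out. It only shows why a
compaction_retries counter that stays at 0 can never trip this particular
limit, which is the kind of "run away" behaviour discussed in this thread.

```c
#include <stdbool.h>

/* Values assumed for this sketch; check them against the tree in question. */
#define MAX_COMPACT_RETRIES     16
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical stand-in for only the quoted tail of should_compact_retry():
 * returns true while the retry counter is still within the limit.
 */
static bool retry_limit_allows(unsigned int order, int compaction_retries)
{
	int max_retries = MAX_COMPACT_RETRIES;

	/* Costly orders get a quarter of the retry budget. */
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		max_retries /= 4;

	return compaction_retries <= max_retries;
}

/*
 * Count how many passes the check allows before it says "stop", giving up
 * after `cap` passes. `bump` models whether the allocation path taken
 * actually increments the counter between passes (an assumption of this
 * sketch, not a statement about which kernel path does so).
 */
static int count_retries(unsigned int order, bool bump, int cap)
{
	int compaction_retries = 0;
	int passes = 0;

	while (passes < cap && retry_limit_allows(order, compaction_retries)) {
		if (bump)
			compaction_retries++;
		passes++;
	}
	return passes;
}
```

Under these assumptions, count_retries(4, true, 1000) stops after 5 passes,
while count_retries(4, false, 1000) only stops because it hits the cap of
1000. If the counter is never advanced on the path being taken, the bound has
to come from one of the other conditions, which is why the retry logic itself
is the place to look.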