Date: Wed, 5 Dec 2018 19:54:25 -0500
From: Andrea Arcangeli
To: David Rientjes
Cc: Linus Torvalds, mgorman@techsingularity.net, Vlastimil Babka,
	mhocko@kernel.org, ying.huang@intel.com, s.priebe@profihost.ag,
	Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org,
	kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Message-ID: <20181206005425.GB21159@redhat.com>
References: <20181203201214.GB3540@redhat.com> <64a4aec6-3275-a716-8345-f021f6186d9b@suse.cz> <20181204104558.GV23260@techsingularity.net> <20181205204034.GB11899@redhat.com> <20181205233632.GE11899@redhat.com>
User-Agent: Mutt/1.11.0 (2018-11-25)
List-ID: <linux-kernel.vger.kernel.org>

On Wed, Dec 05, 2018 at 04:18:14PM -0800, David Rientjes wrote:
> On Wed, 5 Dec 2018, Andrea Arcangeli wrote:
>
> > __GFP_COMPACT_ONLY gave an hope it could give some middle ground but
> > it shows awful compaction results, it basically destroys compaction
> > effectiveness and we know why (COMPACT_SKIPPED must call reclaim or
> > compaction can't succeed because there's not enough free memory in the
> > node). If somebody used MADV_HUGEPAGE compaction should still work and
> > not fail like that. Compaction would fail to be effective even in the
> > local node where __GFP_THISNODE didn't fail. Worst of all it'd fail
> > even on non-NUMA systems (that would be easy to fix though by making
> > the HPAGE_PMD_ORDER check conditional to NUMA being enabled at
> > runtime).
>
> Note that in addition to COMPACT_SKIPPED that you mention, compaction can
> fail with COMPACT_COMPLETE, meaning the full scan has finished without
> freeing a hugepage, or COMPACT_DEFERRED, meaning that doing another scan
> is unlikely to produce a different result.
> COMPACT_SKIPPED makes sense to
> do reclaim if it can become accessible to isolate_freepages() and
> hopefully another allocator does not allocate from these newly freed pages
> before compaction can scan the zone again. For COMPACT_COMPLETE and
> COMPACT_DEFERRED, reclaim is unlikely to ever help.

COMPACT_COMPLETE (and COMPACT_PARTIAL_SKIPPED, for that matter) reaching
the caller looks like a mistake in the max() evaluation in
try_to_compact_pages(), which lets it return COMPACT_COMPLETE and
COMPACT_PARTIAL_SKIPPED. I think it should just return COMPACT_DEFERRED
in those two cases, and that should be enforced for all priorities.

There are really only 3 cases that matter for the caller:

1) succeed -> we got the page

2) defer -> we failed (caller won't care about why)

3) skipped -> failed because not enough 4k pages were freed -> reclaim
   must be invoked, then compaction can be retried

PARTIAL_SKIPPED/COMPLETE both fall into 2) above, so for the caller they
should be treated the same way. It doesn't seem very concerning that the
caller may retry as if compaction could still succeed and do a spurious
single reclaim invocation, but it's good to fix this and take the
COMPACT_DEFERRED nopage path in the __GFP_NORETRY case.
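To illustrate the folding being proposed, here is a minimal standalone
sketch (not kernel code; the enum mirrors the kernel's enum
compact_result, and fold_result() is a hypothetical helper name) showing
how COMPACT_COMPLETE and COMPACT_PARTIAL_SKIPPED would be collapsed into
COMPACT_DEFERRED before the result reaches the caller:

```c
#include <assert.h>

/*
 * Simplified model of the compaction result codes discussed above.
 * Standalone sketch: names mirror the kernel's enum compact_result,
 * but the values and the helper below are illustrative only.
 */
enum compact_result {
	COMPACT_SKIPPED,         /* not enough free 4k pages; reclaim, then retry */
	COMPACT_DEFERRED,        /* failed; retrying now is unlikely to help */
	COMPACT_PARTIAL_SKIPPED, /* partial scan finished without a hugepage */
	COMPACT_COMPLETE,        /* full scan finished without a hugepage */
	COMPACT_SUCCESS,         /* we got the page */
};

/*
 * Hypothetical folding applied on the try_to_compact_pages() return
 * path: the caller only distinguishes succeed/defer/skipped, so
 * COMPACT_COMPLETE and COMPACT_PARTIAL_SKIPPED are both reported as
 * COMPACT_DEFERRED, for all priorities.
 */
static enum compact_result fold_result(enum compact_result rc)
{
	switch (rc) {
	case COMPACT_COMPLETE:
	case COMPACT_PARTIAL_SKIPPED:
		return COMPACT_DEFERRED;
	default:
		return rc; /* SUCCESS, SKIPPED, DEFERRED pass through */
	}
}
```

With this, a __GFP_NORETRY caller seeing COMPACT_DEFERRED can take the
nopage path directly, and only COMPACT_SKIPPED triggers reclaim.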