Date: Mon, 3 Dec 2018 15:12:14 -0500
From: Andrea Arcangeli
To: Linus Torvalds
Cc: mhocko@kernel.org, ying.huang@intel.com, s.priebe@profihost.ag,
    mgorman@techsingularity.net, Linux List Kernel Mailing,
    alex.williamson@redhat.com, lkp@01.org, David Rientjes,
    kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu,
    Vlastimil Babka
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Message-ID: <20181203201214.GB3540@redhat.com>
References: <20181127205737.GI16136@redhat.com>
    <87tvk1yjkp.fsf@yhuang-dev.intel.com>
    <20181203181456.GK31738@dhcp22.suse.cz>
    <20181203183050.GL31738@dhcp22.suse.cz>
    <20181203185954.GM31738@dhcp22.suse.cz>

On Mon, Dec 03, 2018 at 11:28:07AM -0800, Linus Torvalds wrote:
> On Mon, Dec 3, 2018 at 10:59 AM Michal Hocko wrote:
> >
> > You are misinterpreting my words. I haven't dismissed anything. I do
> > recognize both usecases under discussion.
> >
> > I have merely said that a better THP locality needs more work and during
> > the review discussion I have even volunteered to work on that.
>
> We have two known patches that seem to have no real downsides.
>
> One is the patch posted by Andrea earlier in this thread, which seems
> to target just this known regression.

For the short term the important thing is to fix the VM regression one
way or another; I don't personally mind which way.

> The other seems to be to revert commit ac5b2c1891 and instead apply
>
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.1810081303060.221006@chino.kir.corp.google.com/
>
> which also seems to be sensible.

In my earlier review of David's patch, it looked runtime-equivalent to
the __GFP_COMPACT_ONLY solution. Its only advantage is that it avoids
adding a new gfpflag until we're sure we need one, but it's the worst
solution available for the long term in my view.
It'd be ok to apply it as a stop-gap measure though.

The "order == pageblock_order" hardcoding inside the allocator, there to
work around the __GFP_THISNODE flag passed from outside the allocator
in the THP MADV_HUGEPAGE case, didn't look very attractive because
it's not just THP that allocates order > 0 pages.

It'd be nicer if whatever compaction latency optimization applies to
THP could also apply to all other allocation orders, and the
hardcoding of the THP order prevents that.

Along the same lines, if __GFP_THISNODE is so badly needed by
MADV_HUGEPAGE, all other larger order allocations should also be able
to take advantage of __GFP_THISNODE without ending up in the same VM
corner cases that required the "order == pageblock_order" hardcoding
inside the allocator.

If you prefer David's patch, I would suggest replacing pageblock_order
with HPAGE_PMD_ORDER so it's more likely to match the THP order on
all archs.

Thanks,
Andrea