From: David Hildenbrand
Subject: Re: [mm/page_alloc] 7fef431be9: vm-scalability.throughput 87.8% improvement
Date: Mon, 26 Oct 2020 20:09:43 +0100
Message-Id: <494C73E0-452D-4503-8ED6-DAE11A8471E5@redhat.com>
Cc: David Hildenbrand, David Rientjes, kernel test robot, Kevin Ko,
 Linus Torvalds, Andrew Morton, Vlastimil Babka, Oscar Salvador, Wei Yang,
 Pankaj Gupta, Michal Hocko, Alexander Duyck, Mel Gorman, Dave Hansen,
 Mike Rapoport, "K. Y. Srinivasan", Haiyang Zhang, Stephen Hemminger,
 Wei Liu, Matthew Wilcox, Michael Ellerman, Michal Hocko, Scott Cheloha,
 LKML, lkp@lists.01.org, lkp@intel.com, ying.huang@intel.com,
 feng.tang@intel.com, zhengjun.xing@intel.com
To: Axel Rasmussen
X-Mailing-List: linux-kernel@vger.kernel.org

> On 26.10.2020 at 19:11, Axel Rasmussen wrote:
>
> On Mon, Oct 26, 2020 at 1:31 AM David Hildenbrand wrote:
>>
>>> On 23.10.20 21:44, Axel Rasmussen wrote:
>>> On Fri, Oct 23, 2020 at 12:29 PM David Rientjes wrote:
>>>>
>>>> On Wed, 21 Oct 2020, kernel test robot wrote:
>>>>
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a 87.8% improvement of vm-scalability.throughput due to commit:
>>>>>
>>>>>
>>>>> commit: 7fef431be9c9ac255838a9578331567b9dba4477 ("mm/page_alloc: place pages to tail in __free_pages_core()")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>>
>>>>>
>>>>> in testcase: vm-scalability
>>>>> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
>>>>> with following parameters:
>>>>>
>>>>> runtime: 300s
>>>>> size: 512G
>>>>> test: anon-wx-rand-mt
>>>>> cpufreq_governor: performance
>>>>> ucode: 0x5002f01
>>>>>
>>>>> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
>>>>> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>>>>>
>>>>
>>>> I'm curious why we are not able to reproduce this improvement on Skylake
>>>> and actually see a slight performance degradation, at least for
>>>> 300s_128G_truncate_throughput.
>>>>
>>>> Axel Rasmussen can provide more details on our
>>>> results.
>>>
>>> Right, our results show a slight regression on a Skylake machine [1],
>>> and a slight performance increase on a Rome machine [2]. For these
>>> tests, I used Linus' v5.9 tag as a baseline, and then applied this
>>> patchset onto that tag as a test kernel (the patches applied cleanly
>>> besides one comment, I didn't have to do any code fixups). This is
>>> running the same anon-wx-rand-mt test defined in the upstream
>>> lkp-tests job file:
>>> https://github.com/intel/lkp-tests/blob/master/jobs/vm-scalability.yaml
>>
>> Hi,
>>
>> looking at the yaml, am I right that each test is run after a fresh boot?
>
> Yes-ish. For the results I posted, the larger context would have been
> something like:
>
> - Kernel installed, machine freshly rebooted.
> - Various machine management daemons start by default, some are
>   stopped so as not to interfere with the test.
> - Some packages are installed on the machine (the thing which
>   orchestrates the testing in particular).
> - The test is run.
>
> So, the machine is somewhat fresh in the sense that it hasn't been
> e.g. serving production traffic just before running the test, but it's
> also not as clean as it could be. It seems plausible this difference
> explains the difference in the results (I'm not too familiar with how
> the Intel kernel test robot is implemented).

Ah, okay. So most memory in the system is indeed untouched. Thanks!

>
>>
>> As I already replied to David, this patch merely changes the initial
>> order of the freelists. The general end result is that lower memory
>> addresses will be allocated before higher memory addresses will be
>> allocated - within a zone, the first time memory is getting allocated.
>> Before, it was the other way around. Once a system ran for some time,
>> freelists are randomized.
>>
>> There might be benchmarks/systems where this initial system state might
>> now be better suited - or worse. It doesn't really tell you that core-mm
>> is behaving better/worse now - it merely means that the initial system
>> state under which the benchmark was started affected the benchmark.
>>
>> Looks like so far there is one benchmark+system where it's really
>> beneficial, there is one benchmark+system where it's slightly
>> beneficial, and one benchmark+system where there is a slight regression.
>>
>>
>> Something like the following would revert to the previous behavior:
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 23f5066bd4a5..fac82420cc3d 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1553,7 +1553,9 @@ void __free_pages_core(struct page *page, unsigned int order)
>>          * Bypass PCP and place fresh pages right to the tail, primarily
>>          * relevant for memory onlining.
>>          */
>> -       __free_pages_ok(page, order, FPI_TO_TAIL);
>> +       __free_pages_ok(page, order,
>> +                       system_state < SYSTEM_RUNNING ? FPI_NONE :
>> +                       FPI_TO_TAIL);
>>  }
>>
>>  #ifdef CONFIG_NEED_MULTIPLE_NODES
>>
>>
>> (Or better, passing the expected behavior via MEMINIT_EARLY/... to
>> __free_pages_core().)
>>
>>
>> But then, I am not convinced we should perform that change: having a
>> clean (initial) state might be true for these benchmarks, but it's far
>> from reality. The change in numbers doesn't show you that core-mm is
>> operating better/worse, just that the baseline for your tests changed due
>> to a changed initial system state.
>
> Not to put words in David's mouth :) but at least from my perspective,
> our original interest was "wow, an 87% improvement! maybe we should
> deploy this patch to production!", and I'm mostly sharing my results
> just to say "it actually doesn't seem to be a huge *general*
> improvement", rather than to advocate for further changes / fixes.

Ah, yes, I saw the +87% and thought "that can't be right".

> IIUC the original motivation for this patch was to fix somewhat of an
> edge case, not to make a very general improvement, so this seems fine.
>

Exactly.
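
A minimal sketch of the freelist-ordering effect discussed above, as a toy
model and not kernel code. It assumes that blocks are freed in ascending
address order during init and that allocation always takes from the head of
the list. The names toy_freelist, toy_free_to_head, toy_free_to_tail and
toy_first_alloc_after_init are hypothetical and do not exist in
mm/page_alloc.c.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4

/* Toy free list: an array-backed deque with room to grow at both ends. */
struct toy_freelist {
	unsigned long pfn[2 * NBLOCKS];
	size_t head;	/* index of the first free block */
	size_t tail;	/* one past the last free block */
};

static void toy_init(struct toy_freelist *fl)
{
	/* Start in the middle so head and tail insertion both have space. */
	fl->head = fl->tail = NBLOCKS;
}

/* Models the previous behavior: a freed block goes to the head. */
static void toy_free_to_head(struct toy_freelist *fl, unsigned long pfn)
{
	assert(fl->head > 0);
	fl->pfn[--fl->head] = pfn;
}

/* Models FPI_TO_TAIL: a freed block goes to the tail. */
static void toy_free_to_tail(struct toy_freelist *fl, unsigned long pfn)
{
	assert(fl->tail < 2 * NBLOCKS);
	fl->pfn[fl->tail++] = pfn;
}

/* Allocation always takes the block at the head. */
static unsigned long toy_alloc(struct toy_freelist *fl)
{
	assert(fl->head < fl->tail);
	return fl->pfn[fl->head++];
}

/*
 * Free blocks 0..NBLOCKS-1 in ascending order (assumed init order),
 * then return the first block an allocation would receive.
 */
static unsigned long toy_first_alloc_after_init(int to_tail)
{
	struct toy_freelist fl;
	unsigned long pfn;

	toy_init(&fl);
	for (pfn = 0; pfn < NBLOCKS; pfn++) {
		if (to_tail)
			toy_free_to_tail(&fl, pfn);
		else
			toy_free_to_head(&fl, pfn);
	}
	return toy_alloc(&fl);
}
```

Under these assumptions, toy_first_alloc_after_init(1) hands out block 0
(lowest address first, the tail-placement behavior), while
toy_first_alloc_after_init(0) hands out block NBLOCKS - 1 (highest address
first, the previous behavior). Only the initial order differs; the allocation
path itself is identical in both cases.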