Received: from relay2.suse.de (unknown [195.135.220.254])
	by mx2.suse.de (Postfix) with ESMTP id F291FAD79;
	Tue, 21 Apr 2020 12:47:13 +0000 (UTC)
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
To: Will Deacon, Prathu Baronia
Cc: catalin.marinas@arm.com, alexander.duyck@gmail.com,
	chintan.pandya@oneplus.com, mhocko@suse.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, gregkh@linuxfoundation.com, gthelen@google.com,
	jack@suse.cz, ken.lin@oneplus.com, gasine.xu@oneplus.com,
	ying.huang@intel.com, mark.rutland@arm.com
References: <20200414153829.GA15230@oneplus.com>
	<87r1wpzavo.fsf@yhuang-dev.intel.com>
	<20200419155856.dtwxomdkyujljdfi@oneplus.com>
	<87k12bt3ff.fsf@yhuang-dev.intel.com>
	<20200421093621.3fuptvf2qbyfzwfz@oneplus.com>
	<20200421100932.GC17256@willie-the-truck>
From: Vlastimil Babka
Message-ID:
Date: Tue, 21 Apr 2020 14:47:13 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <20200421100932.GC17256@willie-the-truck>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

On 4/21/20 12:09 PM, Will Deacon wrote:
> On Tue, Apr 21, 2020 at 03:06:21PM +0530, Prathu Baronia wrote:
>> With below v2 patch we observe a significantly(~65%) improved zeroing time for
>> hugepages.
>
> What patch? I assume you mean:
>
> https://lore.kernel.org/linux-mm/20200414153829.GA15230@oneplus.com/
>
> but you've trimmed all the details!
>
>> We profiled the clear_huge_page() using ftrace on Qualcomm's SM8150 platform
>> under controlled conditions(i.e. only CPU0 and 6 turned on and set to max
>> frequency, and DDR set to performance governor).
>>
>> The existing method uses a reverse traversal of a section of a hugepage which
>> based on our series of experiments proves slower than a oneshot(v2) approach on
>> ARM64.(more details in mail thread)
>>
>> We didn't see any benefit on x86 so v2 probably won't find any place in the main
>> memory.c code.
>
> Do you know why you don't see any benefit on x86? It seems unusual that
> something like this would vary so wildly between two modern architectures.
> I'd like to understand what's going on.

It was suspected that current Intel CPUs can prefetch both forwards and
backwards, while the tested ARM64 microarchitecture can prefetch only
backwards. Can that be true? The current code clears backwards.

>> We are currently thinking of making this optimization ARM64 specific for better
>> performance by placing this in arch/arm64/mm/memory.c(to be created) file. We
>> would really appreciate if you can share your opinion on this.
>
> There's no need for arch-specific optimisation. Please do it in core code,
> and allow architectures to opt-out if necessary. That means you probably
> need to respond to:
>
> https://lore.kernel.org/linux-mm/20200417074851.GE26326@shao2-debian/

Note that this can also be viewed differently. It was commit c79b57e462b5
("mm: hugetlb: clear target sub-page last when clearing huge page") that
introduced the existing implementation, based on x86 numbers and probably the
same test that generated the regression report. It is likely that said commit
thus regressed arm64. In that case the generic implementation should simply be
reverted, so that it stays simple and does not assume any (micro)architectural
details. If any architecture wants an optimized version, it could add one as
opt-in and justify it using real workloads, not microbenchmarks.

> because that doesn't look as rosy as the numbers you're seeing.
>
> Thanks,
>
> Will
>
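
For anyone following the thread without the code at hand, here is a rough
user-space sketch of the two sub-page orderings being compared: a plain
forward pass (the "oneshot" idea) and a "target sub-page last" order in the
spirit of commit c79b57e462b5. The helper names order_forward() and
order_target_last() are made up for illustration, and the code only records
the order of sub-page indices; it is not the kernel implementation and does
not clear any memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: visit sub-pages 0..nr-1 in ascending order. */
static size_t order_forward(size_t nr, size_t *out)
{
	size_t k = 0;

	for (size_t i = 0; i < nr; i++)
		out[k++] = i;
	return k;
}

/*
 * Hypothetical helper: visit the sub-pages far from 'target' first, then
 * converge on it from both sides so that 'target' is visited last.
 * This is an illustrative sketch, not the kernel's code.
 */
static size_t order_target_last(size_t nr, size_t target, size_t *out)
{
	size_t k = 0, base, l;

	assert(target < nr);

	if (2 * target <= nr) {
		/* Target in the first half: walk the tail backwards first. */
		base = 0;
		l = target;
		for (size_t i = nr; i-- > 2 * target; )
			out[k++] = i;
	} else {
		/* Target in the second half: walk the head forwards first. */
		l = nr - target;
		base = nr - 2 * l;
		for (size_t i = 0; i < base; i++)
			out[k++] = i;
	}

	/* Remaining 2*l sub-pages: alternate left, right towards the target. */
	for (size_t i = 0; i < l; i++) {
		out[k++] = base + i;
		out[k++] = base + 2 * l - 1 - i;
	}
	return k;
}
```

With 8 sub-pages and target index 2, the second helper yields the order
7, 6, 5, 4, 0, 3, 1, 2. A large part of the traversal runs towards lower
addresses, which is the behaviour the prefetcher question above is about.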