Date: Tue, 21 Apr 2020 11:09:32 +0100
From: Will Deacon
To: Prathu Baronia
Cc: catalin.marinas@arm.com, alexander.duyck@gmail.com, chintan.pandya@oneplus.com, mhocko@suse.com, akpm@linux-foundation.org, linux-mm@kvack.org, gregkh@linuxfoundation.com, gthelen@google.com, jack@suse.cz, ken.lin@oneplus.com, gasine.xu@oneplus.com, ying.huang@intel.com, mark.rutland@arm.com
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
Message-ID: <20200421100932.GC17256@willie-the-truck>
References: <20200414153829.GA15230@oneplus.com> <87r1wpzavo.fsf@yhuang-dev.intel.com> <20200419155856.dtwxomdkyujljdfi@oneplus.com> <87k12bt3ff.fsf@yhuang-dev.intel.com> <20200421093621.3fuptvf2qbyfzwfz@oneplus.com>
In-Reply-To: <20200421093621.3fuptvf2qbyfzwfz@oneplus.com>

On Tue, Apr 21, 2020 at 03:06:21PM +0530, Prathu Baronia wrote:
> With below v2 patch we observe a significantly(~65%) improved zeroing time for
> hugepages.

What patch? I assume you mean:

https://lore.kernel.org/linux-mm/20200414153829.GA15230@oneplus.com/

but you've trimmed all the details!

> We profiled the clear_huge_page() using ftrace on Qualcomm's SM8150 platform
> under controlled conditions(i.e. only CPU0 and 6 turned on and set to max
> frequency, and DDR set to performance governor).
>
> The existing method uses a reverse traversal of a section of a hugepage which
> based on our series of experiments proves slower than a oneshot(v2) approach on
> ARM64.(more details in mail thread)
>
> We didn't see any benefit on x86 so v2 probably won't find any place in the main
> memory.c code.

Do you know why you don't see any benefit on x86? It seems unusual that
something like this would vary so wildly between two modern architectures.
I'd like to understand what's going on.

> We are currently thinking of making this optimization ARM64 specific for better
> performance by placing this in arch/arm64/mm/memory.c(to be created) file. We
> would really appreciate if you can share your opinion on this.

There's no need for arch-specific optimisation. Please do it in core code, and
allow architectures to opt-out if necessary. That means you probably need to
respond to:

https://lore.kernel.org/linux-mm/20200417074851.GE26326@shao2-debian/

because that doesn't look as rosy as the numbers you're seeing.

Thanks,

Will
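
For readers following the thread, the two traversal orders and the "core code with an architecture opt-out" shape can be illustrated with a small userspace sketch. This is not kernel code and not the patch under review. The function names, the 4096-byte SUBPAGE_SIZE, and the ARCH_WANTS_TARGET_LAST_CLEAR macro are all hypothetical, and the target-last walk is a simplified stand-in for the reverse traversal mentioned above (it clears the subpages farthest from the accessed one first, so that the accessed subpage is written last).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096u /* assumed base page size, for illustration only */

/* Zero one subpage and, if order is non-NULL, record the visit sequence. */
void clear_subpage(unsigned char *huge, size_t idx, size_t *order, size_t *n)
{
    memset(huge + idx * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
    if (order)
        order[*n] = idx;
    (*n)++;
}

/* "One-shot" order: clear subpages in ascending address order. */
void clear_oneshot(unsigned char *huge, size_t nr, size_t *order)
{
    size_t n = 0;

    for (size_t i = 0; i < nr; i++)
        clear_subpage(huge, i, order, &n);
}

/*
 * Simplified target-last order: visit subpages by decreasing distance
 * from the target, so the target subpage is cleared last.
 */
void clear_target_last(unsigned char *huge, size_t nr, size_t target,
                       size_t *order)
{
    size_t n = 0;

    assert(target < nr);

    size_t below = target;          /* subpages before the target */
    size_t above = nr - 1 - target; /* subpages after the target */
    size_t max = below > above ? below : above;

    for (size_t d = max + 1; d-- > 0; ) {
        if (d <= below)
            clear_subpage(huge, target - d, order, &n);
        if (d != 0 && d <= above)
            clear_subpage(huge, target + d, order, &n);
    }
    assert(n == nr);
}

/*
 * Generic entry point: one-shot by default. An architecture could opt out
 * by defining the (hypothetical) ARCH_WANTS_TARGET_LAST_CLEAR macro.
 */
void clear_huge_region(unsigned char *huge, size_t nr, size_t target,
                       size_t *order)
{
#ifdef ARCH_WANTS_TARGET_LAST_CLEAR
    clear_target_last(huge, nr, target, order);
#else
    (void)target;
    clear_oneshot(huge, nr, order);
#endif
}
```

With four subpages and target index 1, the target-last walk in this sketch visits 3, 0, 2, 1, while the one-shot walk visits 0, 1, 2, 3. Which order is faster on a given machine is exactly the open question in the thread; the sketch only shows the orders, it does not measure anything.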