From: "Huang, Ying"
To: Prathu Baronia
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
References: <20200414153829.GA15230@oneplus.com> <87r1wpzavo.fsf@yhuang-dev.intel.com> <20200419155856.dtwxomdkyujljdfi@oneplus.com>
Date: Mon, 20 Apr 2020 08:18:44 +0800
In-Reply-To: <20200419155856.dtwxomdkyujljdfi@oneplus.com> (Prathu Baronia's message of "Sun, 19 Apr 2020 21:28:57 +0530")
Message-ID: <87k12bt3ff.fsf@yhuang-dev.intel.com>

Prathu Baronia writes:

> The 04/15/2020 11:27, Huang, Ying wrote:
>>
>> Can you describe your test?
>>
> We profile clear_huge_page() with ftrace while forcing it to be triggered
> in parallel by a simple userspace test program, which allocates 100MB of
> anonymous memory and traverses it in a loop.
>>
>> You have tested chunk sizes of 4KB and 2MB; can you test some values in
>> between, for example 32KB or 64KB? Maybe there's a sweet spot with a
>> smaller granularity and good performance.
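The userspace trigger described in the quoted reply is not included in the thread, so the following is only a minimal sketch of what such a program might look like. The helper `touch_anon()` and the `BASE_PAGE` constant are hypothetical, and a 4KB base page is assumed. Each pass maps fresh anonymous memory, asks for transparent huge page backing with `madvise(MADV_HUGEPAGE)`, and writes one byte per base page; where THP is enabled, the first write into each 2MB-aligned part of the mapping is expected to fault in a huge page that the kernel must zero. Unmapping at the end of each pass makes the next pass fault (and clear) again.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define BASE_PAGE 4096UL /* assumption: 4KB base pages */

/*
 * Hypothetical trigger loop: for each pass, map `len` bytes of anonymous
 * memory, request THP backing, write one byte per base page, then unmap so
 * the next pass faults fresh memory in again. Returns the number of page
 * writes performed, or 0 if mmap() fails.
 */
static size_t touch_anon(size_t len, int passes)
{
	size_t touches = 0;

	for (int i = 0; i < passes; i++) {
		unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
					MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 0;
#ifdef MADV_HUGEPAGE
		/* Best effort: the call fails harmlessly if THP is unavailable. */
		(void)madvise(p, len, MADV_HUGEPAGE);
#endif
		for (size_t off = 0; off < len; off += BASE_PAGE) {
			p[off] = 1; /* first write to a region triggers the fault */
			touches++;
		}
		munmap(p, len);
	}
	return touches;
}
```

For the 100MB setup mentioned above, a call such as `touch_anon(100UL << 20, 100)` could run while ftrace records `clear_huge_page()` durations, for example with the `function_graph` tracer and `clear_huge_page` written to `set_graph_function`; this is one possible setup, not necessarily the one used in the thread.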
> Based on your advice, I tried chunk sizes of 4KB, 8KB, 16KB, 32KB and 64KB
> on arm64 and x86_64, using a copy of the kernel memset implementation for
> each arch.
>
> Results (sample size is 100 for each; values are in microseconds):
>
> ARM64 (CPU0 and CPU6 online and set to max frequency, DDR set to
> performance governor):
>
> Chunk size   Oneshot mean (stddev)   Forward mean (stddev)   Reverse mean (stddev)
> ----------   ---------------------   ---------------------   ---------------------
> 4KB          3402.06 (72.6576)       3408.04 (72.976)        17699.3 (132.875)
> 8KB          3398.64 (80.6334)       3391.58 (65.9063)       13909.2 (194.324)
> 16KB         3393.57 (72.2485)       3404.69 (84.4705)       9278.65 (217.725)
> 32KB         3425.7 (129.156)        3402.07 (82.6713)       6831.43 (184.807)
> 64KB         3398.72 (77.9703)       3413.52 (173.121)       5542.84 (197.017)

Maybe a slightly larger chunk size would be good enough for ARM64?
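To make the three orders in the results concrete, here is a minimal userspace sketch of clearing one huge page in chunk-sized pieces. Several things are assumptions: `clear_hpage()`, `clear_and_verify()`, `HPAGE_SIZE` and `enum clear_order` are hypothetical names; a 2MB huge page is assumed; libc `memset()` stands in for the copied kernel memset; and "Reverse" is assumed to mean walking from the last chunk back to the first, which is not spelled out in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HPAGE_SIZE (2UL * 1024 * 1024) /* assumption: 2MB huge page */

enum clear_order { ORDER_ONESHOT, ORDER_FORWARD, ORDER_REVERSE };

/* Zero one huge-page-sized buffer using the given order and chunk size. */
static void clear_hpage(unsigned char *buf, size_t chunk, enum clear_order order)
{
	assert(chunk > 0 && HPAGE_SIZE % chunk == 0);

	switch (order) {
	case ORDER_ONESHOT: /* a single memset() over the whole 2MB */
		memset(buf, 0, HPAGE_SIZE);
		break;
	case ORDER_FORWARD: /* first chunk to last */
		for (size_t off = 0; off < HPAGE_SIZE; off += chunk)
			memset(buf + off, 0, chunk);
		break;
	case ORDER_REVERSE: /* last chunk back to first */
		for (size_t off = HPAGE_SIZE; off > 0; off -= chunk)
			memset(buf + off - chunk, 0, chunk);
		break;
	}
}

/* Hypothetical self-check: dirty a buffer, clear it, confirm every byte is 0. */
static int clear_and_verify(size_t chunk, enum clear_order order)
{
	unsigned char *buf = malloc(HPAGE_SIZE);
	int ok = 1;

	if (!buf)
		return 0;
	memset(buf, 0xaa, HPAGE_SIZE);
	clear_hpage(buf, chunk, order);
	for (size_t i = 0; i < HPAGE_SIZE; i++) {
		if (buf[i] != 0) {
			ok = 0;
			break;
		}
	}
	free(buf);
	return ok;
}
```

Timing each (chunk, order) pair, for instance by wrapping `clear_hpage()` with `clock_gettime(CLOCK_MONOTONIC, ...)`, would give userspace numbers comparable in spirit to the table, though not to the in-kernel ftrace measurements.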
> x86_64 (only CPU0 online and set to max frequency):
>
> Chunk size   Oneshot mean (stddev)   Forward mean (stddev)   Reverse mean (stddev)
> ----------   ---------------------   ---------------------   ---------------------
> 4KB          6752.59 (298.988)       6873.6 (325.607)        6722.88 (365.837)
> 8KB          6848.57 (955.312)       7012.24 (1377.27)       6688.83 (589.935)
> 16KB         6846.87 (546.173)       6785.26 (248.022)       6613.33 (350.003)
> 32KB         6862.19 (870.524)       6826.3 (870.023)        6747.69 (1047.5)
> 64KB         6806.9 (609.112)        6774.53 (311.954)       6553.47 (293.52)

As far as I understand, x86 gets no benefit at all from the change.

Best Regards,
Huang, Ying
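The Mean and Stddev columns above summarize 100 samples per configuration. The sketch below shows that arithmetic plus a rough "within one standard deviation" check of the kind that separates the x86_64 rows from the ARM64 Reverse column. The helper names are hypothetical, and the n - 1 (sample) denominator is an assumption, since the thread does not say which form was used. The one-stddev check is a quick heuristic, not a significance test.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n > 0). */
static double mean(const double *x, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/*
 * Sample variance with the n - 1 denominator. Assumption: the tool behind
 * the reported Stddev values may have used the population (n) form instead.
 */
static double sample_variance(const double *x, size_t n)
{
	double m = mean(x, n), ss = 0.0;

	assert(n > 1);
	for (size_t i = 0; i < n; i++)
		ss += (x[i] - m) * (x[i] - m);
	return ss / (double)(n - 1);
}

/*
 * Nonzero if two means differ by at most one standard deviation `sd`.
 * Squares are compared (|a - b| <= sd  <=>  (a - b)^2 <= sd^2), so no
 * sqrt() and no libm are needed.
 */
static int within_one_stddev(double a, double b, double sd)
{
	double d = a - b;

	return d * d <= sd * sd;
}
```

For example, the x86_64 64KB Forward and Reverse means differ by 221.06 us, less than either run's stddev (311.954 and 293.52), while the ARM64 4KB Reverse mean exceeds Forward by 14291.26 us, roughly 100 times the larger stddev.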