Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
To: Will Deacon
Cc: Prathu Baronia, catalin.marinas@arm.com, alexander.duyck@gmail.com,
 chintan.pandya@oneplus.com, mhocko@suse.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.com, gthelen@google.com,
 jack@suse.cz, ken.lin@oneplus.com, gasine.xu@oneplus.com,
 ying.huang@intel.com, mark.rutland@arm.com
From: Vlastimil Babka
Message-ID: <5e334947-22e9-e59d-f7bb-63e04cc8caf0@suse.cz>
Date: Tue, 21 Apr 2020 15:48:04 +0200
In-Reply-To: <20200421133935.GC17875@willie-the-truck>

On 4/21/20 3:39 PM, Will Deacon wrote:
> On Tue, Apr 21, 2020 at 02:48:04PM +0200, Vlastimil Babka wrote:
>> On 4/21/20 2:47 PM, Vlastimil Babka wrote:
>> >
>> > It was suspected that current Intel can prefetch forward and backwards, and the
>> > tested ARM64 microarchitecture only backwards, can it be true? The current code
>>
>> Oops, tested ARM64 microarchitecture I meant "only forwards".
>
> I'd be surprised if that's the case, but it could be that there's an erratum
> workaround in play which hampers the prefetch behaviour.
> We generally try not to assume too much about the prefetcher on arm64 because
> they're not well documented and vary wildly between different
> micro-architectures.

Yeah, it's probably not as simple as I thought, as the test code [1] shows the
page iteration goes backwards, but the per-page memsets are not special. So maybe
it's not hardware specifics, but the x86 memset implementation is also done
backwards, so it fits the backwards outer loop, while the arm64 memset is
forward, so the resulting pattern is non-linear? In that case it's also a
question whether the measurement was done in kernel or userspace, and whether
userspace memset has any implications for kernel memset...

[1] https://lore.kernel.org/linux-mm/20200414153829.GA15230@oneplus.com/

> Will
>