Date: Wed, 02 Jun 2021 10:37:58 +0100
Message-ID: <878s3s1ua1.wl-maz@kernel.org>
From: Marc Zyngier
To: Shanker R Donthineni
Cc: Catalin Marinas, Will Deacon, Vikram Sethi, Alex Williamson,
	Mark Kettenis, "christoffer.dall@arm.com",
	"linux-arm-kernel@lists.infradead.org",
	"kvmarm@lists.cs.columbia.edu", "linux-kernel@vger.kernel.org",
	"kvm@vger.kernel.org", Jason Sequeira
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
In-Reply-To: <273ba1c2-dfe6-7dc1-3e40-03398e82469b@nvidia.com>

Hi Shanker,

On Sat, 08 May 2021 17:33:11 +0100,
Shanker R Donthineni wrote:
>
> Hi Marc,
>
> On 5/5/21 1:02 PM, Catalin Marinas wrote:
> >>> Will/Catalin, perhaps you could explain your thought process on why you chose
> >>> Normal NC for ioremap_wc on the armv8 linux port instead of Device GRE or other
> >>> Device Gxx.
> >> I think a combination of: compatibility with 32-bit Arm, the need to
> >> support unaligned accesses and the potential for higher performance.
> > IIRC the _wc suffix also matches the pgprot_writecombine() used by some
> > drivers to map a video framebuffer into user space.
> > Accesses to the framebuffer are not guaranteed to be aligned
> > (memset/memcpy don't ensure alignment on arm64 and the user doesn't
> > have a memset_io or memcpy_toio).
> >
> >> Furthermore, ioremap() already gives you a Device memory type, and we're
> >> tight on MAIR space.
> > We have MT_DEVICE_GRE currently reserved though no in-kernel user, we
> > might as well remove it.
> @Marc, Could you provide your thoughts/guidance for the next step? The
> proposal of getting hints for prefetchable regions from VFIO/QEMU is not
> recommended, The only option left is to implement ARM64 dependent logic
> in KVM.
>
> Option-1: I think we could take advantage of stage-1/2 combining rules to
> allow NORMAL_NC memory-type for device memory in VM. Always map
> device memory at stage-2 as NORMAL-NC and trust VM's stage-1 MT.
>
> ---------------------------------------------------------------
> Stage-2 MT     Stage-1 MT    Resultant MT (combining-rules/FWB)
> ---------------------------------------------------------------
> Normal-NC      Normal-WT           Normal-NC
>    -           Normal-WB              -
>    -           Normal-NC              -
>    -           Device-             Device-
> ---------------------------------------------------------------

I think this is unwise. Will recently debugged a pretty horrible
situation when doing exactly that: when S1 is off and S2 is on, the
I-side is allowed to generate speculative accesses (see ARMv8 ARM G.a
D5.2.9 for the details). And yes, implementations definitely do that.
Add side-effect reads to the mix, and you're in for a treat.

> We've been using this option internally for testing purpose and
> validated with NVME/Mellanox/GPU pass-through devices on
> Marvell-Thundex2 platform.

See above. It *will* break eventually.

> Option-2: Get resource properties associated with MMIO using lookup_resource()
> and map at stage-2 as Normal-NC if IORESOURCE_PREFETCH is set in flags.

That's a pretty roundabout way of doing exactly the same thing you
initially proposed. And it suffers from the exact same problems, which
is that you change the semantics of the mapping without knowing what
the guest's intent is.

	M.

-- 
Without deviation from the norm, progress is not possible.
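
For readers following the Option-1 table, here is a minimal sketch of the
stage-1/stage-2 combining rules it relies on, assuming the non-FWB case
where the resulting type is simply the more restrictive of the two stages.
The enum, the names and the helpers are hypothetical and exist only for
this illustration; this is not kernel code, it collapses inner and outer
cacheability into one value, and it does not model FWB.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model, not kernel code: memory types listed from most to
 * least restrictive, so a smaller value means a stronger restriction.
 */
enum mem_type {
	MEM_DEVICE_nGnRnE,
	MEM_DEVICE_nGnRE,
	MEM_DEVICE_nGRE,
	MEM_DEVICE_GRE,
	MEM_NORMAL_NC,
	MEM_NORMAL_WT,
	MEM_NORMAL_WB,
	MEM_TYPE_COUNT
};

static const char *const mem_type_names[MEM_TYPE_COUNT] = {
	"Device-nGnRnE", "Device-nGnRE", "Device-nGRE", "Device-GRE",
	"Normal-NC", "Normal-WT", "Normal-WB",
};

/* Assumed non-FWB rule: the more restrictive of the two stages wins. */
static enum mem_type combine_s1_s2(enum mem_type s1, enum mem_type s2)
{
	return s1 < s2 ? s1 : s2;
}

/* Map a memory type to its printable name. */
static const char *mem_type_name(enum mem_type mt)
{
	assert((int)mt >= 0 && (int)mt < MEM_TYPE_COUNT);
	return mem_type_names[mt];
}

/* Print the combined type for a fixed stage-2 type and every stage-1 type. */
static void print_combinations(enum mem_type s2)
{
	for (int s1 = 0; s1 < MEM_TYPE_COUNT; s1++)
		printf("S2 %-14s S1 %-14s -> %s\n",
		       mem_type_name(s2),
		       mem_type_name((enum mem_type)s1),
		       mem_type_name(combine_s1_s2((enum mem_type)s1, s2)));
}
```

Calling print_combinations(MEM_NORMAL_NC) walks the rows of the table: a
stage-2 Normal-NC mapping never relaxes a Device stage-1 type, but it also
no longer forces Device when the guest's stage-1 says Normal. The sketch
says nothing about the case where the guest's stage 1 is off, which is the
case the objection above is about.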