From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Luis R. Rodriguez" Subject: Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants Date: Thu, 26 Mar 2015 04:00:55 +0100 Message-ID: <20150326030054.GA5622__36515.0119098418$1427339003$gmane$org@wotan.suse.de> References: <1426893517-2511-1-git-send-email-mcgrof@do-not-panic.com> <1426893517-2511-6-git-send-email-mcgrof@do-not-panic.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1Yay2Y-0004cJ-GY for xen-devel@lists.xenproject.org; Thu, 26 Mar 2015 03:01:02 +0000 Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Bjorn Helgaas , Arnd Bergmann , Linus Walleij , Stefano Stabellini , Julia Lawall , Peter Senna Tschudin Cc: linux-fbdev@vger.kernel.org, "Michael S. Tsirkin" , Daniel Vetter , Dave Hansen , Jan Beulich , "H. Peter Anvin" , Ville =?iso-8859-1?Q?Syrj=E4l=E4?= , Suresh Siddha , "x86@kernel.org" , Tomi Valkeinen , linux-pci@vger.kernel.org, "xen-devel@lists.xenproject.org" , Ingo Molnar , Borislav Petkov , Jean-Christophe Plagniol-Villard , Benjamin Poirier , Antonino Daplas , Stefan Bader , Dave Airlie , Thomas Gleixner , Ingo Molnar , jgross@suse.com, Toshi Kani List-Id: xen-devel@lists.xenproject.org On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote: > Hi Luis, > > This seems OK to me, Great. > but I'm curious about a few things. > > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez > wrote: > > From: "Luis R. Rodriguez" > > > > This allows drivers to take advantage of write-combining > > when possible. Ideally we'd have pci_read_bases() just > > peg an IORESOURCE_WC flag for us > > We do set IORESOURCE_PREFETCH. Do you mean something different? I did not think we had a WC IORESOURCE flag. Are you saying that we can use IORESOURCE_PREFETCH for that purpose? If so then great. As I read a PCI BAR can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below. > > but where exactly > > video devices memory lie varies *largely* and at times things > > are mixed with MMIO registers, sometimes we can address > > the changes in drivers, other times the change requires > > intrusive changes. > > What does a video device address have to do with this? I do see that > if a BAR maps only a frame buffer, the device might be able to mark it > prefetchable, while if the BAR mapped both a frame buffer and some > registers, it might not be able to make it prefetchable. But that > doesn't seem like it depends on the *address*. I meant the offsets for each of those, either registers or framebuffer, and that typically they are mixed (primarily on older devices), so indeed your summary of the problem is what I meant. Let's remember that we are trying to take advantage of PAT here when available and avoid MTRR in that case, do we know that the same PCI BARs that have always historically used MTRRs had IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are different things -- but its precisely why I ask. > pci_iomap_range() already makes a cacheable mapping if > IORESOURCE_CACHEABLE; I'm guessing that you would like it to > automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g., > > if (flags & IORESOURCE_CACHEABLE) > return ioremap(start, len); > if (flags & IORESOURCE_PREFETCH) > return ioremap_wc(start, len); > return ioremap_nocache(start, len); Indeed, that's exactly what I think we should strive towards. > Is there a reason not to do that? This depends on the exact defintion of IORESOURCE_PREFETCH and PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and accross *all devices*. This didn't look promising for starters: include/uapi/linux/pci_regs.h:#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08 /* prefetchable? */ PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions: 1) Can we rest assured for instance that if we check for PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full PCI BAR if the full PCI BAR does want WC? If not this can regress functionality. That seems risky. It however would not be risky if we used another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() -- that way only drivers we know that do use the full PCI bar would use this API. There's a bit of a problem with this though: 2) Do we know that if a *full PCI BAR* is used for WC that PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then the API usage would be restricted only to devices that we know *do* adhere to this. That reduces the possible uses for older drivers and can create regressions if used loosely without verification... but.. 3) If from now on we get folks to commit to uset PCI_BASE_ADDRESS_MEM_PREFETCH for full PCI BARs that do want WC perhaps newer devices / drivers will use this very consistently ? Can we bank on that and is it worth it ? 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it must not never want WC ? If we don't have certainty on any of the above I'm afraid we can't do much right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH and hope folks will only use this for the full PCI BAR only if WC is desired. Thoughts? > > Although there is also arch_phys_wc_add() that makes use of > > architecture specific write-combinging alternatives (MTRR on > > x86 when a system does not have PAT) we void polluting > > pci_iomap() space with it and force drivers and subsystems > > that want to use it to be explicit. > > > > There are a few motivations for this: > > > > a) Take advantage of PAT when available > > > > b) Help bury MTRR code away, MTRR is architecture specific and on > > x86 its replaced by PAT > > > > c) Help with the goal of eventually using _PAGE_CACHE_UC over > > _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e) > > ... > > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev, > > + int bar, > > + unsigned long offset, > > + unsigned long maxlen) > > +{ > > + resource_size_t start = pci_resource_start(dev, bar); > > + resource_size_t len = pci_resource_len(dev, bar); > > + unsigned long flags = pci_resource_flags(dev, bar); > > + > > + if (len <= offset || !start) > > + return NULL; > > + len -= offset; > > + start += offset; > > + if (maxlen && len > maxlen) > > + len = maxlen; > > + if (flags & IORESOURCE_IO) > > + return __pci_ioport_map(dev, start, len); > > + if (flags & IORESOURCE_MEM) > > Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH? > I know the driver might know it's safe even if the device didn't mark > the BAR as prefetchable, but it does seem like an easy way for a > driver to shoot itself in the foot. You tell me. I would fear this may not be consistent and we'd end up having bug reports open for something that has historically been a non-issue. The above questions can help us gauge the risk of this. Luis