From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752764AbbFYAIS (ORCPT ); Wed, 24 Jun 2015 20:08:18 -0400 Received: from cantor2.suse.de ([195.135.220.15]:40807 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752634AbbFYAIK (ORCPT ); Wed, 24 Jun 2015 20:08:10 -0400 Date: Thu, 25 Jun 2015 02:08:03 +0200 From: "Luis R. Rodriguez" To: Benjamin Herrenschmidt Cc: "Michael S. Tsirkin" , Bjorn Helgaas , Toshi Kani , Andy Lutomirski , Juergen Gross , Tomi Valkeinen , Arnd Bergmann , "linux-pci@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "xen-devel@lists.xensource.com" , linux-fbdev , Suresh Siddha , Ingo Molnar , Thomas Gleixner , Daniel Vetter , Dave Airlie , Antonino Daplas , Jean-Christophe Plagniol-Villard , Dave Hansen , venkatesh.pallipadi@intel.com, Stefan Bader , Ville =?iso-8859-1?Q?Syrj=E4l=E4?= , Mel Gorman , Vlastimil Babka , Borislav Petkov , Davidlohr Bueso , Konrad Rzeszutek Wilk , Ville =?iso-8859-1?Q?Syrj=E4l=E4?= , David Vrabel , Jan Beulich , Roger Pau =?iso-8859-1?Q?Monn=E9?= Subject: Re: [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants Message-ID: <20150625000803.GH11147@wotan.suse.de> References: <1434751712-24333-1-git-send-email-mcgrof@do-not-panic.com> <1434751712-24333-6-git-send-email-mcgrof@do-not-panic.com> <1435099343.3996.13.camel@kernel.crashing.org> <20150624163821.GF11147@wotan.suse.de> <1435183509.3790.14.camel@kernel.crashing.org> <1435189081.3790.24.camel@kernel.crashing.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1435189081.3790.24.camel@kernel.crashing.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 25, 2015 at 09:38:01AM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2015-06-24 at 15:29 -0700, Luis R. Rodriguez wrote: > > > Nope but at least what made me squint at this being a possible > > "feature" was that in practice when reviewing all of the kernels > > pending device drivers using MTRR (potential write-combine candidates) > > I encountered a slew of them which had the architectural unfortunate > > practice of combining PCI bars for MMIO and their respective > > write-combined desirable area (framebuffer for video, PIO buffers for > > infiniband, etc). Now, to me that read more as a practice for old > > school devices when such things were likely still being evaluated, > > more modern devices seem to adhere to sticking a full PCI bar with > > write-combining or not. Did you not encounter such mismatch splits on > > powerpc ? Was such possibility addressed? > > Yes, I remember we dealt with some networking (or maybe IB) stuff back > in the day. We dealt with it by using a WC mapping and explicit barriers > to prevent combine when not wanted. > > It is to be noted that on powerpc at least, writel() and co will never > combine due to the memory barriers in them. Only "normal" stores (or > __raw_writel) will. > > On Intel things I different I assume... And the people who really know seem to be eaten by volcanoes or not have time. > The problem I see is that architectures can provide widely different > mechanisms and semantics in those areas and it's hard to define a > generic driver interface. Provided asm generic helpers are defined this should work though. The question is just if there is enough motivation. Doesn't sound like it or as you note maybe for userspace there might be. My position is that if it was too late for PCIE or if this was too ambigious for PCIE perhaps the next generation bus archicture or ammendments (I have no clue if this would would be possible) will make this part of future device negotiation clear and fully expected, not a wonderful side effect. > > If what you are implying here is applicable to the x86 world I'm all > > for enabling this as we'd have less code to maintain but I'll note > > that getting a clarification alone on that prefetchable != > > write-combining was in and of itself hard, I'd be surprised if we > > could get full architectural buy-in to this as an immediate automatic > > feature. > > I'm happy not to make it automatic for kernel space. OK thanks I'll proceed with these patches then. > As for user mappings, Which APIs were you considering in this regard BTW? > maybe the right thing to do is to let us do what we do by > default with a quirk that can set a flag in pci_dev to disable that > behaviour (maybe on a per BAR basis ?). That might mean it could restrict userspace WC to require devices to have WC parts on a full PCI BAR. Although this is restrictive having reviewed most WC uses in the kernel I'd think this would be a fair compromise to make, but again, if things are still murky perhaps best we kiss this idea good bye for now and hope for it to come in on future buses or ammendments (if that's even possible?). > I think the common case is that WC works. If WC does not I will note one hack which migh be worth mentioning -- just for the record, this was devised as a shortcoming of a device where they failed to split things properly and that *without* WC performance suffered quite a bit so they made one full PCI BAR WC and as a work around this: http://lkml.kernel.org/r/20150416041837.GA5712@hykim-PC That is for registers that needed it: write; wmb; Then if they wanted to wait till the NIC has seen the write, they did: write; wmb; read; Luis From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Luis R. Rodriguez" Date: Thu, 25 Jun 2015 00:08:03 +0000 Subject: Re: [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants Message-Id: <20150625000803.GH11147@wotan.suse.de> List-Id: References: <1434751712-24333-1-git-send-email-mcgrof@do-not-panic.com> <1434751712-24333-6-git-send-email-mcgrof@do-not-panic.com> <1435099343.3996.13.camel@kernel.crashing.org> <20150624163821.GF11147@wotan.suse.de> <1435183509.3790.14.camel@kernel.crashing.org> <1435189081.3790.24.camel@kernel.crashing.org> In-Reply-To: <1435189081.3790.24.camel@kernel.crashing.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Benjamin Herrenschmidt Cc: "Michael S. Tsirkin" , Bjorn Helgaas , Toshi Kani , Andy Lutomirski , Juergen Gross , Tomi Valkeinen , Arnd Bergmann , "linux-pci@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "xen-devel@lists.xensource.com" , linux-fbdev , Suresh Siddha , Ingo Molnar , Thomas Gleixner , Daniel Vetter , Dave Airlie , Antonino Daplas , Jean-Christophe Plagniol-Villard , Dave Hansen , venkatesh.pallipadi@intel.com, Stefan Bader , Ville =?iso-8859-1?Q?Syrj=E4l=E4?= , Mel Gorman , Vlastimil Babka , Borislav Petkov , Davidlohr Bueso , Konrad Rzeszutek Wilk , Ville =?iso-8859-1?Q?Syrj=E4l=E4?= , David Vrabel , Jan Beulich , Roger Pau =?iso-8859-1?Q?Monn=E9?= On Thu, Jun 25, 2015 at 09:38:01AM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2015-06-24 at 15:29 -0700, Luis R. Rodriguez wrote: > > > Nope but at least what made me squint at this being a possible > > "feature" was that in practice when reviewing all of the kernels > > pending device drivers using MTRR (potential write-combine candidates) > > I encountered a slew of them which had the architectural unfortunate > > practice of combining PCI bars for MMIO and their respective > > write-combined desirable area (framebuffer for video, PIO buffers for > > infiniband, etc). Now, to me that read more as a practice for old > > school devices when such things were likely still being evaluated, > > more modern devices seem to adhere to sticking a full PCI bar with > > write-combining or not. Did you not encounter such mismatch splits on > > powerpc ? Was such possibility addressed? > > Yes, I remember we dealt with some networking (or maybe IB) stuff back > in the day. We dealt with it by using a WC mapping and explicit barriers > to prevent combine when not wanted. > > It is to be noted that on powerpc at least, writel() and co will never > combine due to the memory barriers in them. Only "normal" stores (or > __raw_writel) will. > > On Intel things I different I assume... And the people who really know seem to be eaten by volcanoes or not have time. > The problem I see is that architectures can provide widely different > mechanisms and semantics in those areas and it's hard to define a > generic driver interface. Provided asm generic helpers are defined this should work though. The question is just if there is enough motivation. Doesn't sound like it or as you note maybe for userspace there might be. My position is that if it was too late for PCIE or if this was too ambigious for PCIE perhaps the next generation bus archicture or ammendments (I have no clue if this would would be possible) will make this part of future device negotiation clear and fully expected, not a wonderful side effect. > > If what you are implying here is applicable to the x86 world I'm all > > for enabling this as we'd have less code to maintain but I'll note > > that getting a clarification alone on that prefetchable !> > write-combining was in and of itself hard, I'd be surprised if we > > could get full architectural buy-in to this as an immediate automatic > > feature. > > I'm happy not to make it automatic for kernel space. OK thanks I'll proceed with these patches then. > As for user mappings, Which APIs were you considering in this regard BTW? > maybe the right thing to do is to let us do what we do by > default with a quirk that can set a flag in pci_dev to disable that > behaviour (maybe on a per BAR basis ?). That might mean it could restrict userspace WC to require devices to have WC parts on a full PCI BAR. Although this is restrictive having reviewed most WC uses in the kernel I'd think this would be a fair compromise to make, but again, if things are still murky perhaps best we kiss this idea good bye for now and hope for it to come in on future buses or ammendments (if that's even possible?). > I think the common case is that WC works. If WC does not I will note one hack which migh be worth mentioning -- just for the record, this was devised as a shortcoming of a device where they failed to split things properly and that *without* WC performance suffered quite a bit so they made one full PCI BAR WC and as a work around this: http://lkml.kernel.org/r/20150416041837.GA5712@hykim-PC That is for registers that needed it: write; wmb; Then if they wanted to wait till the NIC has seen the write, they did: write; wmb; read; Luis From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Luis R. Rodriguez" Subject: Re: [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants Date: Thu, 25 Jun 2015 02:08:03 +0200 Message-ID: <20150625000803.GH11147@wotan.suse.de> References: <1434751712-24333-1-git-send-email-mcgrof@do-not-panic.com> <1434751712-24333-6-git-send-email-mcgrof@do-not-panic.com> <1435099343.3996.13.camel@kernel.crashing.org> <20150624163821.GF11147@wotan.suse.de> <1435183509.3790.14.camel@kernel.crashing.org> <1435189081.3790.24.camel@kernel.crashing.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <1435189081.3790.24.camel@kernel.crashing.org> Sender: linux-pci-owner@vger.kernel.org To: Benjamin Herrenschmidt Cc: "Michael S. Tsirkin" , Bjorn Helgaas , Toshi Kani , Andy Lutomirski , Juergen Gross , Tomi Valkeinen , Arnd Bergmann , "linux-pci@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "xen-devel@lists.xensource.com" , linux-fbdev , Suresh Siddha , Ingo Molnar , Thomas Gleixner , Daniel Vetter , Dave Airlie , Antonino Daplas , Jean-Christophe Plagniol-Villard , Dave Hansen , venkatesh.pallipadi@intel.com, Stefan Bader , Ville List-Id: xen-devel@lists.xenproject.org On Thu, Jun 25, 2015 at 09:38:01AM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2015-06-24 at 15:29 -0700, Luis R. Rodriguez wrote: > > > Nope but at least what made me squint at this being a possible > > "feature" was that in practice when reviewing all of the kernels > > pending device drivers using MTRR (potential write-combine candidates) > > I encountered a slew of them which had the architectural unfortunate > > practice of combining PCI bars for MMIO and their respective > > write-combined desirable area (framebuffer for video, PIO buffers for > > infiniband, etc). Now, to me that read more as a practice for old > > school devices when such things were likely still being evaluated, > > more modern devices seem to adhere to sticking a full PCI bar with > > write-combining or not. Did you not encounter such mismatch splits on > > powerpc ? Was such possibility addressed? > > Yes, I remember we dealt with some networking (or maybe IB) stuff back > in the day. We dealt with it by using a WC mapping and explicit barriers > to prevent combine when not wanted. > > It is to be noted that on powerpc at least, writel() and co will never > combine due to the memory barriers in them. Only "normal" stores (or > __raw_writel) will. > > On Intel things I different I assume... And the people who really know seem to be eaten by volcanoes or not have time. > The problem I see is that architectures can provide widely different > mechanisms and semantics in those areas and it's hard to define a > generic driver interface. Provided asm generic helpers are defined this should work though. The question is just if there is enough motivation. Doesn't sound like it or as you note maybe for userspace there might be. My position is that if it was too late for PCIE or if this was too ambigious for PCIE perhaps the next generation bus archicture or ammendments (I have no clue if this would would be possible) will make this part of future device negotiation clear and fully expected, not a wonderful side effect. > > If what you are implying here is applicable to the x86 world I'm all > > for enabling this as we'd have less code to maintain but I'll note > > that getting a clarification alone on that prefetchable != > > write-combining was in and of itself hard, I'd be surprised if we > > could get full architectural buy-in to this as an immediate automatic > > feature. > > I'm happy not to make it automatic for kernel space. OK thanks I'll proceed with these patches then. > As for user mappings, Which APIs were you considering in this regard BTW? > maybe the right thing to do is to let us do what we do by > default with a quirk that can set a flag in pci_dev to disable that > behaviour (maybe on a per BAR basis ?). That might mean it could restrict userspace WC to require devices to have WC parts on a full PCI BAR. Although this is restrictive having reviewed most WC uses in the kernel I'd think this would be a fair compromise to make, but again, if things are still murky perhaps best we kiss this idea good bye for now and hope for it to come in on future buses or ammendments (if that's even possible?). > I think the common case is that WC works. If WC does not I will note one hack which migh be worth mentioning -- just for the record, this was devised as a shortcoming of a device where they failed to split things properly and that *without* WC performance suffered quite a bit so they made one full PCI BAR WC and as a work around this: http://lkml.kernel.org/r/20150416041837.GA5712@hykim-PC That is for registers that needed it: write; wmb; Then if they wanted to wait till the NIC has seen the write, they did: write; wmb; read; Luis