linux-arm-kernel.lists.infradead.org archive mirror
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: catalin.marinas@arm.com, Leon Romanovsky <leon@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	linux-pci@vger.kernel.org, will@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Clint Sbisa <csbisa@amazon.com>
Subject: Re: [PATCH] arm64: Enable PCI write-combine resources under sysfs
Date: Thu, 10 Sep 2020 16:17:21 +0100
Message-ID: <20200910151721.GA25809@e121166-lin.cambridge.arm.com>
In-Reply-To: <20200910123758.GC904879@nvidia.com>

On Thu, Sep 10, 2020 at 09:37:58AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 10:46:00AM +0100, Lorenzo Pieralisi wrote:
> > [+Jason]
> > 
> > On Tue, Sep 08, 2020 at 09:33:42AM +1000, Benjamin Herrenschmidt wrote:
> > > On Thu, 2020-09-03 at 12:08 +0100, Lorenzo Pieralisi wrote:
> > > > > It's been what other architectures have been doing for more than a
> > > > > decade without significant issues... I don't think you should worry
> > > > > too much about this.
> > > > 
> > > > Minus what I wrote above, I agree with you. I'd still like to
> > > > understand what this patch changes in the mellanox driver HW
> > > > handling though - not sure what they expect from
> > > > arch_can_pci_mmap_wc() returning 1.
> > > 
> > > I don't know enough to get into the finer details but looking a bit it
> > > seems when this is set, they allow extra ioctls to create buffers
> > > mapped with pgprot_writecombine().
> > > 
> > > I suppose this means faster MMIO packet buffers for small packets (ie,
> > > non-DMA use case).
> > > 
> > > Also note that mlx5_ib_test_wc() only uses arch_can_pci_mmap_wc() for a
> > > non-ROCE ethernet port on a PF... For anything else, it just seems to
> > > actually try to do it and see what happens :-)
> > > 
> > > Leon: Can you clarify the use of arch_can_pci_mmap_wc() in mlx5 and
> > > whether you see an issue with enabling this on arm64 ?
> > 
> > Hi Jason,
> > 
> > I was wondering if you could help us with this question, we are trying
> > to understand what enabling arch_can_pci_mmap_wc() on arm64 would cause
> > in mellanox drivers wrt mappings and whether there is an expected
> > behaviour behind them, in particular whether there is an implicit
> > reliance on x86 write-combine arch/interconnect details.
> 
> Looking back at this big thread, let me add some perspective

Thank you - it was needed.

> Mellanox drivers have a performance optimization where a 64 byte MemWr
> TLP from the root complex to the MMIO BAR will perform better, often
> quite a bit better. We run WC in full QA'd production on PPC, ARM and
> x86.
> 
> The userspace generates a burst of sequential, aligned 8 byte CPU
> writes to the MMIO address and triggers an arch specific CPU barrier
> to flush/fence the CPU WC buffer. At this point the CPU should emit
> the 64 byte TLP toward the device ASAP.

While at it - mind explaining please what those 64 bytes actually contain ?

> In other words, the usage here is only about Write. The CPU
> should never, ever, generate a MemRd TLP. The code never does a read
> explicitly.

On arm64 pgprot_writecombine() is speculative memory (normal
non-cacheable), which may not do what you expect from it.

> If the CPU fails to generate a 64 byte TLP then the device will still
> operate correctly but does a different, slower, flow.

Side note: on ARM that TLP is not a native interconnect transaction;
in other words, it depends on what the system-bus->PCI logic does in
this respect.

> If the CPU consistently fails WC then the overhead of trying the WC
> flow is a notable net performance loss, and on these CPUs we want to
> use only 8 byte write to the MMIO BAR, with NC memory.

That's why I looped you in - that's what worries me about "enabling"
arch_can_pci_mmap_wc() on arm64. If we enable it and we have perf
regressions that's not OK.

Or we *can* enable arch_can_pci_mmap_wc() but force the mellanox
driver (or more broadly all drivers following this message push
semantics) to use "something else" for WC detection.

> There are many important details about how this works and how this
> must interact with the CPU barriers and locking.
> 
> On x86, arch_can_pci_mmap_wc() is basically meaningless.

On arm64 too, for the record - or rather, write-combine is not
well defined, ergo I don't know what arch_can_pci_mmap_wc() means.

> It indicates there is a chance that pgprot_writecombine() could work.
> It can also be 0 and write combining will work just fine :\.
> 
> Thus, mlx5 switched to doing a runtime WC test to determine if the CPU
> actually supports WC or not. If the arch can reliably tell the driver
> then this test could be avoided. Based on this test the WC mode is
> allowed for userspace.

Can you elaborate on this runtime test please ?

> The one call to arch_can_pci_mmap_wc() is in a case where the HW is
> configured in a way that can't run the test, here we use
> arch_can_pci_mmap_wc() to guess if the CPU has working WC or not.
> Ideally an arch would return 1 only when the CPU has working WC.

Which means we can guarantee the TLP packet you mentioned above I
guess ?

We have to define "working WC" :)
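
To make the decision being discussed concrete, here is a minimal sketch of
how I read the description above: use the runtime test result when the test
can run, and fall back to the arch hint only when it cannot. The enum, the
function name and the arch_hint parameter are hypothetical and only stand in
for the driver's test and for arch_can_pci_mmap_wc(); this is not the mlx5
code.

```c
#include <stdbool.h>

/* Hypothetical outcome of a driver's runtime write-combining test. */
enum wc_test_result {
	WC_TEST_NOT_POSSIBLE,	/* HW configured so that the test cannot run */
	WC_TEST_FAILED,		/* test ran, no combined write was observed */
	WC_TEST_PASSED,		/* test ran, a combined write was observed */
};

/*
 * Decide whether WC mappings are offered to userspace.
 * arch_hint stands in for the value of arch_can_pci_mmap_wc();
 * it is consulted only when the runtime test cannot be run.
 */
static bool allow_wc_for_userspace(enum wc_test_result test, bool arch_hint)
{
	switch (test) {
	case WC_TEST_PASSED:
		return true;
	case WC_TEST_FAILED:
		return false;
	case WC_TEST_NOT_POSSIBLE:
	default:
		return arch_hint;	/* best guess, may be wrong either way */
	}
}
```

With this shape the arch hint matters only in the "cannot test" case, which
is why its meaning has to be pinned down before arm64 starts returning 1.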

> Depending on workload WC may not be a win. In those cases userspace
> will select NC. Thus the same PCI MMIO BAR region can have a mixture
> of pages with WC and NC mappings to userspace.
> 
> For DEVICE_GRE... For years now, many deployments of ARM & mlx5 devices
> are using an out of tree patch to use DEVICE_GRE for WC on mlx5. This
> seems to be the preferred working configuration on at least some ARM
> SOCs. So far nobody from the ARM world has shown interest in making a
> mainline solution. :(
> 
> I can't recall if this is because the relevant ARM SOCs don't support
> pgprot_writecombine(), or it doesn't work properly.
> 
> I was told the reason ARM never enabled WC was because unaligned

When you say "enabled WC" I assume you mean making:

pgprot_writecombine() == DEVICE_GRE

> access to WC memory was not supported, and there were existing drivers
> that did unaligned writes that would malfunction. I thought this meant
> that pgprot_writecombine() was non-working in ARM Linux?

On arm64 pgprot_writecombine() is normal non-cacheable memory at the
moment - it works but that does not precisely do what you *expect* from
arch_can_pci_mmap_wc(), that's the whole point I am making.

> So, bit surprised to see a patch messing with arch_can_pci_mmap_wc()
> and not changing the definition of pgprot_writecombine() ?

We can't change pgprot_writecombine() to DEVICE_GRE, it can trigger
issues on some drivers, see unaligned memory access.

> mlx5 is more or less a representative user of WC for this kind of
> 'message push' methodology. Several other RDMA devices do this as
> well. The methodology is important enough that recent Intel CPUs have
> a dedicated instruction to push a 128 byte message in a single TLP
> avoiding this whole WC mess.
> 
> Frankly, I think the kernel should introduce a well defined pgprot for
> this working mode that all archs can agree upon. It should include the
> alignment requirement, message push function, CPU barrier macros, and
> locking macros that are needed to use this facility correctly.
> 
> Defined in a way that is compatible with DEVICE_GRE and can be used by
> these 'message push' drivers. That would switch away most of the
> users in the kernel today.

That's probably the way forward - I still have concerns about this
patch as it stands given your clarifications above.

Lorenzo
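
For reference, the 'message push' write pattern described in the quoted text
(a burst of sequential, aligned 8 byte writes followed by a barrier) can be
sketched roughly as below. This is only an illustration under stated
assumptions: push_message64() and wc_flush_placeholder() are hypothetical
names, the destination is treated as an ordinary pointer rather than a real
WC mapping of a BAR, and the C11 fence merely marks where the arch specific
barrier would go. It does not flush any CPU WC buffer, and nothing here
guarantees a single 64 byte TLP.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_BYTES 64
#define MSG_WORDS (MSG_BYTES / sizeof(uint64_t))

/*
 * Placeholder for the arch specific barrier that flushes/fences the CPU
 * WC buffer. A C11 release fence is NOT equivalent to that barrier; it
 * only marks the point in the sequence where the real one belongs.
 */
static inline void wc_flush_placeholder(void)
{
	atomic_thread_fence(memory_order_release);
}

/*
 * Copy one 64 byte message to dst using eight sequential, aligned
 * 8 byte stores, then issue the barrier. Only writes are performed on
 * dst. Returns 0 on success, -1 if dst is not 64 byte aligned.
 */
static int push_message64(volatile uint64_t *dst, const void *msg)
{
	uint64_t words[MSG_WORDS];
	size_t i;

	if ((uintptr_t)dst % MSG_BYTES)
		return -1;

	/* Stage the message so the stores to dst are whole 8 byte words. */
	memcpy(words, msg, MSG_BYTES);

	for (i = 0; i < MSG_WORDS; i++)
		dst[i] = words[i];

	wc_flush_placeholder();
	return 0;
}
```

Whether those eight stores leave the CPU as one transaction, and whether the
system-bus->PCI logic then emits one 64 byte MemWr TLP, depends on the
mapping attributes and on the platform, which is the open question in this
thread.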

