From: Sagi Grimberg <sagigrim@gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>, leon@kernel.org
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages
Date: Sun, 19 Jun 2016 12:58:59 +0300	[thread overview]
Message-ID: <57666CE3.3070803@gmail.com> (raw)
In-Reply-To: <4D23496A-FE01-4693-B125-82CD03B8F2D4@oracle.com>


>> On mlx5, we always added 2048 bytes to such allocations, for
>> reasons unknown to me. And it doesn't seem like a conservative
>> approach either.
>
> The mlx5 approach is much better than allocating a whole
> page, when you consider platforms with 64KB pages.
>
> A 1MB payload (for NFS) on such a platform comprises just
> 16 pages. So xprtrdma will allocate MRs with support for
> 16 pages. That's a priv pages array of 128 bytes, and you
> just put it in a 64KB page all by itself.
>
> So maybe adding 2048 bytes is not optimal either. But I
> think sticking with kmalloc here is the better choice.

Again, the 2K constraint does not come from any DMA-mapping
alignment consideration; it comes from a _device_ limitation that
requires the translation vector to be 2K-aligned.
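
For reference, the mlx5 allocation amounts to roughly the following
(a condensed sketch from memory, not verbatim driver source, and the
function name is mine): over-allocate by up to 2K, then carve out an
array aligned to MLX5_UMR_ALIGN (2048) because the device demands it:

	static int mlx5_style_alloc(struct ib_device *device,
				    struct mlx5_ib_mr *mr, int ndescs)
	{
		int size = ndescs * sizeof(u64);
		/* Over-allocate so a 2K-aligned array always fits. */
		int add_size = max_t(int, MLX5_UMR_ALIGN -
					  ARCH_KMALLOC_MINALIGN, 0);

		mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL);
		if (!mr->descs_alloc)
			return -ENOMEM;

		/* The device wants the vector at a 2K boundary. */
		mr->descs = PTR_ALIGN(mr->descs_alloc, MLX5_UMR_ALIGN);

		mr->desc_map = dma_map_single(device->dma_device,
					      mr->descs, size,
					      DMA_TO_DEVICE);
		if (dma_mapping_error(device->dma_device, mr->desc_map)) {
			kfree(mr->descs_alloc);
			return -ENOMEM;
		}
		return 0;
	}

Note that the alignment here serves the HW, not dma_map_single().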

>>>> Also, I don't see how that solves the issue; I'm not sure I even
>>>> understand the issue. Do you? Were you able to reproduce it?
>>>
>>> The issue is that dma_map_single() does not seem to DMA map
>>> portions of a memory region that are past the end of the first
>>> page of that region. Maybe that's a bug?
>>
>> No, I didn't find support for that. dma_map_single() expects
>> contiguous memory aligned to a cache line; there is no requirement
>> that the region be bounded to a single page.
>
> There certainly isn't, but that doesn't mean there can't
> be a bug somewhere ;-) and maybe not in dma_map_single.
> It could be that the "array on one page only" limitation
> is somewhere else in the mlx4 driver, or even in the HCA
> firmware.

I'm starting to think this is the case. Leon, I think it's time
to get the FW/HW guys involved...
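
To be concrete about the dma_map_single() point above: mapping a
kmalloc'ed buffer that spans several pages is supposed to be perfectly
legal. A hypothetical example (dev stands for whatever struct device
the HCA hangs off; nothing here is from any driver):

	/* Nothing in the DMA API confines a streaming mapping to a
	 * single page; the buffer only has to be physically
	 * contiguous, which kmalloc guarantees. */
	void *buf = kmalloc(4 * PAGE_SIZE, GFP_KERNEL);
	dma_addr_t dma;

	if (!buf)
		return -ENOMEM;
	dma = dma_map_single(dev, buf, 4 * PAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma)) {
		kfree(buf);
		return -ENOMEM;
	}

If the HW or FW silently assumes the vector never crosses a page
boundary, no amount of correctness on our side will help.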

>> I disagree: kmalloc with the supplied flags will return contiguous
>> memory, which is enough for dma_map_single(), and it is cache-line
>> aligned.
>
> The reason I find this hard to believe is that there is
> no end alignment guarantee at all in this code, but it
> works without issue when SLUB debugging is not enabled.
>
> xprtrdma allocates 256 elements in this array on x86.
> The code makes the array start on a 0x40-byte boundary.
> I'm pretty sure that means the end of that array will
> also land on at least a 0x40-byte boundary, and thus be
> aligned to the DMA cacheline, whether or not SLUB
> debugging is enabled.
>
> Notice that in the current code, if the consumer requests
> an odd number of SGs, that array can't possibly end on
> an alignment boundary. But we've never had a complaint.
>
> SLUB debugging changes the alignment of lots of things,
> but mlx4_alloc_priv_pages is the only breakage that has
> been reported.

I tend to agree; I even have a feeling that this won't happen
on mlx5.
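
For anyone following along, the mlx4 function under discussion
currently does roughly this (paraphrased, may not match the tree
exactly): it aligns the *start* of the array to MLX4_MR_PAGES_ALIGN
(0x40) but guarantees nothing about where the array ends:

	static int mlx4_alloc_priv_pages(struct ib_device *device,
					 struct mlx4_ib_mr *mr,
					 int max_pages)
	{
		int size = max_pages * sizeof(u64);
		int add_size = max_t(int, MLX4_MR_PAGES_ALIGN -
					  ARCH_KMALLOC_MINALIGN, 0);

		mr->pages_alloc = kzalloc(size + add_size, GFP_KERNEL);
		if (!mr->pages_alloc)
			return -ENOMEM;

		/* Start on a 0x40 boundary... */
		mr->pages = PTR_ALIGN(mr->pages_alloc,
				      MLX4_MR_PAGES_ALIGN);

		/* ...but nothing aligns the end of the mapping. */
		mr->page_map = dma_map_single(device->dma_device,
					      mr->pages, size,
					      DMA_TO_DEVICE);
		if (dma_mapping_error(device->dma_device, mr->page_map)) {
			kfree(mr->pages_alloc);
			return -ENOMEM;
		}
		return 0;
	}

So Chuck is right that any end alignment this buffer happens to have
is accidental, a function of max_pages alone.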

> DMA-API.txt says:
>
>> [T]he mapped region must begin exactly on a cache line
>> boundary and end exactly on one (to prevent two separately
>> mapped regions from sharing a single cache line)
>
> The way I read this, cacheline alignment shouldn't be
> an issue at all, as long as DMA cachelines aren't
> shared between mappings.
>
> If I simply increase the memory allocation size a little
> and ensure the end of the mapping is aligned, that should
> be enough to prevent DMA cacheline sharing with another
> memory allocation on the same page. But I still see Local
> Protection Errors when SLUB debugging is enabled, on my
> system (with patches to allocate more pages per MR).
>
> I'm not convinced this has anything to do with DMA
> cacheline alignment. The reason your patch fixes this
> issue is because it keeps the entire array on one page.

I share this feeling. I have written several times that I don't
understand how this patch solves the issue, and I would appreciate it
if someone could explain it to me (preferably with evidence).
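
To make the experiment concrete: what Chuck describes amounts to
something like the following hypothetical change, padding the mapped
length so the mapping also *ends* on a DMA cacheline:

	/* Hypothetical variant: round the mapped length up to the DMA
	 * cacheline so no neighbouring allocation can share the
	 * mapping's last line. Per Chuck's report, this still fails
	 * under SLUB debugging. */
	int size = ALIGN(max_pages * sizeof(u64),
			 dma_get_cache_alignment());

	mr->pages_alloc = kzalloc(size + MLX4_MR_PAGES_ALIGN,
				  GFP_KERNEL);
	if (!mr->pages_alloc)
		return -ENOMEM;

	mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
	mr->page_map = dma_map_single(device->dma_device, mr->pages,
				      size, DMA_TO_DEVICE);

If that padding doesn't help, cacheline sharing is hard to blame.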

Thread overview: 98+ messages

2016-06-15  3:15 [PATCH v2 00/24] NFS/RDMA client patches proposed for v4.8 Chuck Lever
2016-06-15  3:15   ` [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages Chuck Lever
2016-06-15  4:28       ` Leon Romanovsky
2016-06-15 16:40           ` Chuck Lever
2016-06-16 14:35               ` Leon Romanovsky
2016-06-16 21:10                   ` Sagi Grimberg
2016-06-16 21:58                       ` Chuck Lever
2016-06-17  9:20                           ` Leon Romanovsky
2016-06-17 19:55                               ` Chuck Lever
2016-06-18 10:56                                   ` Leon Romanovsky
2016-06-18 20:08                                       ` Chuck Lever
2016-06-19 10:04                                           ` Sagi Grimberg
2016-06-19 19:38                                               ` Or Gerlitz
2016-06-19 19:43                                                   ` Or Gerlitz
2016-06-19 20:02                                                   ` Chuck Lever
2016-06-20  5:44                                                       ` Leon Romanovsky
2016-06-20  6:34                                                           ` Sagi Grimberg
2016-06-20  7:01                                                               ` Leon Romanovsky
2016-06-20  8:35                                                                   ` Sagi Grimberg
2016-06-20 13:41                                                           ` Yishai Hadas
2016-06-21 13:56                                                               ` Sagi Grimberg
2016-06-21 14:35                                                                   ` Laurence Oberman
2016-06-19  9:58                                   ` Sagi Grimberg [this message]
2016-06-19  9:48                           ` Sagi Grimberg
2016-06-17  9:05                       ` Leon Romanovsky
2016-06-19  7:05                           ` Sagi Grimberg
2016-06-15  3:15   ` [PATCH v2 02/24] xprtrdma: Remove FMRs from the unmap list after unmapping Chuck Lever
2016-06-15  3:15   ` [PATCH v2 03/24] xprtrdma: Create common scatterlist fields in rpcrdma_mw Chuck Lever
2016-06-15  3:15   ` [PATCH v2 04/24] xprtrdma: Move init and release helpers Chuck Lever
2016-06-15  3:15   ` [PATCH v2 05/24] xprtrdma: Rename fields in rpcrdma_fmr Chuck Lever
2016-06-15  3:16   ` [PATCH v2 06/24] xprtrdma: Use scatterlist for DMA mapping and unmapping under FMR Chuck Lever
2016-06-15  3:16   ` [PATCH v2 07/24] xprtrdma: Refactor MR recovery work queues Chuck Lever
2016-06-15  3:16   ` [PATCH v2 08/24] xprtrdma: Do not leak an MW during a DMA map failure Chuck Lever
2016-06-15  3:16   ` [PATCH v2 09/24] xprtrdma: Remove ALLPHYSICAL memory registration mode Chuck Lever
2016-06-15  3:16   ` [PATCH v2 10/24] xprtrdma: Remove rpcrdma_map_one() and friends Chuck Lever
2016-06-15  3:16   ` [PATCH v2 11/24] xprtrdma: Reply buffer exhaustion can be catastrophic Chuck Lever
2016-06-15  3:16   ` [PATCH v2 12/24] xprtrdma: Honor ->send_request API contract Chuck Lever
2016-06-15  3:17   ` [PATCH v2 13/24] xprtrdma: Chunk list encoders must not return zero Chuck Lever
2016-06-15  3:17   ` [PATCH v2 14/24] xprtrdma: Allocate MRs on demand Chuck Lever
2016-06-15  3:17   ` [PATCH v2 15/24] xprtrdma: Release orphaned MRs immediately Chuck Lever
2016-06-15  3:17   ` [PATCH v2 16/24] xprtrdma: Place registered MWs on a per-req list Chuck Lever
2016-06-15  3:17   ` [PATCH v2 17/24] xprtrdma: Chunk list encoders no longer share one rl_segments array Chuck Lever
2016-06-15  3:17   ` [PATCH v2 18/24] xprtrdma: rpcrdma_inline_fixup() overruns the receive page list Chuck Lever
2016-06-15  3:17   ` [PATCH v2 19/24] xprtrdma: Do not update {head, tail}.iov_len in rpcrdma_inline_fixup() Chuck Lever
2016-06-15  3:18   ` [PATCH v2 20/24] xprtrdma: Update only specific fields in private receive buffer Chuck Lever
2016-06-15  3:18   ` [PATCH v2 21/24] xprtrdma: Clean up fixup_copy_count accounting Chuck Lever
2016-06-15  3:18   ` [PATCH v2 22/24] xprtrdma: No direct data placement with krb5i and krb5p Chuck Lever
2016-06-15  3:18   ` [PATCH v2 23/24] svc: Avoid garbage replies when pc_func() returns rpc_drop_reply Chuck Lever
2016-06-15  3:18   ` [PATCH v2 24/24] NFS: Don't drop CB requests with invalid principals Chuck Lever
