linux-rdma.vger.kernel.org archive mirror
* dynamic-sg patch has broken rdma_rxe
@ 2020-10-13 14:33 Bob Pearson
  2020-10-13 16:34 ` Bob Pearson
  2020-10-14 22:51 ` Jason Gunthorpe
  0 siblings, 2 replies; 20+ messages in thread
From: Bob Pearson @ 2020-10-13 14:33 UTC (permalink / raw)
  To: Jason Gunthorpe, linux-rdma

Jason,

Just pulled for-next and now hit the following warning.
Registering user space memory is no longer working.
I am trying to debug this, but if you have any idea where to look, let me know.

Bob

[  209.562096] WARNING: CPU: 12 PID: 5343 at lib/scatterlist.c:438 __sg_alloc_table_from_pages+0x21/0x440
[  209.562097] Modules linked in: rdma_rxe ip6_udp_tunnel udp_tunnel rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc cmac algif_hash algif_skcipher af_alg bnep binfmt_misc iwlmvm btusb btrtl btbcm input_leds mac80211 btintel bluetooth libarc4 iwlwifi ib_uverbs snd_hda_codec_realtek snd_hda_codec_generic ecdh_generic ib_core cfg80211 igb ledtrig_audio ecc snd_hda_codec_hdmi hid_generic snd_hda_intel snd_intel_dspcfg dca snd_hda_codec snd_hda_core usbhid snd_hwdep nls_iso8859_1 amdgpu mlx5_core edac_mce_amd hid snd_pcm kvm_amd kvm iommu_v2 snd_seq_midi snd_seq_midi_event gpu_sched ttm wmi_bmof mxm_wmi snd_rawmidi drm_kms_helper snd_seq tls crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_seq_device crypto_simd snd_timer mlxfw cryptd cec
[  209.562135]  glue_helper ccp snd rapl pci_hyperv_intf i2c_algo_bit ahci i2c_piix4 efi_pstore soundcore fb_sys_fops libahci syscopyarea k10temp sysfillrect sysimgblt wmi drm sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 nvme nvme_core mac_hid
[  209.562149] CPU: 12 PID: 5343 Comm: python3 Not tainted 5.9.0-rc8+ #20
[  209.562150] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F11 12/06/2019
[  209.562153] RIP: 0010:__sg_alloc_table_from_pages+0x21/0x440
[  209.562155] Code: c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 89 d5 41 54 53 48 83 ec 48 48 89 7d a0 48 8b 55 10 45 85 c9 75 1b <0f> 0b 49 c7 c2 ea ff ff ff 48 83 c4 48 4c 89 d0 5b 41 5c 41 5d 41
[  209.562156] RSP: 0018:ffffb86648b879a8 EFLAGS: 00010206
[  209.562157] RAX: 0000000000001000 RBX: 000000000238c000 RCX: 0000000000000000
[  209.562158] RDX: 0000000000000000 RSI: ffff8ab0c2667000 RDI: ffff8ab0d7b0c1d0
[  209.562159] RBP: ffffb86648b87a18 R08: 0000000000001000 R09: 00000000ffffffff
[  209.562160] R10: 0000000000000022 R11: 0000000000000001 R12: ffff8ab0faace000
[  209.562161] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff8ab0c2667000
[  209.562162] FS:  00007f1846f00740(0000) GS:ffff8ab0feb00000(0000) knlGS:0000000000000000
[  209.562163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  209.562164] CR2: 00000000022c5098 CR3: 000000074ab48000 CR4: 0000000000350ee0
[  209.562165] Call Trace:
[  209.562176]  ib_umem_get+0x1b1/0x390 [ib_uverbs]
[  209.562182]  rxe_mem_init_user+0x48/0x1e0 [rdma_rxe]
[  209.562187]  rxe_reg_user_mr+0x96/0x160 [rdma_rxe]
[  209.562193]  ib_uverbs_reg_mr+0x141/0x270 [ib_uverbs]
[  209.562199]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd2/0x140 [ib_uverbs]
[  209.562202]  ? __check_object_size+0x4d/0x150
[  209.562207]  ib_uverbs_cmd_verbs+0xb37/0xc70 [ib_uverbs]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-13 14:33 dynamic-sg patch has broken rdma_rxe Bob Pearson
@ 2020-10-13 16:34 ` Bob Pearson
  2020-10-14 22:51 ` Jason Gunthorpe
  1 sibling, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2020-10-13 16:34 UTC (permalink / raw)
  To: Jason Gunthorpe, linux-rdma

On 10/13/20 9:33 AM, Bob Pearson wrote:
> Jason,
> 
> Just pulled for-next and now hit the following warning.
> Registering user space memory is no longer working.
> I am trying to debug this but if you have any idea where to look let me know.
> 
> Bob
> 
> [  209.562096] WARNING: CPU: 12 PID: 5343 at lib/scatterlist.c:438 __sg_alloc_table_from_pages+0x21/0x440
SNIP
> 

I found it. The rxe driver had set max_segment_size to UINT_MAX (0xffffffff), which triggered the warning since that value has a non-zero 'offset into page' in __sg_alloc_table_from_pages. Can you tell me what this parameter is supposed to do and what a reasonable value is? What scares me is that the default used in ib_umem_get is 64K. Does this have anything to do with the largest SGL size, or is it something else?

Bob


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-13 14:33 dynamic-sg patch has broken rdma_rxe Bob Pearson
  2020-10-13 16:34 ` Bob Pearson
@ 2020-10-14 22:51 ` Jason Gunthorpe
  2020-10-15  7:44   ` Maor Gottlieb
  1 sibling, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-14 22:51 UTC (permalink / raw)
  To: Bob Pearson, Maor Gottlieb, Leon Romanovsky; +Cc: linux-rdma

On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
> Jason,
> 
> Just pulled for-next and now hit the following warning.
> Registering user space memory is no longer working.
> I am trying to debug this but if you have any idea where to look let me know.

The offset_in_page is wrong, but it is protecting some other logic..

Maor? Leon? Can you sort it out tomorrow?

Jason


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-14 22:51 ` Jason Gunthorpe
@ 2020-10-15  7:44   ` Maor Gottlieb
  2020-10-15 11:23     ` Gal Pressman
  2020-10-15 15:35     ` Bob Pearson
  0 siblings, 2 replies; 20+ messages in thread
From: Maor Gottlieb @ 2020-10-15  7:44 UTC (permalink / raw)
  To: Jason Gunthorpe, Bob Pearson, Leon Romanovsky, Christoph Hellwig
  Cc: linux-rdma


On 10/15/2020 1:51 AM, Jason Gunthorpe wrote:
> On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
>> Jason,
>>
>> Just pulled for-next and now hit the following warning.
>> Registering user space memory is no longer working.
>> I am trying to debug this but if you have any idea where to look let me know.
> The offset_in_page is wrong, but it is protecting some other logic..
>
> Maor? Leon? Can you sort it out tomorrow?

Leon and I investigated it. This check existed before my series to 
protect the alloc_table_from_pages logic. It's still relevant.
This is the patch that broke it: 54816d3e69d1 ("RDMA: Explicitly pass in
the dma_device to ib_register_device"), and according to the link below it
was expected. The safest approach is to set the max_segment_size back to
2GB in all drivers. What do you think?

https://lore.kernel.org/linux-rdma/20200923072111.GA31828@infradead.org/

>
> Jason


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-15  7:44   ` Maor Gottlieb
@ 2020-10-15 11:23     ` Gal Pressman
  2020-10-15 12:21       ` Maor Gottlieb
  2020-10-15 15:35     ` Bob Pearson
  1 sibling, 1 reply; 20+ messages in thread
From: Gal Pressman @ 2020-10-15 11:23 UTC (permalink / raw)
  To: Maor Gottlieb, Jason Gunthorpe, Bob Pearson, Leon Romanovsky,
	Christoph Hellwig
  Cc: linux-rdma

On 15/10/2020 10:44, Maor Gottlieb wrote:
> 
> On 10/15/2020 1:51 AM, Jason Gunthorpe wrote:
>> On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
>>> Jason,
>>>
>>> Just pulled for-next and now hit the following warning.
>>> Registering user space memory is no longer working.
>>> I am trying to debug this but if you have any idea where to look let me know.
>> The offset_in_page is wrong, but it is protecting some other logic..
>>
>> Maor? Leon? Can you sort it out tomorrow?
> 
> Leon and I investigated it. This check existed before my series to protect the
> alloc_table_from_pages logic. It's still relevant.
> This is the patch that broke it: 54816d3e69d1 ("RDMA: Explicitly pass in
> the dma_device to ib_register_device"), and according to the link below it
> was expected. The safest approach is to set the max_segment_size back to
> 2GB in all drivers. What do you think?
> 
> https://lore.kernel.org/linux-rdma/20200923072111.GA31828@infradead.org/

FWIW, EFA is broken as well (same call trace) so it's not just software drivers.


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-15 11:23     ` Gal Pressman
@ 2020-10-15 12:21       ` Maor Gottlieb
  2020-10-16  0:31         ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Maor Gottlieb @ 2020-10-15 12:21 UTC (permalink / raw)
  To: Gal Pressman, Jason Gunthorpe, Bob Pearson, Leon Romanovsky,
	Christoph Hellwig
  Cc: linux-rdma


On 10/15/2020 2:23 PM, Gal Pressman wrote:
> On 15/10/2020 10:44, Maor Gottlieb wrote:
>> On 10/15/2020 1:51 AM, Jason Gunthorpe wrote:
>>> On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
>>>> Jason,
>>>>
>>>> Just pulled for-next and now hit the following warning.
>>>> Registering user space memory is no longer working.
>>>> I am trying to debug this but if you have any idea where to look let me know.
>>> The offset_in_page is wrong, but it is protecting some other logic..
>>>
>>> Maor? Leon? Can you sort it out tomorrow?
>> Leon and I investigated it. This check existed before my series to protect the
>> alloc_table_from_pages logic. It's still relevant.
>> This is the patch that broke it: 54816d3e69d1 ("RDMA: Explicitly pass in
>> the dma_device to ib_register_device"), and according to the link below it
>> was expected. The safest approach is to set the max_segment_size back to
>> 2GB in all drivers. What do you think?
>>
>> https://lore.kernel.org/linux-rdma/20200923072111.GA31828@infradead.org/
> FWIW, EFA is broken as well (same call trace) so it's not just software drivers.

This is true for all drivers that call ib_umem_get and set UINT_MAX
as max_segment_size.
Jason, maybe instead of setting UINT_MAX as max_segment_size, we need to
set SCATTERLIST_MAX_SEGMENT, which does the required alignment.


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-15  7:44   ` Maor Gottlieb
  2020-10-15 11:23     ` Gal Pressman
@ 2020-10-15 15:35     ` Bob Pearson
  1 sibling, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2020-10-15 15:35 UTC (permalink / raw)
  To: Maor Gottlieb, Jason Gunthorpe, Leon Romanovsky, Christoph Hellwig
  Cc: linux-rdma

On 10/15/20 2:44 AM, Maor Gottlieb wrote:
> 
> On 10/15/2020 1:51 AM, Jason Gunthorpe wrote:
>> On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
>>> Jason,
>>>
>>> Just pulled for-next and now hit the following warning.
>>> Registering user space memory is no longer working.
>>> I am trying to debug this but if you have any idea where to look let me know.
>> The offset_in_page is wrong, but it is protecting some other logic..
>>
>> Maor? Leon? Can you sort it out tomorrow?
> 
> Leon and I investigated it. This check existed before my series to protect the alloc_table_from_pages logic. It's still relevant.
> This is the patch that broke it: 54816d3e69d1 ("RDMA: Explicitly pass in the dma_device to ib_register_device"), and according to the link below it was expected. The safest approach is to set the max_segment_size back to 2GB in all drivers. What do you think?
> 
> https://lore.kernel.org/linux-rdma/20200923072111.GA31828@infradead.org/
> 
>>
>> Jason
That's what I did to fix it.


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-15 12:21       ` Maor Gottlieb
@ 2020-10-16  0:31         ` Jason Gunthorpe
  2020-10-16  7:13           ` Christoph Hellwig
       [not found]           ` <796ca31aed8f469c957cb850385b9d09@intel.com>
  0 siblings, 2 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-16  0:31 UTC (permalink / raw)
  To: Maor Gottlieb, Tvrtko Ursulin, Christoph Hellwig
  Cc: Gal Pressman, Bob Pearson, Leon Romanovsky, linux-rdma

On Thu, Oct 15, 2020 at 03:21:34PM +0300, Maor Gottlieb wrote:
> 
> On 10/15/2020 2:23 PM, Gal Pressman wrote:
> > On 15/10/2020 10:44, Maor Gottlieb wrote:
> > > On 10/15/2020 1:51 AM, Jason Gunthorpe wrote:
> > > > On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
> > > > > Jason,
> > > > > 
> > > > > Just pulled for-next and now hit the following warning.
> > > > > Registering user space memory is no longer working.
> > > > > I am trying to debug this but if you have any idea where to look let me know.
> > > > The offset_in_page is wrong, but it is protecting some other logic..
> > > > 
> > > > Maor? Leon? Can you sort it out tomorrow?
> > > Leon and I investigated it. This check existed before my series to protect the
> > > alloc_table_from_pages logic. It's still relevant.
> > > This is the patch that broke it: 54816d3e69d1 ("RDMA: Explicitly pass in
> > > the dma_device to ib_register_device"), and according to the link below it
> > > was expected. The safest approach is to set the max_segment_size back to
> > > 2GB in all drivers. What do you think?
> > > 
> > > https://lore.kernel.org/linux-rdma/20200923072111.GA31828@infradead.org/
> > FWIW, EFA is broken as well (same call trace) so it's not just software drivers.
> 
> This is true for all drivers that call ib_umem_get and set UINT_MAX
> as max_segment_size.
> Jason, maybe instead of setting UINT_MAX as max_segment_size, we need to
> set SCATTERLIST_MAX_SEGMENT, which does the required alignment.

SCATTERLIST_MAX_SEGMENT is almost never used, however there are lots
of places passing UINT_MAX or similar as the max_segsize for DMA.

The only place that does use it looks goofy to me:

	dma_set_max_seg_size(dev->dev, min_t(unsigned int, U32_MAX & PAGE_MASK,
					     SCATTERLIST_MAX_SEGMENT));

The seg_size should reflect the HW capability, not be mixed in with
knowledge about SGL internals. If the SGL can't build up to the HW
limit then it is fine to internally silently reduce it.

So I think we need to fix the scatterlist code, like below, and
just remove SCATTERLIST_MAX_SEGMENT completely.

It fixes things? Are you OK with this Christoph?

I need to get this fixed for the merge window PR I want to send on
Friday.

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index e102fdfaa75be7..d158033834cdbc 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -435,7 +435,9 @@ struct scatterlist *__sg_alloc_table_from_pages(struct sg_table *sgt,
 	unsigned int added_nents = 0;
 	struct scatterlist *s = prv;
 
-	if (WARN_ON(!max_segment || offset_in_page(max_segment)))
+	/* Avoid overflow when computing sg_len + PAGE_SIZE */
+	max_segment = max_segment & PAGE_MASK;
+	if (WARN_ON(max_segment < PAGE_SIZE))
 		return ERR_PTR(-EINVAL);
 
 	if (IS_ENABLED(CONFIG_ARCH_NO_SG_CHAIN) && prv)


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-16  0:31         ` Jason Gunthorpe
@ 2020-10-16  7:13           ` Christoph Hellwig
       [not found]           ` <796ca31aed8f469c957cb850385b9d09@intel.com>
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2020-10-16  7:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Maor Gottlieb, Tvrtko Ursulin, Christoph Hellwig, Gal Pressman,
	Bob Pearson, Leon Romanovsky, linux-rdma

On Thu, Oct 15, 2020 at 09:31:27PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 15, 2020 at 03:21:34PM +0300, Maor Gottlieb wrote:
> > 
> > On 10/15/2020 2:23 PM, Gal Pressman wrote:
> > > On 15/10/2020 10:44, Maor Gottlieb wrote:
> > > > On 10/15/2020 1:51 AM, Jason Gunthorpe wrote:
> > > > > On Tue, Oct 13, 2020 at 09:33:14AM -0500, Bob Pearson wrote:
> > > > > > Jason,
> > > > > > 
> > > > > > Just pulled for-next and now hit the following warning.
> > > > > > Registering user space memory is no longer working.
> > > > > > I am trying to debug this but if you have any idea where to look let me know.
> > > > > The offset_in_page is wrong, but it is protecting some other logic..
> > > > > 
> > > > > Maor? Leon? Can you sort it out tomorrow?
> > > > Leon and I investigated it. This check existed before my series to protect the
> > > > alloc_table_from_pages logic. It's still relevant.
> > > > This is the patch that broke it: 54816d3e69d1 ("RDMA: Explicitly pass in
> > > > the dma_device to ib_register_device"), and according to the link below it
> > > > was expected. The safest approach is to set the max_segment_size back to
> > > > 2GB in all drivers. What do you think?
> > > > 
> > > > https://lore.kernel.org/linux-rdma/20200923072111.GA31828@infradead.org/
> > > FWIW, EFA is broken as well (same call trace) so it's not just software drivers.
> > 
> > This is true for all drivers that call ib_umem_get and set UINT_MAX
> > as max_segment_size.
> > Jason, maybe instead of setting UINT_MAX as max_segment_size, we need to
> > set SCATTERLIST_MAX_SEGMENT, which does the required alignment.
> 
> SCATTERLIST_MAX_SEGMENT is almost never used, however there are lots
> of places passing UINT_MAX or similar as the max_segsize for DMA.
> 
> The only place that does use it looks goofy to me:
> 
> 	dma_set_max_seg_size(dev->dev, min_t(unsigned int, U32_MAX & PAGE_MASK,
> 					     SCATTERLIST_MAX_SEGMENT));
> 
> The seg_size should reflect the HW capability, not be mixed in with
> knowledge about SGL internals. If the SGL can't build up to the HW
> limit then it is fine to internally silently reduce it.
> 
> So I think we need to fix the scatterlist code, like below, and
> just remove SCATTERLIST_MAX_SEGMENT completely.
> 
> It fixes things? Are you OK with this Christoph?
> 
> I need to get this fixed for the merge window PR I want to send on
> Friday.

This looks ok.  Not critical, but I think we should follow up and kill
off the confusing SCATTERLIST_MAX_SEGMENT as well soon.


* Re: dynamic-sg patch has broken rdma_rxe
       [not found]           ` <796ca31aed8f469c957cb850385b9d09@intel.com>
@ 2020-10-16 11:58             ` Jason Gunthorpe
  2020-10-19  9:50               ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-16 11:58 UTC (permalink / raw)
  To: Ursulin, Tvrtko, tvrtko.ursulin
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma

On Fri, Oct 16, 2020 at 08:29:11AM +0000, Ursulin, Tvrtko wrote:
> 
> Hi guys,
> 
> [I removed the mailing list from cc since from this email address I
> can't reply properly inline. (tvrtko.ursulin@linux.intel.com works
> better.)]

I put it back
 
> However:
> 
> +	/* Avoid overflow when computing sg_len + PAGE_SIZE */
> +	max_segment = max_segment & PAGE_MASK;
> +	if (WARN_ON(max_segment < PAGE_SIZE))
>  		return ERR_PTR(-EINVAL);
> 
> Maybe it's too early for me but I don't get this. It appears the
> condition can only be true if the max_segment is smaller than page
> size as passed in to the function to _start with_. I don't see what
> filtering out the low bits achieves on top.

The entire problem is the algorithm in __sg_alloc_table_from_pages()
only limits sg_len to

   sg_len == N * PAGE_SIZE <= ALIGN_UP(max_segment, PAGE_SIZE);

ie it overshoots max_segment if it is unaligned.

It also badly malfunctions if the ALIGN_UP() overflows, eg for
ALIGN_UP(UINT_MAX).

This is all internal problems inside __sg_alloc_table_from_pages() and
has nothing to do with the scatter lists themselves.

Adding an ALIGN_DOWN guarantees this algorithm produces sg_len <=
max_segment in all cases.

> If the intent is to allow unaligned max_segment then also please
> change kerneldoc.

Sure
 
> Although TBH I don't get how unaligned max segment makes sense. List
> can end on an unaligned segment but surely shouldn't have them in
> the middle.

The max_segment should either be UINT_MAX because the caller doesn't
care, or come from the DMA max_segment_size which is a HW limitation
usually derived from the # of bits available to express a length.

Conflating the HW limitation with the system PAGE_SIZE is
nonsense. This is further confused because the only reason we have an
alignment restriction is due to this algorithm design; the SGL rules
don't prevent the use of unaligned lengths, or lengths smaller than
PAGE_SIZE, even in the interior.

Jason

From b03302028893ce7465ba7e8736abba1922469bc1 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@nvidia.com>
Date: Fri, 16 Oct 2020 08:46:01 -0300
Subject: [PATCH] lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED
 values

The main intention of the max_segment argument to
__sg_alloc_table_from_pages() is to match the DMA layer segment size set
by dma_set_max_seg_size().

Restricting the input to be page aligned makes it impossible to just
connect the DMA layer to this API.

The only reason for a page alignment here is because the algorithm will
overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
the alignment before starting and don't expose this implementation detail
to the callers.

A future patch will completely remove SCATTERLIST_MAX_SEGMENT.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 lib/scatterlist.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index e102fdfaa75be7..ed2497c79a216b 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -404,7 +404,7 @@ static struct scatterlist *get_next_sg(struct sg_table *table,
  * @n_pages:	 Number of pages in the pages array
  * @offset:      Offset from start of the first page to the start of a buffer
  * @size:        Number of valid bytes in the buffer (after offset)
- * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
+ * @max_segment: Maximum size of a scatterlist element in bytes
  * @prv:	 Last populated sge in sgt
  * @left_pages:  Left pages caller have to set after this call
  * @gfp_mask:	 GFP allocation mask
@@ -435,7 +435,12 @@ struct scatterlist *__sg_alloc_table_from_pages(struct sg_table *sgt,
 	unsigned int added_nents = 0;
 	struct scatterlist *s = prv;
 
-	if (WARN_ON(!max_segment || offset_in_page(max_segment)))
+	/*
+	 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
+	 * otherwise it can overshoot.
+	 */
+	max_segment = ALIGN_DOWN(max_segment, PAGE_SIZE);
+	if (WARN_ON(max_segment < PAGE_SIZE))
 		return ERR_PTR(-EINVAL);
 
 	if (IS_ENABLED(CONFIG_ARCH_NO_SG_CHAIN) && prv)
@@ -542,8 +547,7 @@ int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
 			      unsigned long size, gfp_t gfp_mask)
 {
 	return PTR_ERR_OR_ZERO(__sg_alloc_table_from_pages(sgt, pages, n_pages,
-			offset, size, SCATTERLIST_MAX_SEGMENT,
-			NULL, 0, gfp_mask));
+			offset, size, UINT_MAX, NULL, 0, gfp_mask));
 }
 EXPORT_SYMBOL(sg_alloc_table_from_pages);
 
-- 
2.28.0



* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-16 11:58             ` Jason Gunthorpe
@ 2020-10-19  9:50               ` Tvrtko Ursulin
  2020-10-19 12:12                 ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-10-19  9:50 UTC (permalink / raw)
  To: Jason Gunthorpe, Ursulin, Tvrtko
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma


On 16/10/2020 12:58, Jason Gunthorpe wrote:
> On Fri, Oct 16, 2020 at 08:29:11AM +0000, Ursulin, Tvrtko wrote:
>>
>> Hi guys,
>>
>> [I removed the mailing list from cc since from this email address I
>> can't reply properly inline. (tvrtko.ursulin@linux.intel.com works
>> better.)]
> 
> I put it back
>   
>> However:
>>
>> +	/* Avoid overflow when computing sg_len + PAGE_SIZE */
>> +	max_segment = max_segment & PAGE_MASK;
>> +	if (WARN_ON(max_segment < PAGE_SIZE))
>>   		return ERR_PTR(-EINVAL);
>>
>> Maybe it's too early for me but I don't get this. It appears the
>> condition can only be true if the max_segment is smaller than page
>> size as passed in to the function to _start with_. I don't see what
>> filtering out the low bits achieves on top.
> 
> The entire problem is the algorithm in __sg_alloc_table_from_pages()
> only limits sg_len to
> 
>     sg_len == N * PAGE_SIZE <= ALIGN_UP(max_segment, PAGE_SIZE);
> 
> ie it overshoots max_segment if it is unaligned.
> 
> It also badly malfunctions if the ALIGN_UP() overflows, eg for
> ALIGN_UP(UINT_MAX).
> 
> This is all internal problems inside __sg_alloc_table_from_pages() and
> has nothing to do with the scatter lists themselves.
> 
> Adding an ALIGN_DOWN guarantees this algorithm produces sg_len <=
> max_segment in all cases.

Right, I can follow the story now that ALIGN_DOWN is in the picture.

>> If the intent is to allow unaligned max_segment then also please
>> change kerneldoc.
> 
> Sure
>   
>> Although TBH I don't get how unaligned max segment makes sense. List
>> can end on an unaligned segment but surely shouldn't have them in
>> the middle.
> 
> The max_segment should either be UINT_MAX because the caller doesn't
> care, or come from the DMA max_segment_size which is a HW limitation
> usually derived from the # of bits available to express a length.
> 
> Conflating the HW limitation with the system PAGE_SIZE is
> nonsense. This is further confused because the only reason we have an
> alignment restriction is due to this algorithm design; the SGL rules
> don't prevent the use of unaligned lengths, or lengths smaller than
> PAGE_SIZE, even in the interior.
> 
> Jason
> 
> From b03302028893ce7465ba7e8736abba1922469bc1 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgg@nvidia.com>
> Date: Fri, 16 Oct 2020 08:46:01 -0300
> Subject: [PATCH] lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED
>   values
> 
> The main intention of the max_segment argument to
> __sg_alloc_table_from_pages() is to match the DMA layer segment size set
> by dma_set_max_seg_size().
> 
> Restricting the input to be page aligned makes it impossible to just
> connect the DMA layer to this API.
> 
> The only reason for a page alignment here is because the algorithm will
> overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
> the alignment before starting and don't expose this implementation detail
> to the callers.

What does not make complete sense to me is the statement that the input
alignment requirement makes it impossible to connect to the DMA layer,
but then the patch goes and aligns behind the covers anyway.

At minimum the kerneldoc should explain that max_segment will still be 
rounded down. But wouldn't it be better for the API to be more explicit 
and just require aligned anyway?

I mean I have no idea what the problem is with connecting to the DMA
layer. Is it a matter of too many call sites that would need to align?
Or is there more to it?

> A future patch will completely remove SCATTERLIST_MAX_SEGMENT.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   lib/scatterlist.c | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index e102fdfaa75be7..ed2497c79a216b 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -404,7 +404,7 @@ static struct scatterlist *get_next_sg(struct sg_table *table,
>    * @n_pages:	 Number of pages in the pages array
>    * @offset:      Offset from start of the first page to the start of a buffer
>    * @size:        Number of valid bytes in the buffer (after offset)
> - * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
> + * @max_segment: Maximum size of a scatterlist element in bytes
>    * @prv:	 Last populated sge in sgt
>    * @left_pages:  Left pages caller have to set after this call
>    * @gfp_mask:	 GFP allocation mask
> @@ -435,7 +435,12 @@ struct scatterlist *__sg_alloc_table_from_pages(struct sg_table *sgt,
>   	unsigned int added_nents = 0;
>   	struct scatterlist *s = prv;
>   
> -	if (WARN_ON(!max_segment || offset_in_page(max_segment)))
> +	/*
> +	 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
> +	 * otherwise it can overshoot.
> +	 */
> +	max_segment = ALIGN_DOWN(max_segment, PAGE_SIZE);
> +	if (WARN_ON(max_segment < PAGE_SIZE))

Equivalent to just !max_segment, i.e. max_segment == 0, now.

And it's a bit of a weird API - "you can pass in any unaligned size apart
from unaligned sizes less than a PAGE_SIZE". Makes me think even more that
the explicit requirement to pass in a page-aligned value was better.

>   		return ERR_PTR(-EINVAL);
>   
>   	if (IS_ENABLED(CONFIG_ARCH_NO_SG_CHAIN) && prv)
> @@ -542,8 +547,7 @@ int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
>   			      unsigned long size, gfp_t gfp_mask)
>   {
>   	return PTR_ERR_OR_ZERO(__sg_alloc_table_from_pages(sgt, pages, n_pages,
> -			offset, size, SCATTERLIST_MAX_SEGMENT,
> -			NULL, 0, gfp_mask));
> +			offset, size, UINT_MAX, NULL, 0, gfp_mask));
>   }
>   EXPORT_SYMBOL(sg_alloc_table_from_pages);
>   
> 

Regards,

Tvrtko


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-19  9:50               ` Tvrtko Ursulin
@ 2020-10-19 12:12                 ` Jason Gunthorpe
  2020-10-19 12:29                   ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-19 12:12 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Ursulin, Tvrtko, Maor Gottlieb, Christoph Hellwig, Gal Pressman,
	Bob Pearson, Leon Romanovsky, linux-rdma

On Mon, Oct 19, 2020 at 10:50:14AM +0100, Tvrtko Ursulin wrote:
> > overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
> > the alignment before starting and don't expose this implementation detail
> > to the callers.
> 
> What does not make complete sense to me is the statement that the input
> alignment requirement makes it impossible to connect to the DMA layer, but
> then the patch goes and aligns behind the covers anyway.
> 
> At minimum the kerneldoc should explain that max_segment will still be
> rounded down. But wouldn't it be better for the API to be more explicit and
> just require aligned anyway?

Why?

The API is to not produce sge's with a length longer than max_segment,
it isn't to produce sge's of exactly max_segment.

Everything else is an internal detail.

Jason


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-19 12:12                 ` Jason Gunthorpe
@ 2020-10-19 12:29                   ` Tvrtko Ursulin
  2020-10-19 12:48                     ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-10-19 12:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ursulin, Tvrtko, Maor Gottlieb, Christoph Hellwig, Gal Pressman,
	Bob Pearson, Leon Romanovsky, linux-rdma


On 19/10/2020 13:12, Jason Gunthorpe wrote:
> On Mon, Oct 19, 2020 at 10:50:14AM +0100, Tvrtko Ursulin wrote:
>>> overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
>>> the alignment before starting and don't expose this implementation detail
>>> to the callers.
>>
>> What does not make complete sense to me is the statement that the input
>> alignment requirement makes it impossible to connect to the DMA layer, but
>> then the patch goes and aligns behind the covers anyway.
>>
>> At minimum the kerneldoc should explain that max_segment will still be
>> rounded down. But wouldn't it be better for the API to be more explicit and
>> just require aligned anyway?
> 
> Why?
> 
> The API is to not produce sge's with a length longer than max_segment,
> it isn't to produce sge's of exactly max_segment.
> 
> Everything else is an internal detail

(Half-)pretending it is disconnected from PAGE_SIZE, when it really 
isn't, isn't the most obvious API design in my view.

In other words, if you let users pass in 4097 and it just works by 
rounding down to 4096, but you don't let them pass 4095, I just find 
this odd.

My question was why not have callers pass in a page-aligned max segment, 
like today, which makes it completely transparent.

Regards,

Tvrtko


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-19 12:29                   ` Tvrtko Ursulin
@ 2020-10-19 12:48                     ` Jason Gunthorpe
  2020-10-20 11:37                       ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-19 12:48 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Ursulin, Tvrtko, Maor Gottlieb, Christoph Hellwig, Gal Pressman,
	Bob Pearson, Leon Romanovsky, linux-rdma

On Mon, Oct 19, 2020 at 01:29:42PM +0100, Tvrtko Ursulin wrote:
> 
> On 19/10/2020 13:12, Jason Gunthorpe wrote:
> > On Mon, Oct 19, 2020 at 10:50:14AM +0100, Tvrtko Ursulin wrote:
> > > > overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
> > > > the alignment before starting and don't expose this implementation detail
> > > > to the callers.
> > > 
> > > What does not make complete sense to me is the statement that an input
> > > alignment requirement makes it impossible to connect to the DMA layer, but then
> > > the patch goes and aligns behind the covers anyway.
> > > 
> > > At minimum, the kerneldoc should explain that max_segment will still be
> > > rounded down. But wouldn't it be better for the API to be more explicit and
> > > just require an aligned value anyway?
> > 
> > Why?
> > 
> > The API is to not produce sge's with a length longer than max_segment,
> > it isn't to produce sge's of exactly max_segment.
> > 
> > Everything else is an internal detail
> 
> (Half-)pretending it is disconnected from PAGE_SIZE, when it really isn't,
> isn't the most obvious API design in my view.

It is not information the callers need to care about

> In other words, if you let users pass in 4097 and it just works by rounding
> down to 4096, but you don't let them pass 4095, I just find this odd.

That is also an implementation detail; there is nothing preventing
segments smaller than page size other than the complexity of implementing
the algorithm. If something ever needed it, it could be done.
 
> My question was why not have callers pass in page aligned max segment like
> today which makes it completely transparent.

Why put this confusing code in every caller? Especially for something
a driver is supposed to call. Will just make bugs

Jason


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-19 12:48                     ` Jason Gunthorpe
@ 2020-10-20 11:37                       ` Tvrtko Ursulin
  2020-10-20 11:47                         ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-10-20 11:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma


On 19/10/2020 13:48, Jason Gunthorpe wrote:
> On Mon, Oct 19, 2020 at 01:29:42PM +0100, Tvrtko Ursulin wrote:
>> On 19/10/2020 13:12, Jason Gunthorpe wrote:
>>> On Mon, Oct 19, 2020 at 10:50:14AM +0100, Tvrtko Ursulin wrote:
>>>>> overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
>>>>> the alignment before starting and don't expose this implementation detail
>>>>> to the callers.
>>>>
>>>> What does not make complete sense to me is the statement that an input
>>>> alignment requirement makes it impossible to connect to the DMA layer, but then
>>>> the patch goes and aligns behind the covers anyway.
>>>>
>>>> At minimum, the kerneldoc should explain that max_segment will still be
>>>> rounded down. But wouldn't it be better for the API to be more explicit and
>>>> just require an aligned value anyway?
>>>
>>> Why?
>>>
>>> The API is to not produce sge's with a length longer than max_segment,
>>> it isn't to produce sge's of exactly max_segment.
>>>
>>> Everything else is an internal detail
>>
>> (Half-)pretending it is disconnected from PAGE_SIZE, when it really isn't,
>> isn't the most obvious API design in my view.
> 
> It is not information the callers need to care about

Unless they want a 1024-byte segment; then they do need to care. The 
patch just changes the rules from "any non-zero aligned value" to "any 
value greater than or equal to PAGE_SIZE". I am simply asking why, in more 
specific terms.

>> In other words, if you let users pass in 4097 and it just works by rounding
>> down to 4096, but you don't let them pass 4095, I just find this odd.
> 
> That is also an implementation detail, there is nothing preventing
> smaller than page size other than complexity of implementing the
> algorithm. If something ever needed it, it could be done.

But does a non-aligned segment size make sense? Is there any current or 
future platform which will need it, to warrant changing the API in this way?

>> My question was why not have callers pass in page aligned max segment like
>> today which makes it completely transparent.
> 
> Why put this confusing code in every caller? Especially for something
> a driver is supposed to call. Will just make bugs

For max_segment to be aligned is a requirement today, so callers are 
ready. I still don't understand why change that; what will it help with? 
Is it just about the desire to be able to pass in UINT_MAX?

Regards,

Tvrtko


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-20 11:37                       ` Tvrtko Ursulin
@ 2020-10-20 11:47                         ` Jason Gunthorpe
  2020-10-20 12:31                           ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-20 11:47 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma

On Tue, Oct 20, 2020 at 12:37:05PM +0100, Tvrtko Ursulin wrote:

> > Why put this confusing code in every caller? Especially for something
> > a driver is supposed to call. Will just make bugs
> 
> For max_segment to be aligned is a requirement today so callers are
> ready.

No, it turns out all the RDMA drivers became broken when they
converted to use the proper U32_MAX for their DMA max_segment size;
then they couldn't form SGLs anymore.

I don't want to see nonsense code like this:

        dma_set_max_seg_size(dev->dev, min_t(unsigned int, U32_MAX & PAGE_MASK,
                                             SCATTERLIST_MAX_SEGMENT));

In drivers.

dma_set_max_seg_size is the *hardware* capability, and mixing in
things like PAGE_MASK here is just nonsense.

Jason


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-20 11:47                         ` Jason Gunthorpe
@ 2020-10-20 12:31                           ` Tvrtko Ursulin
  2020-10-20 12:56                             ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-10-20 12:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma


On 20/10/2020 12:47, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 12:37:05PM +0100, Tvrtko Ursulin wrote:
> 
>>> Why put this confusing code in every caller? Especially for something
>>> a driver is supposed to call. Will just make bugs
>>
>> For max_segment to be aligned is a requirement today so callers are
>> ready.
> 
> No, it turns out all the RDMA drivers became broken when they
> converted to use the proper U32_MAX for their DMA max_segment size;
> then they couldn't form SGLs anymore.
> 
> I don't want to see nonsense code like this:
> 
>          dma_set_max_seg_size(dev->dev, min_t(unsigned int, U32_MAX & PAGE_MASK,
>                                               SCATTERLIST_MAX_SEGMENT));
> 
> In drivers.
> 
> dma_set_max_seg_size is the *hardware* capability, and mixing in
> things like PAGE_MASK here is just nonsense.

That code was obviously no-op nonsense.

So the crux of the argument is that U32_MAX is conceptually the right 
thing which defines the DMA max_segment size? Not some DMA define or 
anything, but really U32_MAX? And that all/some DMA hardware does not 
think in pages but really in bytes?

Regards,

Tvrtko


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-20 12:31                           ` Tvrtko Ursulin
@ 2020-10-20 12:56                             ` Jason Gunthorpe
  2020-10-20 13:09                               ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-20 12:56 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma

On Tue, Oct 20, 2020 at 01:31:27PM +0100, Tvrtko Ursulin wrote:
> 
> On 20/10/2020 12:47, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 12:37:05PM +0100, Tvrtko Ursulin wrote:
> > 
> > > > Why put this confusing code in every caller? Especially for something
> > > > a driver is supposed to call. Will just make bugs
> > > 
> > > For max_segment to be aligned is a requirement today so callers are
> > > ready.
> > 
> > No, it turns out all the RDMA drivers became broken when they
> > converted to use the proper U32_MAX for their DMA max_segment size;
> > then they couldn't form SGLs anymore.
> > 
> > I don't want to see nonsense code like this:
> > 
> >          dma_set_max_seg_size(dev->dev, min_t(unsigned int, U32_MAX & PAGE_MASK,
> >                                               SCATTERLIST_MAX_SEGMENT));
> > 
> > In drivers.
> > 
> > dma_set_max_seg_size is the *hardware* capability, and mixing in
> > things like PAGE_MASK here is just nonsense.
> 
> Code was obviously a no-op non-sense.
> 
> So the crux of the argument is that U32_MAX is conceptually the right thing
> which defines the DMA max_segment size? Not some DMA define or anything, but
> really U32_MAX? And that all/some DMA hardware does not think in pages but
> really in bytes?

Yes.

The HW has 32 bits for a length field, so U32_MAX accurately defines
the HW DMA capability

Jason


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-20 12:56                             ` Jason Gunthorpe
@ 2020-10-20 13:09                               ` Tvrtko Ursulin
  2020-10-20 13:32                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-10-20 13:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma


On 20/10/2020 13:56, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 01:31:27PM +0100, Tvrtko Ursulin wrote:
>>
>> On 20/10/2020 12:47, Jason Gunthorpe wrote:
>>> On Tue, Oct 20, 2020 at 12:37:05PM +0100, Tvrtko Ursulin wrote:
>>>
>>>>> Why put this confusing code in every caller? Especially for something
>>>>> a driver is supposed to call. Will just make bugs
>>>>
>>>> For max_segment to be aligned is a requirement today so callers are
>>>> ready.
>>>
>>> No, it turns out all the RDMA drivers became broken when they
>>> converted to use the proper U32_MAX for their DMA max_segment size;
>>> then they couldn't form SGLs anymore.
>>>
>>> I don't want to see nonsense code like this:
>>>
>>>           dma_set_max_seg_size(dev->dev, min_t(unsigned int, U32_MAX & PAGE_MASK,
>>>                                                SCATTERLIST_MAX_SEGMENT));
>>>
>>> In drivers.
>>>
>>> dma_set_max_seg_size is the *hardware* capability, and mixing in
>>> things like PAGE_MASK here is just nonsense.
>>
>> That code was obviously no-op nonsense.
>>
>> So the crux of the argument is that U32_MAX is conceptually the right thing
>> which defines the DMA max_segment size? Not some DMA define or anything, but
>> really U32_MAX? And that all/some DMA hardware does not think in pages but
>> really in bytes?
> 
> Yes.
> 
> The HW has 32 bits for a length field, so U32_MAX accurately defines
> the HW DMA capability

Not just the max, but the granularity as well. If byte granularity (well, 
anything less than PAGE_SIZE/4k) exists in some HW, then I agree the API 
change makes sense.

Regards,

Tvrtko


* Re: dynamic-sg patch has broken rdma_rxe
  2020-10-20 13:09                               ` Tvrtko Ursulin
@ 2020-10-20 13:32                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2020-10-20 13:32 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Maor Gottlieb, Christoph Hellwig, Gal Pressman, Bob Pearson,
	Leon Romanovsky, linux-rdma

On Tue, Oct 20, 2020 at 02:09:23PM +0100, Tvrtko Ursulin wrote:

> Not just the max, but the granularity as well. If byte granularity (well
> less than PAGE_SIZE/4k) exists in some hw then I agree the API change makes
> sense.

Scatter/gather lists are byte granular in HW; this is basically the
norm.

Page lists are something a little different.

At least in RDMA we use scatterlist for both types of objects, but
their treatment is quite different. For page lists the
max_segment_size is just something that gets in the way; we really
want the SGLs to be as large and as highly aligned as possible.

Fewer SGE's means less work and less memory everywhere that processes
them.

When it comes time to program the HW, the SGL is analyzed and the
HW-specific page size selected; then the SGL is broken up into a page
list. Each SGE may be split into many HW DMA blocks.

Again, this has nothing to do with PAGE_SIZE; the HW DMA block size
selection is HW-specific and handled when splitting the SGL.

Jason


end of thread, other threads:[~2020-10-20 13:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13 14:33 dynamic-sg patch has broken rdma_rxe Bob Pearson
2020-10-13 16:34 ` Bob Pearson
2020-10-14 22:51 ` Jason Gunthorpe
2020-10-15  7:44   ` Maor Gottlieb
2020-10-15 11:23     ` Gal Pressman
2020-10-15 12:21       ` Maor Gottlieb
2020-10-16  0:31         ` Jason Gunthorpe
2020-10-16  7:13           ` Christoph Hellwig
     [not found]           ` <796ca31aed8f469c957cb850385b9d09@intel.com>
2020-10-16 11:58             ` Jason Gunthorpe
2020-10-19  9:50               ` Tvrtko Ursulin
2020-10-19 12:12                 ` Jason Gunthorpe
2020-10-19 12:29                   ` Tvrtko Ursulin
2020-10-19 12:48                     ` Jason Gunthorpe
2020-10-20 11:37                       ` Tvrtko Ursulin
2020-10-20 11:47                         ` Jason Gunthorpe
2020-10-20 12:31                           ` Tvrtko Ursulin
2020-10-20 12:56                             ` Jason Gunthorpe
2020-10-20 13:09                               ` Tvrtko Ursulin
2020-10-20 13:32                                 ` Jason Gunthorpe
2020-10-15 15:35     ` Bob Pearson
