* ib_umem_get and DMA_API_DEBUG question
@ 2019-08-26 14:05 Gal Pressman
  2019-08-26 14:23 ` Leon Romanovsky
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Gal Pressman @ 2019-08-26 14:05 UTC (permalink / raw)
  To: RDMA mailing list

Hi all,

Lately I've been seeing DMA-API call traces on our automated testing runs which
complain about overlapping mappings of the same cacheline [1].
The problem is (most likely) caused by multiple calls to ibv_reg_mr with the
same address, which end up DMA-mapping the same physical addresses more than 7
(ACTIVE_CACHELINE_MAX_OVERLAP) times.
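
The pattern in our tests boils down to roughly this (a simplified sketch with
made-up sizes and flags, not the actual test code):

#include <stdlib.h>
#include <infiniband/verbs.h>

/* Register the same buffer several times; each ibv_reg_mr() goes through
 * ib_umem_get() and DMA-maps the same physical pages again.  With enough
 * registrations alive at once, the number of concurrent mappings of the
 * same cachelines exceeds ACTIVE_CACHELINE_MAX_OVERLAP (7) and the
 * warning fires. */
static void reg_same_buffer(struct ibv_pd *pd, size_t len)
{
        struct ibv_mr *mr[8];
        void *buf = malloc(len);
        int i;

        for (i = 0; i < 8; i++)
                mr[i] = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);

        /* ... traffic on any of the MRs ... */

        for (i = 0; i < 8; i++)
                ibv_dereg_mr(mr[i]);
        free(buf);
}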

Is this considered a bad behavior by the test? Should this be caught by
ib_core/driver somehow?

Thanks,
Gal

[1]
------------[ cut here ]------------
DMA-API: exceeded 7 overlapping mappings of cacheline 0x000000004a0ad6c0
WARNING: CPU: 56 PID: 63572 at kernel/dma/debug.c:501 add_dma_entry+0x1fd/0x230
Modules linked in: sunrpc dm_mirror dm_region_hash dm_log dm_mod efa ib_uverbs
ib_core crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd
cryptd glue_helper button pcspkr evdev ip_tables x_tables xfs libcrc32c nvme
crc32c_intel nvme_core ena ipv6 crc_ccitt autofs4
CPU: 56 PID: 63572 Comm: fi_multi_res Not tainted 5.2.0-g27b7fb1ab-dirty #1
Hardware name: Amazon EC2 c5n.18xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:add_dma_entry+0x1fd/0x230
Code: a7 03 02 80 fb 01 77 44 83 e3 01 75 bb 48 8d 54 24 20 be 07 00 00 00 48 c7
c7 c0 89 29 82 c6 05 00 a7 03 02 01 e8 53 10 f0 ff <0f> 0b eb 9a e8 9a 13 f0 ff
48 63 f0 ba 01 00 00 00 48 c7 c7 00 bf
RSP: 0018:ffff8892c33a7388 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81306f9e
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88ac6a630650
RBP: 000000004a0ad6c0 R08: ffffed158d4c60cb R09: ffffed158d4c60cb
R10: 0000000000000001 R11: ffffed158d4c60ca R12: 0000000000000206
R13: 1ffff11258674e71 R14: ffff88ac60c77a80 R15: 0000000000000202
FS:  00007fec5e4f5740(0000) GS:ffff88ac6a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000263c000 CR3: 0000001425a14006 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 ? dma_debug_init+0x2b0/0x2b0
 ? lockdep_hardirqs_on+0x1b1/0x2d0
 debug_dma_map_sg+0x7a/0x4b0
 ib_umem_get+0x831/0xca0 [ib_uverbs]
 ? __kasan_kmalloc.constprop.6+0xa0/0xd0
 efa_reg_mr+0x26f/0x1920 [efa]
 ? check_chain_key+0x147/0x200
 ? check_flags.part.32+0x240/0x240
 ? efa_create_cq+0x910/0x910 [efa]
 ? lookup_get_idr_uobject.part.7+0x18d/0x290 [ib_uverbs]
 ? match_held_lock+0x1b/0x240
 ? alloc_commit_idr_uobject+0x50/0x50 [ib_uverbs]
 ? _raw_spin_unlock+0x24/0x30
 ? alloc_begin_idr_uobject+0x62/0x90 [ib_uverbs]
 ib_uverbs_reg_mr+0x20e/0x440 [ib_uverbs]
 ? ib_uverbs_ex_create_wq+0x620/0x620 [ib_uverbs]
 ? match_held_lock+0x1b/0x240
 ? match_held_lock+0x1b/0x240
 ? check_chain_key+0x147/0x200
 ? uverbs_fill_udata+0x12f/0x360 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x197/0x1f0 [ib_uverbs]
 ? uverbs_disassociate_api+0x220/0x220 [ib_uverbs]
 ? __bitmap_subset+0xd2/0x120
 ? uverbs_fill_udata+0x2ab/0x360 [ib_uverbs]
 ib_uverbs_cmd_verbs+0xb61/0x1410 [ib_uverbs]
 ? uverbs_disassociate_api+0x220/0x220 [ib_uverbs]
 ? mark_lock+0xcf/0x9a0
 ? uverbs_fill_udata+0x360/0x360 [ib_uverbs]
 ? match_held_lock+0x1b/0x240
 ? lock_acquire+0xdb/0x220
 ? lock_acquire+0xdb/0x220
 ? ib_uverbs_ioctl+0xf2/0x1f0 [ib_uverbs]
 ib_uverbs_ioctl+0x14a/0x1f0 [ib_uverbs]
 ? ib_uverbs_ioctl+0xf2/0x1f0 [ib_uverbs]
 ? ib_uverbs_cmd_verbs+0x1410/0x1410 [ib_uverbs]
 ? match_held_lock+0x1b/0x240
 ? check_chain_key+0x147/0x200
 do_vfs_ioctl+0x131/0x990
 ? ioctl_preallocate+0x170/0x170
 ? syscall_trace_enter+0x2fb/0x5a0
 ? mark_held_locks+0x1c/0xa0
 ? ktime_get_coarse_real_ts64+0x7b/0x120
 ? lockdep_hardirqs_on+0x1b1/0x2d0
 ? ktime_get_coarse_real_ts64+0xc0/0x120
 ? syscall_trace_enter+0x184/0x5a0
 ? trace_event_raw_event_sys_enter+0x2b0/0x2b0
 ? rcu_read_lock_sched_held+0x8f/0xa0
 ? kfree+0x24a/0x2c0
 ksys_ioctl+0x70/0x80
 ? mark_held_locks+0x1c/0xa0
 __x64_sys_ioctl+0x3d/0x50
 do_syscall_64+0x68/0x280
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fec5cc4b1e7
Code: b3 66 90 48 8b 05 99 3c 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3
66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3
48 8b 0d 69 3c 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffeba220f48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffeba220f90 RCX: 00007fec5cc4b1e7
RDX: 00007ffeba220fb0 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffeba220fc8 R08: 0000000000000003 R09: 0000000001b8be90
R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000000c
R13: 0000000001b8bfd0 R14: 00007ffeba221148 R15: 0000000000000001
irq event stamp: 130178
hardirqs last  enabled at (130177): [<ffffffff81c121d2>]
_raw_spin_unlock_irqrestore+0x32/0x60
hardirqs last disabled at (130178): [<ffffffff81c12780>]
_raw_spin_lock_irqsave+0x20/0x60
softirqs last  enabled at (130124): [<ffffffff81abd353>] tcp_recvmsg+0x693/0x1360
softirqs last disabled at (130122): [<ffffffff8198abfb>] release_sock+0x1b/0xe0
---[ end trace 22c97ff4678ca8c1 ]---

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-26 14:05 ib_umem_get and DMA_API_DEBUG question Gal Pressman
@ 2019-08-26 14:23 ` Leon Romanovsky
  2019-08-26 14:39   ` Gal Pressman
  2019-08-26 14:39 ` Jason Gunthorpe
  2019-08-27  8:28 ` Gal Pressman
  2 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2019-08-26 14:23 UTC (permalink / raw)
  To: Gal Pressman; +Cc: RDMA mailing list

On Mon, Aug 26, 2019 at 05:05:12PM +0300, Gal Pressman wrote:
> Hi all,
>
> Lately I've been seeing DMA-API call traces on our automated testing runs which
> complain about overlapping mappings of the same cacheline [1].
> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> same address, which as a result DMA maps the same physical addresses more than 7
> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
>
> Is this considered a bad behavior by the test? Should this be caught by
> ib_core/driver somehow?

If I'm not mistaken, we (Mellanox) decided that it is a bug in the
DMA debug code.

Thanks

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-26 14:23 ` Leon Romanovsky
@ 2019-08-26 14:39   ` Gal Pressman
  0 siblings, 0 replies; 12+ messages in thread
From: Gal Pressman @ 2019-08-26 14:39 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: RDMA mailing list

On 26/08/2019 17:23, Leon Romanovsky wrote:
> On Mon, Aug 26, 2019 at 05:05:12PM +0300, Gal Pressman wrote:
>> Hi all,
>>
>> Lately I've been seeing DMA-API call traces on our automated testing runs which
>> complain about overlapping mappings of the same cacheline [1].
>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
>> same address, which as a result DMA maps the same physical addresses more than 7
>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
>>
>> Is this considered a bad behavior by the test? Should this be caught by
>> ib_core/driver somehow?
> 
> If I'm not mistaken, we (Mellanox) decided that it is a bug in the
> DMA debug code.

Thanks a lot Leon, good to know that it's not just an EFA thing.

In case you remember, is it a bug in the sense that the trace is a false alarm,
or is it a bug that could cause real issues?
Did you guys by any chance analyze the consequences of this?

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-26 14:05 ib_umem_get and DMA_API_DEBUG question Gal Pressman
  2019-08-26 14:23 ` Leon Romanovsky
@ 2019-08-26 14:39 ` Jason Gunthorpe
  2019-08-27  8:28 ` Gal Pressman
  2 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2019-08-26 14:39 UTC (permalink / raw)
  To: Gal Pressman, Christoph Hellwig; +Cc: RDMA mailing list

On Mon, Aug 26, 2019 at 05:05:12PM +0300, Gal Pressman wrote:
> Hi all,
> 
> Lately I've been seeing DMA-API call traces on our automated testing runs which
> complain about overlapping mappings of the same cacheline [1].
> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> same address, which as a result DMA maps the same physical addresses more than 7
> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
> 
> Is this considered a bad behavior by the test? Should this be caught by
> ib_core/driver somehow?
> 
> Thanks,
> Gal
> 
> [1]
> DMA-API: exceeded 7 overlapping mappings of cacheline 0x000000004a0ad6c0
> WARNING: CPU: 56 PID: 63572 at kernel/dma/debug.c:501 add_dma_entry+0x1fd/0x230

I understand it is technically a violation of the DMA Mapping API to
do this, as it can create incoherence in the CPU cache if there are
multiple entities claiming responsibility to flush it around DMA.

So if you see this from a kernel ULP it is probably a bug.
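
(To make the hazard concrete: a rough, hypothetical sketch of what "multiple
entities claiming responsibility to flush" can look like on a non-coherent
arch -- illustration only, not code from the tree:)

#include <linux/dma-mapping.h>

/* Two independent streaming mappings of the same buffer mean two owners
 * of the same cachelines' maintenance duties around DMA. */
static void overlap_hazard(struct device *dev, void *buf, size_t len)
{
        dma_addr_t a = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
        dma_addr_t b = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

        /* ... device writes into buf through mapping 'a' ... */

        dma_unmap_single(dev, a, len, DMA_FROM_DEVICE);
        /* On a non-coherent arch each map/unmap does its own cache
         * maintenance on these lines, so one stream's sync can invalidate
         * or write back data underneath the other, still active,
         * mapping 'b'. */
        dma_unmap_single(dev, b, len, DMA_FROM_DEVICE);
}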

From the userspace flow: we only support DMA cache-coherent archs in
userspace, and there is no way to prevent userspace from registering
the same page multiple times (in fact there are good reasons to do
this), so it is a false positive. It would be nice to be able to
suppress it from this path.

Jason

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-26 14:05 ib_umem_get and DMA_API_DEBUG question Gal Pressman
  2019-08-26 14:23 ` Leon Romanovsky
  2019-08-26 14:39 ` Jason Gunthorpe
@ 2019-08-27  8:28 ` Gal Pressman
  2019-08-27 12:00   ` Jason Gunthorpe
  2 siblings, 1 reply; 12+ messages in thread
From: Gal Pressman @ 2019-08-27  8:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Christoph Hellwig; +Cc: RDMA mailing list

On 26/08/2019 17:05, Gal Pressman wrote:
> Hi all,
> 
> Lately I've been seeing DMA-API call traces on our automated testing runs which
> complain about overlapping mappings of the same cacheline [1].
> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> same address, which as a result DMA maps the same physical addresses more than 7
> (ACTIVE_CACHELINE_MAX_OVERLAP) times.

BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
fail as well. I don't have a stable repro for it though.

Is this a known issue as well? The comment there states it might be a bug in the
DMA API implementation, but I'm not sure.

[1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27  8:28 ` Gal Pressman
@ 2019-08-27 12:00   ` Jason Gunthorpe
  2019-08-27 12:53     ` Gal Pressman
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-08-27 12:00 UTC (permalink / raw)
  To: Gal Pressman; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
> On 26/08/2019 17:05, Gal Pressman wrote:
> > Hi all,
> > 
> > Lately I've been seeing DMA-API call traces on our automated testing runs which
> > complain about overlapping mappings of the same cacheline [1].
> > The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> > same address, which as a result DMA maps the same physical addresses more than 7
> > (ACTIVE_CACHELINE_MAX_OVERLAP) times.
> 
> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
> fail as well. I don't have a stable repro for it though.
> 
> Is this a known issue as well? The comment there states it might be a bug in the
> DMA API implementation, but I'm not sure.
> 
> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230

Maybe we are missing a dma_set_seg_boundary ?

PCI uses low defaults:

	dma_set_max_seg_size(&dev->dev, 65536);
	dma_set_seg_boundary(&dev->dev, 0xffffffff);

Jason

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27 12:00   ` Jason Gunthorpe
@ 2019-08-27 12:53     ` Gal Pressman
  2019-08-27 13:17       ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Gal Pressman @ 2019-08-27 12:53 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On 27/08/2019 15:00, Jason Gunthorpe wrote:
> On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
>> On 26/08/2019 17:05, Gal Pressman wrote:
>>> Hi all,
>>>
>>> Lately I've been seeing DMA-API call traces on our automated testing runs which
>>> complain about overlapping mappings of the same cacheline [1].
>>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
>>> same address, which as a result DMA maps the same physical addresses more than 7
>>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
>>
>> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
>> fail as well. I don't have a stable repro for it though.
>>
>> Is this a known issue as well? The comment there states it might be a bug in the
>> DMA API implementation, but I'm not sure.
>>
>> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230
> 
> Maybe we are missing a dma_set_seg_boundary ?
> 
> PCI uses low defaults:
> 
> 	dma_set_max_seg_size(&dev->dev, 65536);
> 	dma_set_seg_boundary(&dev->dev, 0xffffffff);

What would you set it to?

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27 12:53     ` Gal Pressman
@ 2019-08-27 13:17       ` Jason Gunthorpe
  2019-08-27 13:22         ` Gal Pressman
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-08-27 13:17 UTC (permalink / raw)
  To: Gal Pressman; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On Tue, Aug 27, 2019 at 03:53:29PM +0300, Gal Pressman wrote:
> On 27/08/2019 15:00, Jason Gunthorpe wrote:
> > On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
> >> On 26/08/2019 17:05, Gal Pressman wrote:
> >>> Hi all,
> >>>
> >>> Lately I've been seeing DMA-API call traces on our automated testing runs which
> >>> complain about overlapping mappings of the same cacheline [1].
> >>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> >>> same address, which as a result DMA maps the same physical addresses more than 7
> >>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
> >>
> >> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
> >> fail as well. I don't have a stable repro for it though.
> >>
> >> Is this a known issue as well? The comment there states it might be a bug in the
> >> DMA API implementation, but I'm not sure.
> >>
> >> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230
> > 
> > Maybe we are missing a dma_set_seg_boundary ?
> > 
> > PCI uses low defaults:
> > 
> > 	dma_set_max_seg_size(&dev->dev, 65536);
> > 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
> 
> What would you set it to?

Full 64 bits.

For umem the driver is responsible to chop up the SGL as required, not
the core code.
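
(For illustration, roughly the kind of driver-side chopping meant here -- a
hypothetical sketch assuming fixed 4K device pages, not code from any driver:)

#include <linux/scatterlist.h>
#include <rdma/ib_umem.h>

/* Walk the DMA-mapped umem SGL and hand the device fixed-size chunks, so
 * no single segment exceeds what the HW can take.  'add_page' is a made-up
 * callback standing in for the driver's page-list builder. */
static void chop_umem_sgl(struct ib_umem *umem, void (*add_page)(u64 dma_addr))
{
        struct scatterlist *sg;
        unsigned int i;
        u64 addr;

        for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
                for (addr = sg_dma_address(sg);
                     addr < sg_dma_address(sg) + sg_dma_len(sg);
                     addr += PAGE_SIZE)
                        add_page(addr);
}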

Jason

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27 13:17       ` Jason Gunthorpe
@ 2019-08-27 13:22         ` Gal Pressman
  2019-08-27 13:37           ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Gal Pressman @ 2019-08-27 13:22 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On 27/08/2019 16:17, Jason Gunthorpe wrote:
> On Tue, Aug 27, 2019 at 03:53:29PM +0300, Gal Pressman wrote:
>> On 27/08/2019 15:00, Jason Gunthorpe wrote:
>>> On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
>>>> On 26/08/2019 17:05, Gal Pressman wrote:
>>>>> Hi all,
>>>>>
>>>>> Lately I've been seeing DMA-API call traces on our automated testing runs which
>>>>> complain about overlapping mappings of the same cacheline [1].
>>>>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
>>>>> same address, which as a result DMA maps the same physical addresses more than 7
>>>>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
>>>>
>>>> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
>>>> fail as well. I don't have a stable repro for it though.
>>>>
>>>> Is this a known issue as well? The comment there states it might be a bug in the
>>>> DMA API implementation, but I'm not sure.
>>>>
>>>> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230
>>>
>>> Maybe we are missing a dma_set_seg_boundary ?
>>>
>>> PCI uses low defaults:
>>>
>>> 	dma_set_max_seg_size(&dev->dev, 65536);
>>> 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
>>
>> What would you set it to?
> 
> Full 64 bits.
> 
> For umem the driver is responsible to chop up the SGL as required, not
> the core code.

But wouldn't this possibly hide driver bugs? Perhaps even in other flows?

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27 13:22         ` Gal Pressman
@ 2019-08-27 13:37           ` Jason Gunthorpe
  2019-08-27 13:53             ` Gal Pressman
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-08-27 13:37 UTC (permalink / raw)
  To: Gal Pressman; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On Tue, Aug 27, 2019 at 04:22:51PM +0300, Gal Pressman wrote:
> On 27/08/2019 16:17, Jason Gunthorpe wrote:
> > On Tue, Aug 27, 2019 at 03:53:29PM +0300, Gal Pressman wrote:
> >> On 27/08/2019 15:00, Jason Gunthorpe wrote:
> >>> On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
> >>>> On 26/08/2019 17:05, Gal Pressman wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Lately I've been seeing DMA-API call traces on our automated testing runs which
> >>>>> complain about overlapping mappings of the same cacheline [1].
> >>>>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> >>>>> same address, which as a result DMA maps the same physical addresses more than 7
> >>>>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
> >>>>
> >>>> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
> >>>> fail as well. I don't have a stable repro for it though.
> >>>>
> >>>> Is this a known issue as well? The comment there states it might be a bug in the
> >>>> DMA API implementation, but I'm not sure.
> >>>>
> >>>> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230
> >>>
> >>> Maybe we are missing a dma_set_seg_boundary ?
> >>>
> >>> PCI uses low defaults:
> >>>
> >>> 	dma_set_max_seg_size(&dev->dev, 65536);
> >>> 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
> >>
> >> What would you set it to?
> > 
> > Full 64 bits.
> > 
> > For umem the driver is responsible to chop up the SGL as required, not
> > the core code.
> 
> But wouldn't this possibly hide driver bugs? Perhaps even in other flows?

The block stack also uses this information, I've been meaning to check
if we should use dma_attrs in umem so we can have different
parameters.

I'm not aware of any issue with the 32 bit boundary on RDMA devices..

Jason

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27 13:37           ` Jason Gunthorpe
@ 2019-08-27 13:53             ` Gal Pressman
  2019-08-28 14:15               ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Gal Pressman @ 2019-08-27 13:53 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On 27/08/2019 16:37, Jason Gunthorpe wrote:
> On Tue, Aug 27, 2019 at 04:22:51PM +0300, Gal Pressman wrote:
>> On 27/08/2019 16:17, Jason Gunthorpe wrote:
>>> On Tue, Aug 27, 2019 at 03:53:29PM +0300, Gal Pressman wrote:
>>>> On 27/08/2019 15:00, Jason Gunthorpe wrote:
>>>>> On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
>>>>>> On 26/08/2019 17:05, Gal Pressman wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Lately I've been seeing DMA-API call traces on our automated testing runs which
>>>>>>> complain about overlapping mappings of the same cacheline [1].
>>>>>>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
>>>>>>> same address, which as a result DMA maps the same physical addresses more than 7
>>>>>>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
>>>>>>
>>>>>> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
>>>>>> fail as well. I don't have a stable repro for it though.
>>>>>>
>>>>>> Is this a known issue as well? The comment there states it might be a bug in the
>>>>>> DMA API implementation, but I'm not sure.
>>>>>>
>>>>>> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230
>>>>>
>>>>> Maybe we are missing a dma_set_seg_boundary ?
>>>>>
>>>>> PCI uses low defaults:
>>>>>
>>>>> 	dma_set_max_seg_size(&dev->dev, 65536);
>>>>> 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
>>>>
>>>> What would you set it to?
>>>
>>> Full 64 bits.
>>>
>>> For umem the driver is responsible to chop up the SGL as required, not
>>> the core code.
>>
>> But wouldn't this possibly hide driver bugs? Perhaps even in other flows?
> 
> The block stack also uses this information, I've been meaning to check
> if we should use dma_attrs in umem so we can have different
> parameters.
> 
> I'm not aware of any issue with the 32 bit boundary on RDMA devices..

So something like this?

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 99c4a55545cf..2aa0e48f8dac 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1199,8 +1199,9 @@ static void setup_dma_device(struct ib_device *device)
 		WARN_ON_ONCE(!parent);
 		device->dma_device = parent;
 	}
-	/* Setup default max segment size for all IB devices */
+	/* Setup default DMA properties for all IB devices */
 	dma_set_max_seg_size(device->dma_device, SZ_2G);
+	dma_set_seg_boundary(device->dma_device, U64_MAX);
 
 }
 

* Re: ib_umem_get and DMA_API_DEBUG question
  2019-08-27 13:53             ` Gal Pressman
@ 2019-08-28 14:15               ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2019-08-28 14:15 UTC (permalink / raw)
  To: Gal Pressman; +Cc: Leon Romanovsky, Christoph Hellwig, RDMA mailing list

On Tue, Aug 27, 2019 at 04:53:01PM +0300, Gal Pressman wrote:
> On 27/08/2019 16:37, Jason Gunthorpe wrote:
> > On Tue, Aug 27, 2019 at 04:22:51PM +0300, Gal Pressman wrote:
> >> On 27/08/2019 16:17, Jason Gunthorpe wrote:
> >>> On Tue, Aug 27, 2019 at 03:53:29PM +0300, Gal Pressman wrote:
> >>>> On 27/08/2019 15:00, Jason Gunthorpe wrote:
> >>>>> On Tue, Aug 27, 2019 at 11:28:20AM +0300, Gal Pressman wrote:
> >>>>>> On 26/08/2019 17:05, Gal Pressman wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Lately I've been seeing DMA-API call traces on our automated testing runs which
> >>>>>>> complain about overlapping mappings of the same cacheline [1].
> >>>>>>> The problem is (most likely) caused due to multiple calls to ibv_reg_mr with the
> >>>>>>> same address, which as a result DMA maps the same physical addresses more than 7
> >>>>>>> (ACTIVE_CACHELINE_MAX_OVERLAP) times.
> >>>>>>
> >>>>>> BTW, on rare occasions I'm seeing the boundary check in check_sg_segment [1]
> >>>>>> fail as well. I don't have a stable repro for it though.
> >>>>>>
> >>>>>> Is this a known issue as well? The comment there states it might be a bug in the
> >>>>>> DMA API implementation, but I'm not sure.
> >>>>>>
> >>>>>> [1] https://elixir.bootlin.com/linux/v5.3-rc3/source/kernel/dma/debug.c#L1230
> >>>>>
> >>>>> Maybe we are missing a dma_set_seg_boundary ?
> >>>>>
> >>>>> PCI uses low defaults:
> >>>>>
> >>>>> 	dma_set_max_seg_size(&dev->dev, 65536);
> >>>>> 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
> >>>>
> >>>> What would you set it to?
> >>>
> >>> Full 64 bits.
> >>>
> >>> For umem the driver is responsible to chop up the SGL as required, not
> >>> the core code.
> >>
> >> But wouldn't this possibly hide driver bugs? Perhaps even in other flows?
> > 
> > The block stack also uses this information, I've been meaning to check
> > if we should use dma_attrs in umem so we can have different
> > parameters.
> > 
> > I'm not aware of any issue with the 32 bit boundary on RDMA devices..
> 
> So something like this?
> 
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 99c4a55545cf..2aa0e48f8dac 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -1199,8 +1199,9 @@ static void setup_dma_device(struct ib_device *device)
>  		WARN_ON_ONCE(!parent);
>  		device->dma_device = parent;
>  	}
> -	/* Setup default max segment size for all IB devices */
> +	/* Setup default DMA properties for all IB devices */
>  	dma_set_max_seg_size(device->dma_device, SZ_2G);
> +	dma_set_seg_boundary(device->dma_device, U64_MAX);
>  
>  }

Hum. So there are two issues here: the SGL combiner in umem is
supposed to respect the DMA settings, so it should be fixed to check
the boundary as well as seg_size too.

Then we can add the above as well. AFAIK all PCI-E HW is OK to do
arbitrary DMAs.
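
(Roughly, the condition check_sg_segment() enforces for each mapped SGE, and
which the combined SGEs would have to keep satisfying -- written out as a
sketch; the helper name is made up:)

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static bool sge_respects_dma_limits(struct device *dev, struct scatterlist *sg)
{
        u64 start = sg_dma_address(sg);
        u64 end = start + sg_dma_len(sg) - 1;

        /* segment must not exceed the max segment size and must not
         * cross a (seg_boundary + 1) boundary */
        return sg_dma_len(sg) <= dma_get_max_seg_size(dev) &&
               !((start ^ end) & ~(u64)dma_get_seg_boundary(dev));
}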

Jason  
