From: Karsten Graul <kgraul@linux.ibm.com>
To: Ioana Ciornei <ioana.ciornei@nxp.com>,
	Jeremy Linton <jeremy.linton@arm.com>
Cc: Hamza Mahfooz <someguy@effective-light.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>
Subject: Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry
Date: Thu, 30 Sep 2021 15:37:33 +0200
Message-ID: <185e7ee4-3749-4ccb-6d2e-da6bc8f30c04@linux.ibm.com>
In-Reply-To: <20210914154504.z6vqxuh3byqwgfzx@skbuf>

On 14/09/2021 17:45, Ioana Ciornei wrote:
> On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
>> +DPAA2, netdev maintainers
>> Hi,
>>
>> On 5/18/21 7:54 AM, Hamza Mahfooz wrote:
>>> Since overlapping mappings are not supported by the DMA API, we should
>>> report an error if active_cacheline_insert returns -EEXIST.
>>
>> It seems this patch found a victim. I was trying to run iperf3 on a
>> honeycomb (5.14.0, fedora 35) and the console is blasting this error message
>> at 100% cpu. So, I changed it to a WARN_ONCE() to get the call trace, which
>> is attached below.
>>
> 
> These frags are allocated by the stack, transformed into a scatterlist
> by skb_to_sgvec and then DMA mapped with dma_map_sg. It was not the
> dpaa2-eth's decision to use two fragments from the same page (that will
> also end up in the same cacheline) in two different in-flight skbs.
> 
> Is this behavior normal?
> 

We see the same problem here; it started with 5.15-rc2 in our nightly CI runs.
The CI has panic_on_warn enabled, so we now see the panic every day.

It's always the same pattern: the SMC module calls dma_map_sg_attrs(), which
sooner or later ends up in the EEXIST warning.

It would be better to revert this patch now and then work on understanding the
checking logic for overlapping areas.
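
To illustrate the kind of collision being discussed, here is a minimal
userspace sketch. It is not the kernel/dma/debug.c code: the names
track_map(), track_unmap() and cacheline_of(), the flat array, and the
64-byte line size are all assumptions made for the example. It only shows
how a tracker keyed by the cacheline of a mapping's start address reports
-EEXIST for two distinct buffers that happen to share a cacheline.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 64-byte cachelines (real sizes vary by arch). */
#define CACHELINE_SHIFT 6
#define MAX_ACTIVE 64

/* Hypothetical tracker: one slot per cacheline with an active mapping. */
static uint64_t active[MAX_ACTIVE];
static size_t nr_active;

/* Cacheline number that contains addr. */
static uint64_t cacheline_of(uint64_t addr)
{
	return addr >> CACHELINE_SHIFT;
}

/*
 * Record a mapping that starts at addr.
 * Returns 0 on success, -EEXIST if another active mapping already
 * starts in the same cacheline, -ENOSPC if the table is full.
 */
static int track_map(uint64_t addr)
{
	uint64_t cln = cacheline_of(addr);

	assert(nr_active <= MAX_ACTIVE);
	for (size_t i = 0; i < nr_active; i++)
		if (active[i] == cln)
			return -EEXIST;
	if (nr_active == MAX_ACTIVE)
		return -ENOSPC;
	active[nr_active++] = cln;
	return 0;
}

/* Drop the tracking entry for the cacheline that contains addr, if any. */
static void track_unmap(uint64_t addr)
{
	uint64_t cln = cacheline_of(addr);

	for (size_t i = 0; i < nr_active; i++) {
		if (active[i] == cln) {
			active[i] = active[--nr_active];
			return;
		}
	}
}
```

With this sketch, mapping 0x1000 and then 0x1020 fails with -EEXIST even
though the two buffers need not overlap byte for byte; mapping 0x1040
succeeds because it starts in the next cacheline.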

Thank you.


The call trace for reference:

[  864.189864] DMA-API: mlx5_core 0662:00:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported
[  864.189883] WARNING: CPU: 0 PID: 33720 at kernel/dma/debug.c:570 add_dma_entry+0x208/0x2c8
...
[  864.190747] CPU: 0 PID: 33720 Comm: smcapp Not tainted 5.15.0-20210928.rc3.git0.a59bf04db7bb.300.fc34.s390x+debug #1
[  864.190758] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[  864.190766] Krnl PSW : 0704d00180000000 00000000fa6239fc (add_dma_entry+0x20c/0x2c8)
[  864.190783]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[  864.190795] Krnl GPRS: c0000000ffffbfff 0000000080000000 0000000000000061 0000000000000000
[  864.190804]            0000000000000001 0000000000000001 0000000000000001 0000000000000001
[  864.190813]            0700000000000001 000000000020ff00 00000000ffffffff 000000008137b300
[  864.190822]            0000000020020100 0000000000000001 00000000fa6239f8 00000380074536f8
[  864.190837] Krnl Code: 00000000fa6239ec: c020007a4964	larl	%r2,00000000fb56ccb4
                          00000000fa6239f2: c0e5005ef2ff	brasl	%r14,00000000fb201ff0
                         #00000000fa6239f8: af000000		mc	0,0
                         >00000000fa6239fc: ecb60057007c	cgij	%r11,0,6,00000000fa623aaa
                          00000000fa623a02: c01000866149	larl	%r1,00000000fb6efc94
                          00000000fa623a08: e31010000012	lt	%r1,0(%r1)
                          00000000fa623a0e: a774ff73		brc	7,00000000fa6238f4
                          00000000fa623a12: c010008a9227	larl	%r1,00000000fb775e60
[  864.202949] Call Trace:
[  864.202959]  [<00000000fa6239fc>] add_dma_entry+0x20c/0x2c8 
[  864.202971] ([<00000000fa6239f8>] add_dma_entry+0x208/0x2c8)
[  864.202981]  [<00000000fa624988>] debug_dma_map_sg+0x140/0x160 
[  864.202992]  [<00000000fa61eadc>] __dma_map_sg_attrs+0x9c/0xd8 
[  864.203002]  [<00000000fa61eb3a>] dma_map_sg_attrs+0x22/0x40 
[  864.203012]  [<000003ff80483bde>] smc_ib_buf_map_sg+0x5e/0x90 [smc] 
[  864.203036]  [<000003ff80486b44>] smcr_buf_map_link.part.0+0x12c/0x1e8 [smc] 
[  864.203053]  [<000003ff80486cb6>] _smcr_buf_map_lgr+0xb6/0xf8 [smc] 
[  864.203071]  [<000003ff8048b91c>] smcr_buf_map_lgr+0x4c/0x90 [smc] 
[  864.211496]  [<000003ff80490ac2>] smc_llc_cli_add_link+0x152/0x420 [smc] 
[  864.211522]  [<000003ff8047acbc>] smcr_clnt_conf_first_link+0x124/0x1e0 [smc] 
[  864.211537]  [<000003ff8047bfb2>] smc_connect_rdma+0x25a/0x2e8 [smc] 
[  864.211551]  [<000003ff8047da4a>] __smc_connect+0x38a/0x650 [smc] 
[  864.211566]  [<000003ff8047de70>] smc_connect+0x160/0x190 [smc] 
[  864.211580]  [<00000000faf10c70>] __sys_connect+0x98/0xd0 
[  864.211592]  [<00000000faf12e9a>] __do_sys_socketcall+0x16a/0x350 
[  864.211603]  [<00000000fb216752>] __do_syscall+0x1c2/0x1f0 
[  864.211616]  [<00000000fb229148>] system_call+0x78/0xa0 

-- 
Karsten
