IOMMU Archive on lore.kernel.org
* [Regression] "iommu/amd: Relax locking in dma_ops path" makes tg3 ethernet transmit queue timeout
@ 2020-05-18  9:06 Kai-Heng Feng
  2020-05-18 13:32 ` Joerg Roedel
  0 siblings, 1 reply; 4+ messages in thread
From: Kai-Heng Feng @ 2020-05-18  9:06 UTC (permalink / raw)
  To: jroedel; +Cc: iommu, open list

Hi,

The Broadcom tg3 ethernet device becomes unusable after commit 92d420ec028d ("iommu/amd: Relax locking in dma_ops path").
After a short period it stops transmitting:
[  122.717144] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x237/0x240()
[  122.717152] NETDEV WATCHDOG: enp3s0 (tg3): transmit queue 0 timed out

After testing the patch section by section, this is the part that caused the regression:

@@ -2578,19 +2580,8 @@ static dma_addr_t map_page(struct device *dev, struct page *page,
 
        dma_mask = *dev->dma_mask;
 
-       spin_lock_irqsave(&domain->lock, flags);
-
-       addr = __map_single(dev, domain->priv, paddr, size, dir, false,
+       return __map_single(dev, domain->priv, paddr, size, dir, false,
                            dma_mask);
-       if (addr == DMA_ERROR_CODE)
-               goto out;
-
-       domain_flush_complete(domain);
-
-out:
-       spin_unlock_irqrestore(&domain->lock, flags);
-
-       return addr;
 }

Particularly, as soon as the spinlock is removed, the issue can be reproduced.
Function domain_flush_complete() doesn't seem to affect the status.
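
To make the shape of the removed hunk easier to see, here is a minimal user-space sketch of the two variants of map_page(). It is only an illustration, not kernel code: every sketch_* name is hypothetical, a pthread mutex stands in for spin_lock_irqsave() on domain->lock, a trivial bump allocator stands in for __map_single(), and a counter stands in for domain_flush_complete(). It shows what the lock used to serialize; it does not identify the root cause of the tg3 timeout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for DMA_ERROR_CODE. */
#define SKETCH_MAP_ERROR UINT64_MAX

/* Hypothetical stand-in for the protection domain. */
struct sketch_domain {
	pthread_mutex_t lock;   /* plays the role of domain->lock */
	uint64_t next_addr;     /* next free address of a bump allocator */
	uint64_t limit;         /* end of the address window */
	unsigned int flushes;   /* number of flush-complete calls */
};

/* Stand-in for __map_single(): hand out the next range or fail. */
static uint64_t sketch_map_single(struct sketch_domain *d, uint64_t size)
{
	uint64_t addr;

	if (size == 0 || size > d->limit - d->next_addr)
		return SKETCH_MAP_ERROR;

	addr = d->next_addr;
	d->next_addr += size;
	return addr;
}

/* Stand-in for domain_flush_complete(): only counts invocations. */
static void sketch_flush_complete(struct sketch_domain *d)
{
	d->flushes++;
}

/* Shape before the commit: map and flush under the domain lock. */
static uint64_t sketch_map_page_locked(struct sketch_domain *d, uint64_t size)
{
	uint64_t addr;

	pthread_mutex_lock(&d->lock);

	addr = sketch_map_single(d, size);
	if (addr == SKETCH_MAP_ERROR)
		goto out;

	sketch_flush_complete(d);

out:
	pthread_mutex_unlock(&d->lock);

	return addr;
}

/* Shape after the commit: no lock and no flush in this path. */
static uint64_t sketch_map_page_relaxed(struct sketch_domain *d, uint64_t size)
{
	return sketch_map_single(d, size);
}
```

In the locked variant, concurrent callers cannot interleave inside the map step and each successful map is followed by a flush before the lock is dropped. The relaxed variant depends on the map step being safe on its own, which the sketch's bump allocator is not.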

However, the .map_page callback was removed by be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api"), so there's no easy revert for this issue.

This is still reproducible as of today's mainline kernel, v5.7-rc6.

Kai-Heng
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [Regression] "iommu/amd: Relax locking in dma_ops path" makes tg3 ethernet transmit queue timeout
  2020-05-18  9:06 [Regression] "iommu/amd: Relax locking in dma_ops path" makes tg3 ethernet transmit queue timeout Kai-Heng Feng
@ 2020-05-18 13:32 ` Joerg Roedel
  2020-05-18 14:05   ` Kai-Heng Feng
  0 siblings, 1 reply; 4+ messages in thread
From: Joerg Roedel @ 2020-05-18 13:32 UTC (permalink / raw)
  To: Kai-Heng Feng; +Cc: iommu, open list

On Mon, May 18, 2020 at 05:06:45PM +0800, Kai-Heng Feng wrote:
> Particularly, as soon as the spinlock is removed, the issue can be reproduced.
> Function domain_flush_complete() doesn't seem to affect the status.
> 
> However, the .map_page callback was removed by be62dbf554c5
> ("iommu/amd: Convert AMD iommu driver to the dma-iommu api"), so
> there's no easy revert for this issue.
> 
> This is still reproducible as of today's mainline kernel, v5.7-rc6.

Is there any error message from the IOMMU driver?


* Re: [Regression] "iommu/amd: Relax locking in dma_ops path" makes tg3 ethernet transmit queue timeout
  2020-05-18 13:32 ` Joerg Roedel
@ 2020-05-18 14:05   ` Kai-Heng Feng
  2020-05-18 15:32     ` Kai-Heng Feng
  0 siblings, 1 reply; 4+ messages in thread
From: Kai-Heng Feng @ 2020-05-18 14:05 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, open list



> On May 18, 2020, at 21:32, Joerg Roedel <jroedel@suse.de> wrote:
> 
> On Mon, May 18, 2020 at 05:06:45PM +0800, Kai-Heng Feng wrote:
>> Particularly, as soon as the spinlock is removed, the issue can be reproduced.
>> Function domain_flush_complete() doesn't seem to affect the status.
>> 
>> However, the .map_page callback was removed by be62dbf554c5
>> ("iommu/amd: Convert AMD iommu driver to the dma-iommu api"), so
>> there's no easy revert for this issue.
>> 
>> This is still reproducible as of today's mainline kernel, v5.7-rc6.
> 
> Is there any error message from the IOMMU driver?
> 

On the mainline kernel, there is no error message from the IOMMU driver.
There are some complaints on v4.15-rc1:
https://pastebin.ubuntu.com/p/qn4TXkFxsc/

Kai-Heng

* Re: [Regression] "iommu/amd: Relax locking in dma_ops path" makes tg3 ethernet transmit queue timeout
  2020-05-18 14:05   ` Kai-Heng Feng
@ 2020-05-18 15:32     ` Kai-Heng Feng
  0 siblings, 0 replies; 4+ messages in thread
From: Kai-Heng Feng @ 2020-05-18 15:32 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, open list



> On May 18, 2020, at 22:05, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> 
> 
> 
>> On May 18, 2020, at 21:32, Joerg Roedel <jroedel@suse.de> wrote:
>> 
>> On Mon, May 18, 2020 at 05:06:45PM +0800, Kai-Heng Feng wrote:
>>> Particularly, as soon as the spinlock is removed, the issue can be reproduced.
>>> Function domain_flush_complete() doesn't seem to affect the status.
>>> 
>>> However, the .map_page callback was removed by be62dbf554c5
>>> ("iommu/amd: Convert AMD iommu driver to the dma-iommu api"), so
>>> there's no easy revert for this issue.
>>> 
>>> This is still reproducible as of today's mainline kernel, v5.7-rc6.
>> 
>> Is there any error message from the IOMMU driver?
>> 
> 
> On the mainline kernel, there is no error message from the IOMMU driver.
> There are some complaints on v4.15-rc1:
> https://pastebin.ubuntu.com/p/qn4TXkFxsc/

I just tested v5.7-rc6: the issue disappears as soon as the kernel boots with "iommu=off".

Kai-Heng

> 
> Kai-Heng

