* NTB connectivity issue under heavy load
From: trifundroid94 @ 2019-05-06 14:43 UTC
  To: linux-ntb


Hello everyone,

I am not sure if this is the right place to ask but I am going to anyway 
since this problem has been bothering us for quite some time now.

A PC running kernel 4.16.7 and TI's ARM based SoC running kernel 4.14 are 
connected via NTB.
The ntb_netdev drivers are correctly loaded on both sides and the hosts are 
able to ping each other indefinitely. However, under heavier load the
connection breaks. At first we thought this only happened with TCP, but it
also happens with UDP as the transport protocol.
The Wireshark captures show that the hosts simply lose each other's ARP
entries: they still know the peer's IP address but no longer its MAC
address. We have not found a way to hardcode this, especially since the MAC
address of the virtual Ethernet device created by the ntb_netdev driver is
randomized. The only workaround is to restart the whole system and start
again, which is unacceptable.

Has this behavior been observed before? Is there any fix for this?

Thank you in advance.
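
One way to confirm the symptom described above (peer IP known, MAC address
lost) is to look at the kernel's ARP table while the link is in the broken
state. The following is only a minimal Python sketch, assuming a Linux host
where /proc/net/arp is readable; the peer address 192.168.1.2 and interface
name eth1 in the usage note are hypothetical placeholders, not values from
this thread.

```python
from pathlib import Path

# Flag bits from <linux/if_arp.h>, as printed in the "Flags" column.
ATF_COM = 0x02   # entry is complete (MAC address resolved)
ATF_PERM = 0x04  # entry is permanent (static)

def parse_arp_table(text):
    """Map (ip, device) -> (flags, mac) from the text of /proc/net/arp."""
    entries = {}
    for line in text.splitlines()[1:]:      # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, dev = fields[:6]
        entries[(ip, dev)] = (int(flags, 16), mac)
    return entries

def peer_state(entries, ip, dev):
    """Classify the peer's entry: missing, incomplete, resolved or permanent."""
    if (ip, dev) not in entries:
        return "missing"
    flags, _mac = entries[(ip, dev)]
    if not flags & ATF_COM:
        return "incomplete"
    return "permanent" if flags & ATF_PERM else "resolved"

def read_arp_table(path="/proc/net/arp"):
    """Read and parse the live table (Linux only)."""
    return parse_arp_table(Path(path).read_text())
```

For example, peer_state(read_arp_table(), "192.168.1.2", "eth1") returning
"incomplete" or "missing" after the failure would match the Wireshark
observation that ARP resolution itself has stopped working.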

* Re: NTB connectivity issue under heavy load
From: trifundroid94 @ 2019-05-06 15:07 UTC
  To: linux-ntb


We are using a Switchtec PFX x24 switch, in case that helps. On the 4.14
kernel, the ntb_hw_switchtec driver is version 1-rc1-173-gba65e6c, obtained
from the backport_4.13_to_4.14 branch of the switchtec kernel repo. On the
PC's 4.16.7 kernel it is version 0.1, which I believe was already there.

* Re: NTB connectivity issue under heavy load
From: Logan Gunthorpe @ 2019-05-10  2:02 UTC
  To: trifundroid94, linux-ntb



On 2019-05-06 8:43 a.m., trifundroid94@gmail.com wrote:
> Hello everyone,
> 
> I am not sure if this is the right place to ask but I am going to anyway
> since this problem has been bothering us for quite some time now.
> 
> A PC running kernel 4.16.7 and TI's ARM based SoC running kernel 4.14
> are connected via NTB.
> The ntb_netdev drivers are correctly loaded on both sides and the hosts
> are able to ping each other indefinitely. However, under heavier load
> the connection breaks. At first we thought this only happened with TCP,
> but it also happens with UDP as the transport protocol.
> The Wireshark captures show that the hosts simply lose each other's ARP
> entries: they still know the peer's IP address but no longer its MAC
> address. We have not found a way to hardcode this, especially since the
> MAC address of the virtual Ethernet device created by the ntb_netdev
> driver is randomized. The only workaround is to restart the whole
> system and start again, which is unacceptable.
> 
> Has this behavior been observed before? Is there any fix for this?

I've never seen this and I'm not sure I can be much help, but you could
try checking the debugfs entries for ntb transport to see if packets are
still getting through, just to narrow down the issue. Are packets no
longer getting through, so that ARP requests are failing? Or is this a
problem at a higher layer with addressing?

Logan
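
The debugfs check suggested above could be scripted roughly as follows.
This is only a sketch under stated assumptions: debugfs is mounted at
/sys/kernel/debug, the ntb_transport module exposes one "stats" file per
queue pair under ntb_transport/<pci-device>/qpN/, and counter lines have
the form "name - value" (for example rx_pkts and tx_pkts). The helper
names parse_stats, stalled and snapshot are made up for illustration.

```python
import re
from pathlib import Path

# Assumed location of the ntb_transport debugfs directory.
DEBUGFS_ROOT = Path("/sys/kernel/debug/ntb_transport")

# Matches counter lines such as "rx_pkts - \t1234". Lines whose value is
# not a plain decimal number (e.g. "QP Link - Up") do not match.
_COUNTER = re.compile(r"^\s*(\w+)\s*-\s*(\d+)\s*$")

def parse_stats(text):
    """Return {counter_name: int} for every numeric line in a stats dump."""
    counters = {}
    for line in text.splitlines():
        m = _COUNTER.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

def stalled(before, after, keys=("rx_pkts", "tx_pkts")):
    """Names of counters that did not advance between two snapshots."""
    return [k for k in keys if after.get(k, 0) <= before.get(k, 0)]

def snapshot(root=DEBUGFS_ROOT):
    """Read every qp stats file; needs root and a mounted debugfs."""
    return {str(p): parse_stats(p.read_text())
            for p in root.glob("*/qp*/stats")}
```

Taking one snapshot(), waiting a few seconds while traffic (or a ping) is
running, and taking a second one gives two dictionaries to compare with
stalled(). If rx_pkts or tx_pkts stop advancing once the connection has
broken, packets are no longer crossing the NTB link; if they keep
advancing, the problem is more likely at a higher layer.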
