ntb.lists.linux.dev archive mirror
* ntb_netdev Communication Failure Issue
@ 2022-02-16 16:16 Marc Smith
  2022-02-16 17:14 ` Logan Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Smith @ 2022-02-16 16:16 UTC
  To: ntb

Hi,

I'm using vanilla Linux 5.4.145 with a Celestica "cluster-in-a-box"
system (two servers in a single chassis connected internally via PCIe
switches). The PCIe switches are Microsemi Switchtec ('lspci' says
"PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
[11f8:8546]" but I believe this is not the current product/model name
from Microsemi).

I use the 'ntb_netdev' driver for the virtual Ethernet functionality
across the NTB for IP communication between the two controllers. I've
had a long-standing issue where, sometimes after we reboot a
controller, we can no longer establish communication between the two
controllers; recovering usually requires rebooting both and/or power
cycling the entire chassis.

I've also noticed that sometimes, when they boot, the virtual Ethernet
device is unable to pass traffic from the start.

And finally, when simply doing an "ifconfig NTB_IF down" followed by
an "ifconfig NTB_IF up" on one controller, they sometimes get into the
state where they no longer pass traffic.

All three conditions seem to be related, and the symptom is the same:
the interfaces show the UP and RUNNING flags with IP addresses
configured, but I'm unable to ping between the two controllers.
Usually when this happens, one side shows the drop/error counts for
the NTB virtual Ethernet interface increasing without end.

In the example below, both controllers had booted up, and I unloaded
the modules and then reloaded them with 'dyndbg=+p', but was
immediately unable to pass any traffic on those interfaces.
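
For reference, the reload step described above might look something like
the sketch below. The module names (ntb_netdev, ntb_transport,
ntb_hw_switchtec) and the unload order are assumptions inferred from the
log prefixes, not the exact commands that were run. The helper only
prints the commands unless RUN=1 is set in the environment.

```shell
#!/bin/sh
# Sketch: reload the NTB stack with dynamic debug enabled.
# Assumption: these are the modules in use, listed top of stack first.
NTB_MODULES="ntb_netdev ntb_transport ntb_hw_switchtec"

# Print a command; execute it only when RUN=1.
run_cmd() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

reload_ntb_with_dyndbg() {
    # Unload from the top of the stack down.
    for mod in $NTB_MODULES; do
        run_cmd modprobe -r "$mod"
    done
    # Build the reverse order so the hardware driver loads first.
    reversed=""
    for mod in $NTB_MODULES; do
        reversed="$mod $reversed"
    done
    # Reload with dyndbg=+p so pr_debug()/dev_dbg() output is printed.
    for mod in $reversed; do
        run_cmd modprobe "$mod" dyndbg=+p
    done
}
```

Calling reload_ntb_with_dyndbg with RUN unset is a dry run that lists the
six modprobe commands in order; with RUN=1 (as root) it executes them.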

Node/controller 1:
[44101.188104] switchtec switchtec0: Management device registered.
[44101.224290] switchtec switchtec1: Management device registered.
[44101.224879] switchtec: loaded.
[44108.104617] switchtec switchtec0: Partition ID 0 of 3
[44108.104785] switchtec switchtec0: MWs: 2 direct, 32 lut
[44108.104912] switchtec switchtec0: Peer MWs: 2 direct, 32 lut
[44108.157019] switchtec switchtec0: Requester ID 00:00.0 -> BB:01.0
[44108.157054] switchtec switchtec0: Requester ID 3A:00.0 -> BB:01.1
[44108.208906] switchtec switchtec0: Using crosslink configuration
[44108.208981] switchtec switchtec0: Crosslink BAR0 addr: 0
[44108.209092] switchtec switchtec0: Crosslink BAR2 addr: 1000000000
[44108.209167] switchtec switchtec0: Crosslink BAR4 addr: 2000000000
[44108.468954] switchtec switchtec0: Requester ID 00:01.0 -> BB:01.4
[44108.468986] switchtec switchtec0: Requester ID 00:01.1 -> BB:01.5
[44108.521033] switchtec switchtec0: dbs: shift 0/0, mask 0fffffffffffffff
[44108.625136] switchtec switchtec0: Shared MW Ready
[44108.625163] switchtec switchtec0: irqs - event: 2, db: 0, msgs: 1
[44108.626511] switchtec switchtec0: NTB device registered
[44140.260589] switchtec switchtec0: stuser_create: 00000000df6a1e87
[44140.260594] switchtec switchtec0: switchtec_dev_open: 00000000df6a1e87
[44140.260743] switchtec switchtec0: stuser_free: 00000000df6a1e87
[44150.277982] switchtec switchtec0: stuser_create: 00000000c4701c39
[44150.277987] switchtec switchtec0: switchtec_dev_open: 00000000c4701c39
[44150.278138] switchtec switchtec0: stuser_free: 00000000c4701c39
[44231.225116] switchtec switchtec0: message: 0 00000001
[44232.799538] Software Queue-Pair Transport over NTB, version 4
[44232.800780] switchtec switchtec0: enabling link
[44232.800893] switchtec 0000:3b:00.1: Remote version = 0
[44232.800903] switchtec switchtec0: ntb link up
[44232.801001] switchtec switchtec0: message: 0 00000003
[44232.812419] switchtec 0000:3b:00.1: Remote version = 4
[44232.812430] switchtec 0000:3b:00.1: Remote max number of qps = 2
[44232.812439] switchtec 0000:3b:00.1: Remote number of mws = 2
[44232.812449] switchtec 0000:3b:00.1: Remote MW0 size = 0x200000
[44232.813114] switchtec switchtec0: MW 0: part 0 addr 0x0000001071600000 size 0x0000000000200000
[44232.916884] switchtec 0000:3b:00.1: Remote MW1 size = 0x200000
[44232.917452] switchtec switchtec0: MW 1: part 0 addr 0x0000001014e00000 size 0x0000000000200000
[44257.513987] switchtec 0000:3b:00.1: Using DMA memcpy for TX
[44257.513992] switchtec 0000:3b:00.1: Using DMA memcpy for RX
[44257.517993] switchtec 0000:3b:00.1: NTB Transport QP 0 created
[44257.524071] switchtec 0000:3b:00.1: eth0 created
[44257.533847] ntb_netdev ntb_netdev0 priv0: renamed from eth0
...
[44344.794970] switchtec switchtec0: stuser_create: 00000000b79c4946
[44344.794975] switchtec switchtec0: switchtec_dev_open: 00000000b79c4946
[44344.795158] switchtec switchtec0: stuser_free: 00000000b79c4946
[44344.871358] switchtec switchtec0: stuser_create: 000000009a68785f
[44344.871363] switchtec switchtec0: switchtec_dev_open: 000000009a68785f
[44344.871542] switchtec switchtec0: stuser_free: 000000009a68785f
[44350.386529] switchtec switchtec0: stuser_create: 00000000a3644dc9
[44350.386535] switchtec switchtec0: switchtec_dev_open: 00000000a3644dc9
[44350.386721] switchtec switchtec0: stuser_free: 00000000a3644dc9
[44350.416802] switchtec switchtec0: stuser_create: 00000000f2cc280a
[44350.416807] switchtec switchtec0: switchtec_dev_open: 00000000f2cc280a
[44350.416968] switchtec switchtec0: stuser_free: 00000000f2cc280a
[44470.486355] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.496863] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.504156] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.514905] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.527490] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.537896] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.548884] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.559902] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.570890] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.581890] switchtec 0000:3b:00.1: Remote QP link status = 0
[44470.591242] switchtec switchtec0: stuser_create: 00000000309d4e4c
[44470.591248] switchtec switchtec0: switchtec_dev_open: 00000000309d4e4c
[44470.591412] switchtec switchtec0: stuser_free: 00000000309d4e4c
[44470.625330] switchtec switchtec0: stuser_create: 000000007e7c8c1c
[44470.625335] switchtec switchtec0: switchtec_dev_open: 000000007e7c8c1c
[44470.625511] switchtec switchtec0: stuser_free: 000000007e7c8c1c
[44474.332907] switchtec 0000:3b:00.1: qp 0: Link Up
[44474.332925] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
[44474.332965] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.332977] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44474.332985] switchtec 0000:3b:00.1: done flag not set
[44474.333020] IPv6: ADDRCONF(NETDEV_CHANGE): priv0: link becomes ready
[44474.355200] switchtec switchtec0: doorbell
[44474.355250] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.355263] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 3
[44474.355272] switchtec 0000:3b:00.1: link down flag set
[44474.355320] switchtec 0000:3b:00.1: qp 0: Link Cleanup
[44474.355337] ntb_netdev ntb_netdev0 priv0: Event 0, Link 0
[44474.355372] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.355382] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44474.355390] switchtec 0000:3b:00.1: done flag not set
[44474.360458] switchtec switchtec0: doorbell
[44474.365910] switchtec 0000:3b:00.1: qp 0: Link Up
[44474.365923] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
[44474.365961] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.365971] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 1
[44474.365984] switchtec 0000:3b:00.1: RX OK index 0 ver 0 size 90 into buf size 65524
[44474.365998] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
[44474.366072] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44474.366079] switchtec 0000:3b:00.1: done flag not set
[44474.366192] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.366195] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44474.366197] switchtec 0000:3b:00.1: done flag not set
[44474.939439] switchtec switchtec0: doorbell
[44474.939454] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.939458] switchtec 0000:3b:00.1: qp 0: RX ver 1 len 90 flags 1
[44474.939462] switchtec 0000:3b:00.1: RX OK index 1 ver 1 size 90 into buf size 65524
[44474.939467] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
[44474.939493] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44474.939495] switchtec 0000:3b:00.1: done flag not set
[44474.939566] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44474.939569] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44474.939572] switchtec 0000:3b:00.1: done flag not set
[44475.387462] switchtec switchtec0: doorbell
[44475.387475] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44475.387478] switchtec 0000:3b:00.1: qp 0: RX ver 2 len 86 flags 1
[44475.387482] switchtec 0000:3b:00.1: RX OK index 2 ver 2 size 86 into buf size 65524
[44475.387486] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 86 byte payload received
[44475.387505] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44475.387507] switchtec 0000:3b:00.1: done flag not set
[44475.387561] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44475.387563] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44475.387566] switchtec 0000:3b:00.1: done flag not set
[44476.411777] switchtec switchtec0: doorbell
[44476.411792] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44476.411796] switchtec 0000:3b:00.1: qp 0: RX ver 3 len 90 flags 1
[44476.411800] switchtec 0000:3b:00.1: RX OK index 3 ver 3 size 90 into buf size 65524
[44476.411807] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
[44476.411835] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44476.411837] switchtec 0000:3b:00.1: done flag not set
[44476.411907] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44476.411910] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44476.411912] switchtec 0000:3b:00.1: done flag not set
[44476.412087] switchtec switchtec0: doorbell
[44476.412091] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44476.412094] switchtec 0000:3b:00.1: qp 0: RX ver 4 len 70 flags 1
[44476.412098] switchtec 0000:3b:00.1: RX OK index 4 ver 4 size 70 into buf size 65524
[44476.412102] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
[44476.412120] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44476.412122] switchtec 0000:3b:00.1: done flag not set
[44476.412198] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44476.412201] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44476.412203] switchtec 0000:3b:00.1: done flag not set
[44477.307472] switchtec switchtec0: doorbell
[44477.307487] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44477.307490] switchtec 0000:3b:00.1: qp 0: RX ver 5 len 90 flags 1
[44477.307494] switchtec 0000:3b:00.1: RX OK index 5 ver 5 size 90 into buf size 65524
[44477.307501] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
[44477.307525] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44477.307528] switchtec 0000:3b:00.1: done flag not set
[44477.307689] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44477.307692] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44477.307694] switchtec 0000:3b:00.1: done flag not set
[44481.147576] switchtec switchtec0: doorbell
[44481.147594] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44481.147597] switchtec 0000:3b:00.1: qp 0: RX ver 6 len 70 flags 1
[44481.147601] switchtec 0000:3b:00.1: RX OK index 6 ver 6 size 70 into buf size 65524
[44481.147607] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
[44481.147631] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44481.147634] switchtec 0000:3b:00.1: done flag not set
[44481.147712] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44481.147715] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44481.147718] switchtec 0000:3b:00.1: done flag not set
[44489.851482] switchtec switchtec0: doorbell
[44489.851498] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44489.851502] switchtec 0000:3b:00.1: qp 0: RX ver 7 len 70 flags 1
[44489.851506] switchtec 0000:3b:00.1: RX OK index 7 ver 7 size 70 into buf size 65524
[44489.851511] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
[44489.851536] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44489.851538] switchtec 0000:3b:00.1: done flag not set
[44489.851616] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44489.851619] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44489.851621] switchtec 0000:3b:00.1: done flag not set
[44493.354390] switchtec switchtec0: stuser_create: 00000000098d03cb
[44493.354395] switchtec switchtec0: switchtec_dev_open: 00000000098d03cb
[44493.354560] switchtec switchtec0: stuser_free: 00000000098d03cb
[44493.377810] switchtec switchtec0: stuser_create: 00000000cac0e0f9
[44493.377815] switchtec switchtec0: switchtec_dev_open: 00000000cac0e0f9
[44493.378029] switchtec switchtec0: stuser_free: 00000000cac0e0f9
[44493.396554] switchtec switchtec0: stuser_create: 000000008fe7cb2b
[44493.396559] switchtec switchtec0: switchtec_dev_open: 000000008fe7cb2b
[44493.396719] switchtec switchtec0: stuser_free: 000000008fe7cb2b
[44493.417150] switchtec switchtec0: stuser_create: 00000000e67348bf
[44493.417154] switchtec switchtec0: switchtec_dev_open: 00000000e67348bf
[44493.417323] switchtec switchtec0: stuser_free: 00000000e67348bf
[44493.438786] switchtec switchtec0: stuser_create: 000000007384dd1d
[44493.438791] switchtec switchtec0: switchtec_dev_open: 000000007384dd1d
[44493.438969] switchtec switchtec0: stuser_free: 000000007384dd1d
[44493.459467] switchtec switchtec0: stuser_create: 00000000af970cd7
[44493.459471] switchtec switchtec0: switchtec_dev_open: 00000000af970cd7
[44493.459637] switchtec switchtec0: stuser_free: 00000000af970cd7
[44493.480517] switchtec switchtec0: stuser_create: 00000000abdf1426
[44493.480522] switchtec switchtec0: switchtec_dev_open: 00000000abdf1426
[44493.480672] switchtec switchtec0: stuser_free: 00000000abdf1426
[44493.501712] switchtec switchtec0: stuser_create: 00000000e4221771
[44493.501716] switchtec switchtec0: switchtec_dev_open: 00000000e4221771
[44493.501882] switchtec switchtec0: stuser_free: 00000000e4221771
[44493.528249] switchtec switchtec0: stuser_create: 00000000c9d7a68c
[44493.528253] switchtec switchtec0: switchtec_dev_open: 00000000c9d7a68c
[44493.528414] switchtec switchtec0: stuser_free: 00000000c9d7a68c
[44493.555529] switchtec switchtec0: stuser_create: 00000000e415ce5d
[44493.555533] switchtec switchtec0: switchtec_dev_open: 00000000e415ce5d
[44493.555699] switchtec switchtec0: stuser_free: 00000000e415ce5d
[44493.577118] switchtec switchtec0: stuser_create: 000000003bee7779
[44493.577122] switchtec switchtec0: switchtec_dev_open: 000000003bee7779
[44493.577276] switchtec switchtec0: stuser_free: 000000003bee7779
[44493.598661] switchtec switchtec0: stuser_create: 00000000e76bc00f
[44493.598668] switchtec switchtec0: switchtec_dev_open: 00000000e76bc00f
[44493.598942] switchtec switchtec0: stuser_free: 00000000e76bc00f
[44493.619690] switchtec switchtec0: stuser_create: 00000000e527a94d
[44493.619694] switchtec switchtec0: switchtec_dev_open: 00000000e527a94d
[44493.619841] switchtec switchtec0: stuser_free: 00000000e527a94d
[44493.640966] switchtec switchtec0: stuser_create: 000000004978b777
[44493.640970] switchtec switchtec0: switchtec_dev_open: 000000004978b777
[44493.641134] switchtec switchtec0: stuser_free: 000000004978b777
[44493.660708] switchtec switchtec0: stuser_create: 00000000061d84e0
[44493.660712] switchtec switchtec0: switchtec_dev_open: 00000000061d84e0
[44493.660914] switchtec switchtec0: stuser_free: 00000000061d84e0
[44493.679806] switchtec switchtec0: stuser_create: 00000000ab4b5fd5
[44493.679811] switchtec switchtec0: switchtec_dev_open: 00000000ab4b5fd5
[44493.680007] switchtec switchtec0: stuser_free: 00000000ab4b5fd5
[44493.701799] switchtec switchtec0: stuser_create: 00000000943768fe
[44493.701804] switchtec switchtec0: switchtec_dev_open: 00000000943768fe
[44493.701978] switchtec switchtec0: stuser_free: 00000000943768fe
[44493.722465] switchtec switchtec0: stuser_create: 00000000727838fa
[44493.722469] switchtec switchtec0: switchtec_dev_open: 00000000727838fa
[44493.722633] switchtec switchtec0: stuser_free: 00000000727838fa
[44493.744238] switchtec switchtec0: stuser_create: 000000008eb49672
[44493.744242] switchtec switchtec0: switchtec_dev_open: 000000008eb49672
[44493.744395] switchtec switchtec0: stuser_free: 000000008eb49672
[44493.765595] switchtec switchtec0: stuser_create: 00000000ec4dcadf
[44493.765599] switchtec switchtec0: switchtec_dev_open: 00000000ec4dcadf
[44493.765766] switchtec switchtec0: stuser_free: 00000000ec4dcadf
[44493.786396] switchtec switchtec0: stuser_create: 00000000b664b5c8
[44493.786401] switchtec switchtec0: switchtec_dev_open: 00000000b664b5c8
[44493.786554] switchtec switchtec0: stuser_free: 00000000b664b5c8
[44493.808189] switchtec switchtec0: stuser_create: 000000000b620cc8
[44493.808194] switchtec switchtec0: switchtec_dev_open: 000000000b620cc8
[44493.808359] switchtec switchtec0: stuser_free: 000000000b620cc8
[44493.826971] switchtec switchtec0: stuser_create: 00000000237afec0
[44493.826976] switchtec switchtec0: switchtec_dev_open: 00000000237afec0
[44493.827128] switchtec switchtec0: stuser_free: 00000000237afec0
[44493.847269] switchtec switchtec0: stuser_create: 00000000766c005b
[44493.847273] switchtec switchtec0: switchtec_dev_open: 00000000766c005b
[44493.847474] switchtec switchtec0: stuser_free: 00000000766c005b
[44500.851786] switchtec switchtec0: stuser_create: 000000009a68785f
[44500.851792] switchtec switchtec0: switchtec_dev_open: 000000009a68785f
[44500.851993] switchtec switchtec0: stuser_free: 000000009a68785f
[44507.259767] switchtec switchtec0: doorbell
[44507.259785] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44507.259790] switchtec 0000:3b:00.1: qp 0: RX ver 8 len 70 flags 1
[44507.259795] switchtec 0000:3b:00.1: RX OK index 8 ver 8 size 70 into buf size 65524
[44507.259801] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
[44507.259828] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44507.259830] switchtec 0000:3b:00.1: done flag not set
[44507.259894] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44507.259897] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44507.259899] switchtec 0000:3b:00.1: done flag not set

# cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
NTB QP stats:

rx_bytes - 1244
rx_pkts - 20
rx_memcpy - 20
rx_async - 0
rx_ring_empty - 41
rx_err_no_buf - 0
rx_err_oflow - 0
rx_err_ver - 0
rx_buff - 0x00000000fd3649f3
rx_index - 20
rx_max_entry - 31
rx_alloc_entry - 100

tx_bytes - 0
tx_pkts - 0
tx_memcpy - 0
tx_async - 0
tx_ring_full - 212083112
tx_err_no_buf - 0
tx_mw - 0x0000000029c0837d
tx_index (H) - 0
RRI (T) - 0
tx_max_entry - 31
free tx - 31

Using TX DMA - Yes
Using RX DMA - Yes
QP Link - Up

# ifconfig priv0
priv0     Link encap:Ethernet  HWaddr 1A:A5:C6:CD:F2:51
          inet addr:10.17.21.197  Bcast:10.17.21.199  Mask:255.255.255.252
          inet6 addr: fe80::18a5:c6ff:fecd:f251/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:26 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:276181510 dropped:276181510 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1496 (1.4 KiB)  TX bytes:90 (90.0 B)


Node/controller 2:
[43808.457199] switchtec switchtec0: Management device registered.
[43808.492076] switchtec switchtec1: Management device registered.
[43808.492647] switchtec: loaded.
[43815.711975] switchtec switchtec0: Partition ID 0 of 3
[43815.712140] switchtec switchtec0: MWs: 2 direct, 32 lut
[43815.712261] switchtec switchtec0: Peer MWs: 2 direct, 32 lut
[43815.764006] switchtec switchtec0: Requester ID 00:00.0 -> BB:01.0
[43815.764033] switchtec switchtec0: Requester ID 3A:00.0 -> BB:01.1
[43815.815913] switchtec switchtec0: Using crosslink configuration
[43815.815978] switchtec switchtec0: Crosslink BAR0 addr: 0
[43815.816091] switchtec switchtec0: Crosslink BAR2 addr: 1000000000
[43815.816155] switchtec switchtec0: Crosslink BAR4 addr: 2000000000
[43816.075968] switchtec switchtec0: Requester ID 00:01.0 -> BB:01.4
[43816.076000] switchtec switchtec0: Requester ID 00:01.1 -> BB:01.5
[43816.128185] switchtec switchtec0: dbs: shift 0/0, mask 0fffffffffffffff
[43816.232166] switchtec switchtec0: Shared MW Ready
[43816.232193] switchtec switchtec0: irqs - event: 2, db: 0, msgs: 1
[43816.233520] switchtec switchtec0: NTB device registered
[43817.032047] switchtec switchtec0: stuser_create: 0000000070070c46
[43817.032052] switchtec switchtec0: switchtec_dev_open: 0000000070070c46
[43817.032212] switchtec switchtec0: stuser_free: 0000000070070c46
[43855.950121] switchtec switchtec0: message: 0 00000004
[43952.022182] switchtec switchtec0: stuser_create: 00000000154869b1
[43952.022188] switchtec switchtec0: switchtec_dev_open: 00000000154869b1
[43952.022422] switchtec switchtec0: stuser_free: 00000000154869b1
[43952.095507] switchtec switchtec0: stuser_create: 00000000760ad51f
[43952.095512] switchtec switchtec0: switchtec_dev_open: 00000000760ad51f
[43952.095681] switchtec switchtec0: stuser_free: 00000000760ad51f
[43978.547393] Software Queue-Pair Transport over NTB, version 4
[43978.548191] switchtec switchtec0: enabling link
[43980.123918] switchtec switchtec0: message: 0 00000001
[43980.124031] switchtec switchtec0: message: 0 00000003
[43980.124110] switchtec switchtec0: ntb link up
[43980.124232] switchtec 0000:3b:00.1: Remote version = 4
[43980.124242] switchtec 0000:3b:00.1: Remote max number of qps = 2
[43980.124251] switchtec 0000:3b:00.1: Remote number of mws = 2
[43980.124261] switchtec 0000:3b:00.1: Remote MW0 size = 0x200000
[43980.124847] switchtec switchtec0: MW 0: part 0 addr 0x00000010d2200000 size 0x0000000000200000
[43980.227931] switchtec 0000:3b:00.1: Remote MW1 size = 0x200000
[43980.228583] switchtec switchtec0: MW 1: part 0 addr 0x00000010d1800000 size 0x0000000000200000
[43983.250163] switchtec switchtec0: stuser_create: 00000000992004c6
[43983.250168] switchtec switchtec0: switchtec_dev_open: 00000000992004c6
[43983.250326] switchtec switchtec0: stuser_free: 00000000992004c6
[43987.894388] switchtec switchtec0: stuser_create: 00000000ed0f26a7
[43987.894394] switchtec switchtec0: switchtec_dev_open: 00000000ed0f26a7
[43987.894556] switchtec switchtec0: stuser_free: 00000000ed0f26a7
[44003.236857] switchtec 0000:3b:00.1: Using DMA memcpy for TX
[44003.236862] switchtec 0000:3b:00.1: Using DMA memcpy for RX
[44003.239410] switchtec 0000:3b:00.1: NTB Transport QP 0 created
[44003.243226] switchtec 0000:3b:00.1: eth0 created
[44003.248706] ntb_netdev ntb_netdev0 priv0: renamed from eth0
...
[44104.815744] switchtec switchtec0: stuser_create: 000000007ed96477
[44104.815751] switchtec switchtec0: switchtec_dev_open: 000000007ed96477
[44104.816022] switchtec switchtec0: stuser_free: 000000007ed96477
[44104.836171] switchtec switchtec0: stuser_create: 00000000199ff872
[44104.836175] switchtec switchtec0: switchtec_dev_open: 00000000199ff872
[44104.836345] switchtec switchtec0: stuser_free: 00000000199ff872
[44104.858108] switchtec switchtec0: stuser_create: 00000000d04b93c8
[44104.858113] switchtec switchtec0: switchtec_dev_open: 00000000d04b93c8
[44104.858264] switchtec switchtec0: stuser_free: 00000000d04b93c8
[44104.879259] switchtec switchtec0: stuser_create: 00000000b2930931
[44104.879263] switchtec switchtec0: switchtec_dev_open: 00000000b2930931
[44104.879425] switchtec switchtec0: stuser_free: 00000000b2930931
[44104.898693] switchtec switchtec0: stuser_create: 000000009f08a557
[44104.898697] switchtec switchtec0: switchtec_dev_open: 000000009f08a557
[44104.898856] switchtec switchtec0: stuser_free: 000000009f08a557
[44104.920456] switchtec switchtec0: stuser_create: 0000000092c59794
[44104.920461] switchtec switchtec0: switchtec_dev_open: 0000000092c59794
[44104.920626] switchtec switchtec0: stuser_free: 0000000092c59794
[44104.939554] switchtec switchtec0: stuser_create: 00000000778eabf3
[44104.939559] switchtec switchtec0: switchtec_dev_open: 00000000778eabf3
[44104.939735] switchtec switchtec0: stuser_free: 00000000778eabf3
[44104.961929] switchtec switchtec0: stuser_create: 0000000087ec0e77
[44104.961934] switchtec switchtec0: switchtec_dev_open: 0000000087ec0e77
[44104.962100] switchtec switchtec0: stuser_free: 0000000087ec0e77
[44104.983607] switchtec switchtec0: stuser_create: 000000001677deb0
[44104.983611] switchtec switchtec0: switchtec_dev_open: 000000001677deb0
[44104.983761] switchtec switchtec0: stuser_free: 000000001677deb0
[44105.004798] switchtec switchtec0: stuser_create: 00000000c505781c
[44105.004803] switchtec switchtec0: switchtec_dev_open: 00000000c505781c
[44105.004962] switchtec switchtec0: stuser_free: 00000000c505781c
[44105.024035] switchtec switchtec0: stuser_create: 00000000772c0769
[44105.024040] switchtec switchtec0: switchtec_dev_open: 00000000772c0769
[44105.024195] switchtec switchtec0: stuser_free: 00000000772c0769
[44105.045032] switchtec switchtec0: stuser_create: 00000000c48dc21b
[44105.045037] switchtec switchtec0: switchtec_dev_open: 00000000c48dc21b
[44105.045189] switchtec switchtec0: stuser_free: 00000000c48dc21b
[44105.066256] switchtec switchtec0: stuser_create: 000000004436beee
[44105.066261] switchtec switchtec0: switchtec_dev_open: 000000004436beee
[44105.066421] switchtec switchtec0: stuser_free: 000000004436beee
[44105.087705] switchtec switchtec0: stuser_create: 0000000052b38058
[44105.087710] switchtec switchtec0: switchtec_dev_open: 0000000052b38058
[44105.087920] switchtec switchtec0: stuser_free: 0000000052b38058
[44105.107527] switchtec switchtec0: stuser_create: 000000005b15e6d3
[44105.107532] switchtec switchtec0: switchtec_dev_open: 000000005b15e6d3
[44105.107718] switchtec switchtec0: stuser_free: 000000005b15e6d3
[44105.130832] switchtec switchtec0: stuser_create: 000000005379e8de
[44105.130836] switchtec switchtec0: switchtec_dev_open: 000000005379e8de
[44105.131069] switchtec switchtec0: stuser_free: 000000005379e8de
[44105.153557] switchtec switchtec0: stuser_create: 000000001671f313
[44105.153561] switchtec switchtec0: switchtec_dev_open: 000000001671f313
[44105.153711] switchtec switchtec0: stuser_free: 000000001671f313
[44105.175527] switchtec switchtec0: stuser_create: 00000000fdbb9319
[44105.175532] switchtec switchtec0: switchtec_dev_open: 00000000fdbb9319
[44105.175679] switchtec switchtec0: stuser_free: 00000000fdbb9319
[44105.196070] switchtec switchtec0: stuser_create: 000000002b86a7a3
[44105.196075] switchtec switchtec0: switchtec_dev_open: 000000002b86a7a3
[44105.196275] switchtec switchtec0: stuser_free: 000000002b86a7a3
[44105.218913] switchtec switchtec0: stuser_create: 0000000018a2de0b
[44105.218917] switchtec switchtec0: switchtec_dev_open: 0000000018a2de0b
[44105.219085] switchtec switchtec0: stuser_free: 0000000018a2de0b
[44105.240357] switchtec switchtec0: stuser_create: 00000000f712e033
[44105.240361] switchtec switchtec0: switchtec_dev_open: 00000000f712e033
[44105.240528] switchtec switchtec0: stuser_free: 00000000f712e033
[44105.261698] switchtec switchtec0: stuser_create: 0000000023550788
[44105.261702] switchtec switchtec0: switchtec_dev_open: 0000000023550788
[44105.261843] switchtec switchtec0: stuser_free: 0000000023550788
[44105.283558] switchtec switchtec0: stuser_create: 000000004724602a
[44105.283562] switchtec switchtec0: switchtec_dev_open: 000000004724602a
[44105.283714] switchtec switchtec0: stuser_free: 000000004724602a
[44105.305739] switchtec switchtec0: stuser_create: 00000000a925ed10
[44105.305744] switchtec switchtec0: switchtec_dev_open: 00000000a925ed10
[44105.305918] switchtec switchtec0: stuser_free: 00000000a925ed10
[44112.393253] switchtec switchtec0: stuser_create: 00000000fc673562
[44112.393259] switchtec switchtec0: switchtec_dev_open: 00000000fc673562
[44112.393435] switchtec switchtec0: stuser_free: 00000000fc673562
[44112.615032] switchtec switchtec0: stuser_create: 000000006fea71d7
[44112.615037] switchtec switchtec0: switchtec_dev_open: 000000006fea71d7
[44112.615193] switchtec switchtec0: stuser_free: 000000006fea71d7
[44112.644369] switchtec switchtec0: stuser_create: 00000000154869b1
[44112.644374] switchtec switchtec0: switchtec_dev_open: 00000000154869b1
[44112.644537] switchtec switchtec0: stuser_free: 00000000154869b1
[44221.644483] switchtec 0000:3b:00.1: Remote QP link status = 1
[44221.644488] switchtec 0000:3b:00.1: qp 0: Link Up
[44221.644493] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
[44221.644504] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44221.644508] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44221.644510] switchtec 0000:3b:00.1: done flag not set
[44221.661578] switchtec switchtec0: doorbell
[44221.661586] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44221.661590] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 1
[44221.661594] switchtec 0000:3b:00.1: RX OK index 0 ver 0 size 90 into buf size 65524
[44221.661599] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
[44221.661661] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44221.661663] switchtec 0000:3b:00.1: done flag not set
[44221.661777] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44221.661780] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
[44221.661782] switchtec 0000:3b:00.1: done flag not set
[44221.676760] switchtec 0000:3b:00.1: qp 0: Send Link Down
[44221.676781] switchtec 0000:3b:00.1: Remote QP link status = 1
[44221.676784] switchtec 0000:3b:00.1: qp 0: Link Up
[44221.676788] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
[44221.676800] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[44221.676803] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 0
[44221.676806] switchtec 0000:3b:00.1: done flag not set
[44249.359502] switchtec switchtec0: stuser_create: 00000000992004c6
[44249.359508] switchtec switchtec0: switchtec_dev_open: 00000000992004c6
[44249.359716] switchtec switchtec0: stuser_free: 00000000992004c6
[44249.414933] switchtec switchtec0: stuser_create: 000000009c6b42ce
[44249.414937] switchtec switchtec0: switchtec_dev_open: 000000009c6b42ce
[44249.415128] switchtec switchtec0: stuser_free: 000000009c6b42ce

# cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
NTB QP stats:

rx_bytes - 0
rx_pkts - 0
rx_memcpy - 0
rx_async - 0
rx_ring_empty - 1
rx_err_no_buf - 0
rx_err_oflow - 0
rx_err_ver - 0
rx_buff - 0x00000000fbb8882b
rx_index - 0
rx_max_entry - 31
rx_alloc_entry - 100

tx_bytes - 866
tx_pkts - 11
tx_memcpy - 11
tx_async - 0
tx_ring_full - 0
tx_err_no_buf - 0
tx_mw - 0x000000003c9d9493
tx_index (H) - 11
RRI (T) - 10
tx_max_entry - 31
free tx - 30

Using TX DMA - Yes
Using RX DMA - Yes
QP Link - Up

# ifconfig priv0
priv0     Link encap:Ethernet  HWaddr E6:BF:E8:6F:4F:A5
          inet addr:10.17.21.198  Bcast:10.17.21.199  Mask:255.255.255.252
          inet6 addr: fe80::e4bf:e8ff:fe6f:4fa5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:90 (90.0 B)  TX bytes:1496 (1.4 KiB)


The ntb_transport debugfs stats look the same every time this
condition occurs. On the failing side:
- tx_ring_full is constantly increasing
- "tx_index (H)" and "RRI (T)" are both 0

So perhaps data is filling the transmit buffer, but not being read
(received) by the other side?
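
A minimal sketch for checking the two symptoms above from a script,
assuming two snapshots of the qp0 stats file taken a few seconds apart.
The helper names qp_stat and qp_looks_stuck are hypothetical and not part
of any NTB tooling; the field names are the ones printed in the stats
output itself.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one counter from a saved copy of
# an ntb_transport qp "stats" file. Lines look like
# "tx_ring_full - 212083112" or "tx_index (H) - 0".
qp_stat() {
    # $1 = field name exactly as printed, $2 = path to the saved stats
    awk -F' - ' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: succeed (exit 0) when two snapshots match the
# pattern described above: tx_ring_full grew between the snapshots while
# "tx_index (H)" and "RRI (T)" are both 0 in the later one.
qp_looks_stuck() {
    full_before=$(qp_stat 'tx_ring_full' "$1")
    full_after=$(qp_stat 'tx_ring_full' "$2")
    tx_head=$(qp_stat 'tx_index (H)' "$2")
    rri_tail=$(qp_stat 'RRI (T)' "$2")
    [ "$full_after" -gt "$full_before" ] &&
        [ "$tx_head" -eq 0 ] &&
        [ "$rri_tail" -eq 0 ]
}
```

For example, copy the stats file to /tmp/qp0.before, wait a few seconds,
copy it again to /tmp/qp0.after, then run
'qp_looks_stuck /tmp/qp0.before /tmp/qp0.after && echo stuck'.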

In the cases where both controllers/nodes boot up and the NTB virtual
Ethernet comes up and functions normally, I can usually induce the
problem by performing a series of interface down/up cycles using this
loop from one side (controller/node):
peer_ip=<IP_OF_PEER>; delay=5; iter=1
while true; do
    echo "iteration ${iter}..."
    echo down; ifconfig priv0 down
    echo "sleep ${delay}"; sleep ${delay}
    echo up; ifconfig priv0 up; sleep 2
    ping -c 1 -w 1 ${peer_ip} || break
    echo "sleep ${delay}"; sleep ${delay}
    iter=$(( iter + 1 )); echo
done

Sometimes it gets into this state after just one or a handful of
iterations; other times it can take 500+ iterations before it happens.

Any help/tips debugging this would be greatly appreciated. And if the
'netdev' mailing list is more appropriate for this inquiry, please
advise and I'll re-post there.

Thanks for your time/consideration.


--Marc


* Re: ntb_netdev Communication Failure Issue
  2022-02-16 16:16 ntb_netdev Communication Failure Issue Marc Smith
@ 2022-02-16 17:14 ` Logan Gunthorpe
  2022-02-17 15:22   ` Marc Smith
  0 siblings, 1 reply; 9+ messages in thread
From: Logan Gunthorpe @ 2022-02-16 17:14 UTC
  To: Marc Smith, ntb, Kelvin Cao, kelvincao

Hi Marc,


On 2022-02-16 9:16 a.m., Marc Smith wrote:
> Hi,
> 
> I'm using vanilla Linux 5.4.145 with a Celestica "cluster-in-a-box"
> system (two servers in a single chassis connected internally via PCIe
> switches). The PCIe switches are Microsemi Switchtec ('lspci' says
> "PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
> [11f8:8546]" but I believe this is not the current product/model name
> from Microsemi).
> 
> I use the 'ntb_netdev' driver for the virtual Ethernet functionality
> across the NTB for IP communication between the two controllers. I've
> had a long standing issue where sometimes when we reboot a controller,
> we're never able to establish communication between the two
> controllers again -- usually requires rebooting both and/or power
> cycling the entire chassis.
> 
> I've also noticed sometimes when they boot, the virtual Ethernet
> device fails to pass traffic right away.
> 
> And finally, when simply doing an "ifconfig NTB_IF down" followed by
> an "ifconfig NTB_IF up" on one controller, they sometimes get into the
> state where they no longer pass traffic.
> 
> All three conditions seem to be related, and the symptom is the same
> (they have the UP RUNNING flags with IP's configured but I'm unable to
> ping between the two controllers). Usually when this happens, one side
> will show the drop/error counts for the NTB virtual Ethernet interface
> increasing (never ending).
> 
> In the example below, both controllers booted up, and I unloaded the
> modules then reloaded with 'dyndbg=+p' but immediately was not able to
> pass any traffic on those interfaces.


I've copied Kelvin, who does most of the work on the switchtec driver. I
suspect this is a bug in that driver.

I also know that there have been some fixes in this area in the
out-of-tree (OOT) version of that module (which will probably make it
upstream in due course).

See this PR:

https://github.com/Microsemi/switchtec-kernel/pull/107

I wouldn't be surprised if this is the same issue you are seeing. The
patches there should apply pretty easily to the upstream kernel if you
prefer.

Logan



> Node/controller 1:
> [44101.188104] switchtec switchtec0: Management device registered.
> [44101.224290] switchtec switchtec1: Management device registered.
> [44101.224879] switchtec: loaded.
> [44108.104617] switchtec switchtec0: Partition ID 0 of 3
> [44108.104785] switchtec switchtec0: MWs: 2 direct, 32 lut
> [44108.104912] switchtec switchtec0: Peer MWs: 2 direct, 32 lut
> [44108.157019] switchtec switchtec0: Requester ID 00:00.0 -> BB:01.0
> [44108.157054] switchtec switchtec0: Requester ID 3A:00.0 -> BB:01.1
> [44108.208906] switchtec switchtec0: Using crosslink configuration
> [44108.208981] switchtec switchtec0: Crosslink BAR0 addr: 0
> [44108.209092] switchtec switchtec0: Crosslink BAR2 addr: 1000000000
> [44108.209167] switchtec switchtec0: Crosslink BAR4 addr: 2000000000
> [44108.468954] switchtec switchtec0: Requester ID 00:01.0 -> BB:01.4
> [44108.468986] switchtec switchtec0: Requester ID 00:01.1 -> BB:01.5
> [44108.521033] switchtec switchtec0: dbs: shift 0/0, mask 0fffffffffffffff
> [44108.625136] switchtec switchtec0: Shared MW Ready
> [44108.625163] switchtec switchtec0: irqs - event: 2, db: 0, msgs: 1
> [44108.626511] switchtec switchtec0: NTB device registered
> [44140.260589] switchtec switchtec0: stuser_create: 00000000df6a1e87
> [44140.260594] switchtec switchtec0: switchtec_dev_open: 00000000df6a1e87
> [44140.260743] switchtec switchtec0: stuser_free: 00000000df6a1e87
> [44150.277982] switchtec switchtec0: stuser_create: 00000000c4701c39
> [44150.277987] switchtec switchtec0: switchtec_dev_open: 00000000c4701c39
> [44150.278138] switchtec switchtec0: stuser_free: 00000000c4701c39
> [44231.225116] switchtec switchtec0: message: 0 00000001
> [44232.799538] Software Queue-Pair Transport over NTB, version 4
> [44232.800780] switchtec switchtec0: enabling link
> [44232.800893] switchtec 0000:3b:00.1: Remote version = 0
> [44232.800903] switchtec switchtec0: ntb link up
> [44232.801001] switchtec switchtec0: message: 0 00000003
> [44232.812419] switchtec 0000:3b:00.1: Remote version = 4
> [44232.812430] switchtec 0000:3b:00.1: Remote max number of qps = 2
> [44232.812439] switchtec 0000:3b:00.1: Remote number of mws = 2
> [44232.812449] switchtec 0000:3b:00.1: Remote MW0 size = 0x200000
> [44232.813114] switchtec switchtec0: MW 0: part 0 addr 0x0000001071600000 size 0x0000000000200000
> [44232.916884] switchtec 0000:3b:00.1: Remote MW1 size = 0x200000
> [44232.917452] switchtec switchtec0: MW 1: part 0 addr 0x0000001014e00000 size 0x0000000000200000
> [44257.513987] switchtec 0000:3b:00.1: Using DMA memcpy for TX
> [44257.513992] switchtec 0000:3b:00.1: Using DMA memcpy for RX
> [44257.517993] switchtec 0000:3b:00.1: NTB Transport QP 0 created
> [44257.524071] switchtec 0000:3b:00.1: eth0 created
> [44257.533847] ntb_netdev ntb_netdev0 priv0: renamed from eth0
> ...
> [44344.794970] switchtec switchtec0: stuser_create: 00000000b79c4946
> [44344.794975] switchtec switchtec0: switchtec_dev_open: 00000000b79c4946
> [44344.795158] switchtec switchtec0: stuser_free: 00000000b79c4946
> [44344.871358] switchtec switchtec0: stuser_create: 000000009a68785f
> [44344.871363] switchtec switchtec0: switchtec_dev_open: 000000009a68785f
> [44344.871542] switchtec switchtec0: stuser_free: 000000009a68785f
> [44350.386529] switchtec switchtec0: stuser_create: 00000000a3644dc9
> [44350.386535] switchtec switchtec0: switchtec_dev_open: 00000000a3644dc9
> [44350.386721] switchtec switchtec0: stuser_free: 00000000a3644dc9
> [44350.416802] switchtec switchtec0: stuser_create: 00000000f2cc280a
> [44350.416807] switchtec switchtec0: switchtec_dev_open: 00000000f2cc280a
> [44350.416968] switchtec switchtec0: stuser_free: 00000000f2cc280a
> [44470.486355] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.496863] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.504156] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.514905] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.527490] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.537896] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.548884] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.559902] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.570890] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.581890] switchtec 0000:3b:00.1: Remote QP link status = 0
> [44470.591242] switchtec switchtec0: stuser_create: 00000000309d4e4c
> [44470.591248] switchtec switchtec0: switchtec_dev_open: 00000000309d4e4c
> [44470.591412] switchtec switchtec0: stuser_free: 00000000309d4e4c
> [44470.625330] switchtec switchtec0: stuser_create: 000000007e7c8c1c
> [44470.625335] switchtec switchtec0: switchtec_dev_open: 000000007e7c8c1c
> [44470.625511] switchtec switchtec0: stuser_free: 000000007e7c8c1c
> [44474.332907] switchtec 0000:3b:00.1: qp 0: Link Up
> [44474.332925] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> [44474.332965] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.332977] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44474.332985] switchtec 0000:3b:00.1: done flag not set
> [44474.333020] IPv6: ADDRCONF(NETDEV_CHANGE): priv0: link becomes ready
> [44474.355200] switchtec switchtec0: doorbell
> [44474.355250] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.355263] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 3
> [44474.355272] switchtec 0000:3b:00.1: link down flag set
> [44474.355320] switchtec 0000:3b:00.1: qp 0: Link Cleanup
> [44474.355337] ntb_netdev ntb_netdev0 priv0: Event 0, Link 0
> [44474.355372] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.355382] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44474.355390] switchtec 0000:3b:00.1: done flag not set
> [44474.360458] switchtec switchtec0: doorbell
> [44474.365910] switchtec 0000:3b:00.1: qp 0: Link Up
> [44474.365923] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> [44474.365961] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.365971] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 1
> [44474.365984] switchtec 0000:3b:00.1: RX OK index 0 ver 0 size 90 into buf size 65524
> [44474.365998] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
> [44474.366072] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44474.366079] switchtec 0000:3b:00.1: done flag not set
> [44474.366192] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.366195] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44474.366197] switchtec 0000:3b:00.1: done flag not set
> [44474.939439] switchtec switchtec0: doorbell
> [44474.939454] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.939458] switchtec 0000:3b:00.1: qp 0: RX ver 1 len 90 flags 1
> [44474.939462] switchtec 0000:3b:00.1: RX OK index 1 ver 1 size 90 into buf size 65524
> [44474.939467] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
> [44474.939493] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44474.939495] switchtec 0000:3b:00.1: done flag not set
> [44474.939566] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44474.939569] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44474.939572] switchtec 0000:3b:00.1: done flag not set
> [44475.387462] switchtec switchtec0: doorbell
> [44475.387475] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44475.387478] switchtec 0000:3b:00.1: qp 0: RX ver 2 len 86 flags 1
> [44475.387482] switchtec 0000:3b:00.1: RX OK index 2 ver 2 size 86 into buf size 65524
> [44475.387486] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 86 byte payload received
> [44475.387505] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44475.387507] switchtec 0000:3b:00.1: done flag not set
> [44475.387561] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44475.387563] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44475.387566] switchtec 0000:3b:00.1: done flag not set
> [44476.411777] switchtec switchtec0: doorbell
> [44476.411792] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44476.411796] switchtec 0000:3b:00.1: qp 0: RX ver 3 len 90 flags 1
> [44476.411800] switchtec 0000:3b:00.1: RX OK index 3 ver 3 size 90 into buf size 65524
> [44476.411807] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
> [44476.411835] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44476.411837] switchtec 0000:3b:00.1: done flag not set
> [44476.411907] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44476.411910] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44476.411912] switchtec 0000:3b:00.1: done flag not set
> [44476.412087] switchtec switchtec0: doorbell
> [44476.412091] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44476.412094] switchtec 0000:3b:00.1: qp 0: RX ver 4 len 70 flags 1
> [44476.412098] switchtec 0000:3b:00.1: RX OK index 4 ver 4 size 70 into buf size 65524
> [44476.412102] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
> [44476.412120] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44476.412122] switchtec 0000:3b:00.1: done flag not set
> [44476.412198] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44476.412201] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44476.412203] switchtec 0000:3b:00.1: done flag not set
> [44477.307472] switchtec switchtec0: doorbell
> [44477.307487] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44477.307490] switchtec 0000:3b:00.1: qp 0: RX ver 5 len 90 flags 1
> [44477.307494] switchtec 0000:3b:00.1: RX OK index 5 ver 5 size 90 into buf size 65524
> [44477.307501] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90 byte payload received
> [44477.307525] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44477.307528] switchtec 0000:3b:00.1: done flag not set
> [44477.307689] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44477.307692] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44477.307694] switchtec 0000:3b:00.1: done flag not set
> [44481.147576] switchtec switchtec0: doorbell
> [44481.147594] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44481.147597] switchtec 0000:3b:00.1: qp 0: RX ver 6 len 70 flags 1
> [44481.147601] switchtec 0000:3b:00.1: RX OK index 6 ver 6 size 70 into buf size 65524
> [44481.147607] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
> [44481.147631] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44481.147634] switchtec 0000:3b:00.1: done flag not set
> [44481.147712] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44481.147715] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44481.147718] switchtec 0000:3b:00.1: done flag not set
> [44489.851482] switchtec switchtec0: doorbell
> [44489.851498] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44489.851502] switchtec 0000:3b:00.1: qp 0: RX ver 7 len 70 flags 1
> [44489.851506] switchtec 0000:3b:00.1: RX OK index 7 ver 7 size 70 into buf size 65524
> [44489.851511] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
> [44489.851536] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44489.851538] switchtec 0000:3b:00.1: done flag not set
> [44489.851616] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44489.851619] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44489.851621] switchtec 0000:3b:00.1: done flag not set
> [44493.354390] switchtec switchtec0: stuser_create: 00000000098d03cb
> [44493.354395] switchtec switchtec0: switchtec_dev_open: 00000000098d03cb
> [44493.354560] switchtec switchtec0: stuser_free: 00000000098d03cb
> [44493.377810] switchtec switchtec0: stuser_create: 00000000cac0e0f9
> [44493.377815] switchtec switchtec0: switchtec_dev_open: 00000000cac0e0f9
> [44493.378029] switchtec switchtec0: stuser_free: 00000000cac0e0f9
> [44493.396554] switchtec switchtec0: stuser_create: 000000008fe7cb2b
> [44493.396559] switchtec switchtec0: switchtec_dev_open: 000000008fe7cb2b
> [44493.396719] switchtec switchtec0: stuser_free: 000000008fe7cb2b
> [44493.417150] switchtec switchtec0: stuser_create: 00000000e67348bf
> [44493.417154] switchtec switchtec0: switchtec_dev_open: 00000000e67348bf
> [44493.417323] switchtec switchtec0: stuser_free: 00000000e67348bf
> [44493.438786] switchtec switchtec0: stuser_create: 000000007384dd1d
> [44493.438791] switchtec switchtec0: switchtec_dev_open: 000000007384dd1d
> [44493.438969] switchtec switchtec0: stuser_free: 000000007384dd1d
> [44493.459467] switchtec switchtec0: stuser_create: 00000000af970cd7
> [44493.459471] switchtec switchtec0: switchtec_dev_open: 00000000af970cd7
> [44493.459637] switchtec switchtec0: stuser_free: 00000000af970cd7
> [44493.480517] switchtec switchtec0: stuser_create: 00000000abdf1426
> [44493.480522] switchtec switchtec0: switchtec_dev_open: 00000000abdf1426
> [44493.480672] switchtec switchtec0: stuser_free: 00000000abdf1426
> [44493.501712] switchtec switchtec0: stuser_create: 00000000e4221771
> [44493.501716] switchtec switchtec0: switchtec_dev_open: 00000000e4221771
> [44493.501882] switchtec switchtec0: stuser_free: 00000000e4221771
> [44493.528249] switchtec switchtec0: stuser_create: 00000000c9d7a68c
> [44493.528253] switchtec switchtec0: switchtec_dev_open: 00000000c9d7a68c
> [44493.528414] switchtec switchtec0: stuser_free: 00000000c9d7a68c
> [44493.555529] switchtec switchtec0: stuser_create: 00000000e415ce5d
> [44493.555533] switchtec switchtec0: switchtec_dev_open: 00000000e415ce5d
> [44493.555699] switchtec switchtec0: stuser_free: 00000000e415ce5d
> [44493.577118] switchtec switchtec0: stuser_create: 000000003bee7779
> [44493.577122] switchtec switchtec0: switchtec_dev_open: 000000003bee7779
> [44493.577276] switchtec switchtec0: stuser_free: 000000003bee7779
> [44493.598661] switchtec switchtec0: stuser_create: 00000000e76bc00f
> [44493.598668] switchtec switchtec0: switchtec_dev_open: 00000000e76bc00f
> [44493.598942] switchtec switchtec0: stuser_free: 00000000e76bc00f
> [44493.619690] switchtec switchtec0: stuser_create: 00000000e527a94d
> [44493.619694] switchtec switchtec0: switchtec_dev_open: 00000000e527a94d
> [44493.619841] switchtec switchtec0: stuser_free: 00000000e527a94d
> [44493.640966] switchtec switchtec0: stuser_create: 000000004978b777
> [44493.640970] switchtec switchtec0: switchtec_dev_open: 000000004978b777
> [44493.641134] switchtec switchtec0: stuser_free: 000000004978b777
> [44493.660708] switchtec switchtec0: stuser_create: 00000000061d84e0
> [44493.660712] switchtec switchtec0: switchtec_dev_open: 00000000061d84e0
> [44493.660914] switchtec switchtec0: stuser_free: 00000000061d84e0
> [44493.679806] switchtec switchtec0: stuser_create: 00000000ab4b5fd5
> [44493.679811] switchtec switchtec0: switchtec_dev_open: 00000000ab4b5fd5
> [44493.680007] switchtec switchtec0: stuser_free: 00000000ab4b5fd5
> [44493.701799] switchtec switchtec0: stuser_create: 00000000943768fe
> [44493.701804] switchtec switchtec0: switchtec_dev_open: 00000000943768fe
> [44493.701978] switchtec switchtec0: stuser_free: 00000000943768fe
> [44493.722465] switchtec switchtec0: stuser_create: 00000000727838fa
> [44493.722469] switchtec switchtec0: switchtec_dev_open: 00000000727838fa
> [44493.722633] switchtec switchtec0: stuser_free: 00000000727838fa
> [44493.744238] switchtec switchtec0: stuser_create: 000000008eb49672
> [44493.744242] switchtec switchtec0: switchtec_dev_open: 000000008eb49672
> [44493.744395] switchtec switchtec0: stuser_free: 000000008eb49672
> [44493.765595] switchtec switchtec0: stuser_create: 00000000ec4dcadf
> [44493.765599] switchtec switchtec0: switchtec_dev_open: 00000000ec4dcadf
> [44493.765766] switchtec switchtec0: stuser_free: 00000000ec4dcadf
> [44493.786396] switchtec switchtec0: stuser_create: 00000000b664b5c8
> [44493.786401] switchtec switchtec0: switchtec_dev_open: 00000000b664b5c8
> [44493.786554] switchtec switchtec0: stuser_free: 00000000b664b5c8
> [44493.808189] switchtec switchtec0: stuser_create: 000000000b620cc8
> [44493.808194] switchtec switchtec0: switchtec_dev_open: 000000000b620cc8
> [44493.808359] switchtec switchtec0: stuser_free: 000000000b620cc8
> [44493.826971] switchtec switchtec0: stuser_create: 00000000237afec0
> [44493.826976] switchtec switchtec0: switchtec_dev_open: 00000000237afec0
> [44493.827128] switchtec switchtec0: stuser_free: 00000000237afec0
> [44493.847269] switchtec switchtec0: stuser_create: 00000000766c005b
> [44493.847273] switchtec switchtec0: switchtec_dev_open: 00000000766c005b
> [44493.847474] switchtec switchtec0: stuser_free: 00000000766c005b
> [44500.851786] switchtec switchtec0: stuser_create: 000000009a68785f
> [44500.851792] switchtec switchtec0: switchtec_dev_open: 000000009a68785f
> [44500.851993] switchtec switchtec0: stuser_free: 000000009a68785f
> [44507.259767] switchtec switchtec0: doorbell
> [44507.259785] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44507.259790] switchtec 0000:3b:00.1: qp 0: RX ver 8 len 70 flags 1
> [44507.259795] switchtec 0000:3b:00.1: RX OK index 8 ver 8 size 70 into buf size 65524
> [44507.259801] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70 byte payload received
> [44507.259828] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44507.259830] switchtec 0000:3b:00.1: done flag not set
> [44507.259894] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44507.259897] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44507.259899] switchtec 0000:3b:00.1: done flag not set
> 
> # cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
> NTB QP stats:
> 
> rx_bytes - 1244
> rx_pkts - 20
> rx_memcpy - 20
> rx_async - 0
> rx_ring_empty - 41
> rx_err_no_buf - 0
> rx_err_oflow - 0
> rx_err_ver - 0
> rx_buff - 0x00000000fd3649f3
> rx_index - 20
> rx_max_entry - 31
> rx_alloc_entry - 100
> 
> tx_bytes - 0
> tx_pkts - 0
> tx_memcpy - 0
> tx_async - 0
> tx_ring_full - 212083112
> tx_err_no_buf - 0
> tx_mw - 0x0000000029c0837d
> tx_index (H) - 0
> RRI (T) - 0
> tx_max_entry - 31
> free tx - 31
> 
> Using TX DMA - Yes
> Using RX DMA - Yes
> QP Link - Up
> 
> # ifconfig priv0
> priv0     Link encap:Ethernet  HWaddr 1A:A5:C6:CD:F2:51
>           inet addr:10.17.21.197  Bcast:10.17.21.199  Mask:255.255.255.252
>           inet6 addr: fe80::18a5:c6ff:fecd:f251/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:26 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1 errors:276181510 dropped:276181510 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1496 (1.4 KiB)  TX bytes:90 (90.0 B)
> 
> 
> Node/controller 2:
> [43808.457199] switchtec switchtec0: Management device registered.
> [43808.492076] switchtec switchtec1: Management device registered.
> [43808.492647] switchtec: loaded.
> [43815.711975] switchtec switchtec0: Partition ID 0 of 3
> [43815.712140] switchtec switchtec0: MWs: 2 direct, 32 lut
> [43815.712261] switchtec switchtec0: Peer MWs: 2 direct, 32 lut
> [43815.764006] switchtec switchtec0: Requester ID 00:00.0 -> BB:01.0
> [43815.764033] switchtec switchtec0: Requester ID 3A:00.0 -> BB:01.1
> [43815.815913] switchtec switchtec0: Using crosslink configuration
> [43815.815978] switchtec switchtec0: Crosslink BAR0 addr: 0
> [43815.816091] switchtec switchtec0: Crosslink BAR2 addr: 1000000000
> [43815.816155] switchtec switchtec0: Crosslink BAR4 addr: 2000000000
> [43816.075968] switchtec switchtec0: Requester ID 00:01.0 -> BB:01.4
> [43816.076000] switchtec switchtec0: Requester ID 00:01.1 -> BB:01.5
> [43816.128185] switchtec switchtec0: dbs: shift 0/0, mask 0fffffffffffffff
> [43816.232166] switchtec switchtec0: Shared MW Ready
> [43816.232193] switchtec switchtec0: irqs - event: 2, db: 0, msgs: 1
> [43816.233520] switchtec switchtec0: NTB device registered
> [43817.032047] switchtec switchtec0: stuser_create: 0000000070070c46
> [43817.032052] switchtec switchtec0: switchtec_dev_open: 0000000070070c46
> [43817.032212] switchtec switchtec0: stuser_free: 0000000070070c46
> [43855.950121] switchtec switchtec0: message: 0 00000004
> [43952.022182] switchtec switchtec0: stuser_create: 00000000154869b1
> [43952.022188] switchtec switchtec0: switchtec_dev_open: 00000000154869b1
> [43952.022422] switchtec switchtec0: stuser_free: 00000000154869b1
> [43952.095507] switchtec switchtec0: stuser_create: 00000000760ad51f
> [43952.095512] switchtec switchtec0: switchtec_dev_open: 00000000760ad51f
> [43952.095681] switchtec switchtec0: stuser_free: 00000000760ad51f
> [43978.547393] Software Queue-Pair Transport over NTB, version 4
> [43978.548191] switchtec switchtec0: enabling link
> [43980.123918] switchtec switchtec0: message: 0 00000001
> [43980.124031] switchtec switchtec0: message: 0 00000003
> [43980.124110] switchtec switchtec0: ntb link up
> [43980.124232] switchtec 0000:3b:00.1: Remote version = 4
> [43980.124242] switchtec 0000:3b:00.1: Remote max number of qps = 2
> [43980.124251] switchtec 0000:3b:00.1: Remote number of mws = 2
> [43980.124261] switchtec 0000:3b:00.1: Remote MW0 size = 0x200000
> [43980.124847] switchtec switchtec0: MW 0: part 0 addr 0x00000010d2200000 size 0x0000000000200000
> [43980.227931] switchtec 0000:3b:00.1: Remote MW1 size = 0x200000
> [43980.228583] switchtec switchtec0: MW 1: part 0 addr 0x00000010d1800000 size 0x0000000000200000
> [43983.250163] switchtec switchtec0: stuser_create: 00000000992004c6
> [43983.250168] switchtec switchtec0: switchtec_dev_open: 00000000992004c6
> [43983.250326] switchtec switchtec0: stuser_free: 00000000992004c6
> [43987.894388] switchtec switchtec0: stuser_create: 00000000ed0f26a7
> [43987.894394] switchtec switchtec0: switchtec_dev_open: 00000000ed0f26a7
> [43987.894556] switchtec switchtec0: stuser_free: 00000000ed0f26a7
> [44003.236857] switchtec 0000:3b:00.1: Using DMA memcpy for TX
> [44003.236862] switchtec 0000:3b:00.1: Using DMA memcpy for RX
> [44003.239410] switchtec 0000:3b:00.1: NTB Transport QP 0 created
> [44003.243226] switchtec 0000:3b:00.1: eth0 created
> [44003.248706] ntb_netdev ntb_netdev0 priv0: renamed from eth0
> ...
> [44104.815744] switchtec switchtec0: stuser_create: 000000007ed96477
> [44104.815751] switchtec switchtec0: switchtec_dev_open: 000000007ed96477
> [44104.816022] switchtec switchtec0: stuser_free: 000000007ed96477
> [44104.836171] switchtec switchtec0: stuser_create: 00000000199ff872
> [44104.836175] switchtec switchtec0: switchtec_dev_open: 00000000199ff872
> [44104.836345] switchtec switchtec0: stuser_free: 00000000199ff872
> [44104.858108] switchtec switchtec0: stuser_create: 00000000d04b93c8
> [44104.858113] switchtec switchtec0: switchtec_dev_open: 00000000d04b93c8
> [44104.858264] switchtec switchtec0: stuser_free: 00000000d04b93c8
> [44104.879259] switchtec switchtec0: stuser_create: 00000000b2930931
> [44104.879263] switchtec switchtec0: switchtec_dev_open: 00000000b2930931
> [44104.879425] switchtec switchtec0: stuser_free: 00000000b2930931
> [44104.898693] switchtec switchtec0: stuser_create: 000000009f08a557
> [44104.898697] switchtec switchtec0: switchtec_dev_open: 000000009f08a557
> [44104.898856] switchtec switchtec0: stuser_free: 000000009f08a557
> [44104.920456] switchtec switchtec0: stuser_create: 0000000092c59794
> [44104.920461] switchtec switchtec0: switchtec_dev_open: 0000000092c59794
> [44104.920626] switchtec switchtec0: stuser_free: 0000000092c59794
> [44104.939554] switchtec switchtec0: stuser_create: 00000000778eabf3
> [44104.939559] switchtec switchtec0: switchtec_dev_open: 00000000778eabf3
> [44104.939735] switchtec switchtec0: stuser_free: 00000000778eabf3
> [44104.961929] switchtec switchtec0: stuser_create: 0000000087ec0e77
> [44104.961934] switchtec switchtec0: switchtec_dev_open: 0000000087ec0e77
> [44104.962100] switchtec switchtec0: stuser_free: 0000000087ec0e77
> [44104.983607] switchtec switchtec0: stuser_create: 000000001677deb0
> [44104.983611] switchtec switchtec0: switchtec_dev_open: 000000001677deb0
> [44104.983761] switchtec switchtec0: stuser_free: 000000001677deb0
> [44105.004798] switchtec switchtec0: stuser_create: 00000000c505781c
> [44105.004803] switchtec switchtec0: switchtec_dev_open: 00000000c505781c
> [44105.004962] switchtec switchtec0: stuser_free: 00000000c505781c
> [44105.024035] switchtec switchtec0: stuser_create: 00000000772c0769
> [44105.024040] switchtec switchtec0: switchtec_dev_open: 00000000772c0769
> [44105.024195] switchtec switchtec0: stuser_free: 00000000772c0769
> [44105.045032] switchtec switchtec0: stuser_create: 00000000c48dc21b
> [44105.045037] switchtec switchtec0: switchtec_dev_open: 00000000c48dc21b
> [44105.045189] switchtec switchtec0: stuser_free: 00000000c48dc21b
> [44105.066256] switchtec switchtec0: stuser_create: 000000004436beee
> [44105.066261] switchtec switchtec0: switchtec_dev_open: 000000004436beee
> [44105.066421] switchtec switchtec0: stuser_free: 000000004436beee
> [44105.087705] switchtec switchtec0: stuser_create: 0000000052b38058
> [44105.087710] switchtec switchtec0: switchtec_dev_open: 0000000052b38058
> [44105.087920] switchtec switchtec0: stuser_free: 0000000052b38058
> [44105.107527] switchtec switchtec0: stuser_create: 000000005b15e6d3
> [44105.107532] switchtec switchtec0: switchtec_dev_open: 000000005b15e6d3
> [44105.107718] switchtec switchtec0: stuser_free: 000000005b15e6d3
> [44105.130832] switchtec switchtec0: stuser_create: 000000005379e8de
> [44105.130836] switchtec switchtec0: switchtec_dev_open: 000000005379e8de
> [44105.131069] switchtec switchtec0: stuser_free: 000000005379e8de
> [44105.153557] switchtec switchtec0: stuser_create: 000000001671f313
> [44105.153561] switchtec switchtec0: switchtec_dev_open: 000000001671f313
> [44105.153711] switchtec switchtec0: stuser_free: 000000001671f313
> [44105.175527] switchtec switchtec0: stuser_create: 00000000fdbb9319
> [44105.175532] switchtec switchtec0: switchtec_dev_open: 00000000fdbb9319
> [44105.175679] switchtec switchtec0: stuser_free: 00000000fdbb9319
> [44105.196070] switchtec switchtec0: stuser_create: 000000002b86a7a3
> [44105.196075] switchtec switchtec0: switchtec_dev_open: 000000002b86a7a3
> [44105.196275] switchtec switchtec0: stuser_free: 000000002b86a7a3
> [44105.218913] switchtec switchtec0: stuser_create: 0000000018a2de0b
> [44105.218917] switchtec switchtec0: switchtec_dev_open: 0000000018a2de0b
> [44105.219085] switchtec switchtec0: stuser_free: 0000000018a2de0b
> [44105.240357] switchtec switchtec0: stuser_create: 00000000f712e033
> [44105.240361] switchtec switchtec0: switchtec_dev_open: 00000000f712e033
> [44105.240528] switchtec switchtec0: stuser_free: 00000000f712e033
> [44105.261698] switchtec switchtec0: stuser_create: 0000000023550788
> [44105.261702] switchtec switchtec0: switchtec_dev_open: 0000000023550788
> [44105.261843] switchtec switchtec0: stuser_free: 0000000023550788
> [44105.283558] switchtec switchtec0: stuser_create: 000000004724602a
> [44105.283562] switchtec switchtec0: switchtec_dev_open: 000000004724602a
> [44105.283714] switchtec switchtec0: stuser_free: 000000004724602a
> [44105.305739] switchtec switchtec0: stuser_create: 00000000a925ed10
> [44105.305744] switchtec switchtec0: switchtec_dev_open: 00000000a925ed10
> [44105.305918] switchtec switchtec0: stuser_free: 00000000a925ed10
> [44112.393253] switchtec switchtec0: stuser_create: 00000000fc673562
> [44112.393259] switchtec switchtec0: switchtec_dev_open: 00000000fc673562
> [44112.393435] switchtec switchtec0: stuser_free: 00000000fc673562
> [44112.615032] switchtec switchtec0: stuser_create: 000000006fea71d7
> [44112.615037] switchtec switchtec0: switchtec_dev_open: 000000006fea71d7
> [44112.615193] switchtec switchtec0: stuser_free: 000000006fea71d7
> [44112.644369] switchtec switchtec0: stuser_create: 00000000154869b1
> [44112.644374] switchtec switchtec0: switchtec_dev_open: 00000000154869b1
> [44112.644537] switchtec switchtec0: stuser_free: 00000000154869b1
> [44221.644483] switchtec 0000:3b:00.1: Remote QP link status = 1
> [44221.644488] switchtec 0000:3b:00.1: qp 0: Link Up
> [44221.644493] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> [44221.644504] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44221.644508] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44221.644510] switchtec 0000:3b:00.1: done flag not set
> [44221.661578] switchtec switchtec0: doorbell
> [44221.661586] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44221.661590] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 1
> [44221.661594] switchtec 0000:3b:00.1: RX OK index 0 ver 0 size 90
> into buf size 65524
> [44221.661599] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90
> byte payload received
> [44221.661661] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44221.661663] switchtec 0000:3b:00.1: done flag not set
> [44221.661777] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44221.661780] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> [44221.661782] switchtec 0000:3b:00.1: done flag not set
> [44221.676760] switchtec 0000:3b:00.1: qp 0: Send Link Down
> [44221.676781] switchtec 0000:3b:00.1: Remote QP link status = 1
> [44221.676784] switchtec 0000:3b:00.1: qp 0: Link Up
> [44221.676788] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> [44221.676800] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> [44221.676803] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 0
> [44221.676806] switchtec 0000:3b:00.1: done flag not set
> [44249.359502] switchtec switchtec0: stuser_create: 00000000992004c6
> [44249.359508] switchtec switchtec0: switchtec_dev_open: 00000000992004c6
> [44249.359716] switchtec switchtec0: stuser_free: 00000000992004c6
> [44249.414933] switchtec switchtec0: stuser_create: 000000009c6b42ce
> [44249.414937] switchtec switchtec0: switchtec_dev_open: 000000009c6b42ce
> [44249.415128] switchtec switchtec0: stuser_free: 000000009c6b42ce
> 
> # cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
> NTB QP stats:
> 
> rx_bytes - 0
> rx_pkts - 0
> rx_memcpy - 0
> rx_async - 0
> rx_ring_empty - 1
> rx_err_no_buf - 0
> rx_err_oflow - 0
> rx_err_ver - 0
> rx_buff - 0x00000000fbb8882b
> rx_index - 0
> rx_max_entry - 31
> rx_alloc_entry - 100
> 
> tx_bytes - 866
> tx_pkts - 11
> tx_memcpy - 11
> tx_async - 0
> tx_ring_full - 0
> tx_err_no_buf - 0
> tx_mw - 0x000000003c9d9493
> tx_index (H) - 11
> RRI (T) - 10
> tx_max_entry - 31
> free tx - 30
> 
> Using TX DMA - Yes
> Using RX DMA - Yes
> QP Link - Up
> 
> # ifconfig priv0
> priv0     Link encap:Ethernet  HWaddr E6:BF:E8:6F:4F:A5
>           inet addr:10.17.21.198  Bcast:10.17.21.199  Mask:255.255.255.252
>           inet6 addr: fe80::e4bf:e8ff:fe6f:4fa5/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:90 (90.0 B)  TX bytes:1496 (1.4 KiB)
> 
> 
> The information from the kernel debug sysfs attribute files seems to
> be consistent when this condition occurs on one side:
> - tx_ring_full is constantly increasing
> - "tx_index (H)" and "RRI (T)" are both 0
> 
> So perhaps data is filling the transmit buffer, but not being read
> (received) by the other side?
> 
> In the cases where both controllers/nodes boot up and the NTB virtual
> Ethernet comes up/functions normally, I can usually induce the problem
> by performing a series of interface down/ups using this loop from one
> side (controller/node):
> peer_ip=<IP_OF_PEER>; delay=5; iter=1
> while true; do
>     echo "iteration ${iter}..."
>     echo down; ifconfig priv0 down
>     echo "sleep ${delay}"; sleep ${delay}
>     echo up; ifconfig priv0 up
>     sleep 2
>     ping -c 1 -w 1 ${peer_ip} || break
>     echo "sleep ${delay}"; sleep ${delay}
>     iter=$(( iter + 1 ))
>     echo
> done
> 
> Sometimes it gets into this state after just one or a handful of
> iterations, other times it can take 500+ iterations before it happens.
> 
> Any help/tips debugging this would be greatly appreciated. And if the
> 'netdev' mailing list is more appropriate for this inquiry, please
> advise and I'll re-post there.
> 
> Thanks for your time/consideration.
> 
> 
> --Marc
> 


* Re: ntb_netdev Communication Failure Issue
  2022-02-16 17:14 ` Logan Gunthorpe
@ 2022-02-17 15:22   ` Marc Smith
  2022-02-17 16:31     ` Eric Pilmore
  2022-02-17 16:49     ` Logan Gunthorpe
  0 siblings, 2 replies; 9+ messages in thread
From: Marc Smith @ 2022-02-17 15:22 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: ntb, Kelvin Cao, kelvincao

On Wed, Feb 16, 2022 at 12:14 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> Hi Marc,
>
>
> On 2022-02-16 9:16 a.m., Marc Smith wrote:
> > Hi,
> >
> > I'm using vanilla Linux 5.4.145 with a Celestica "cluster-in-a-box"
> > system (two servers in a single chassis connected internally via PCIe
> > switches). The PCIe switches are Microsemi Switchtec ('lspci' says
> > "PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
> > [11f8:8546]" but I believe this is not the current product/model name
> > from Microsemi).
> >
> > I use the 'ntb_netdev' driver for the virtual Ethernet functionality
> > across the NTB for IP communication between the two controllers. I've
> > had a long standing issue where sometimes when we reboot a controller,
> > we're never able to establish communication between the two
> > controllers again -- usually requires rebooting both and/or power
> > cycling the entire chassis.
> >
> > I've also noticed sometimes when they boot, the virtual Ethernet
> > device fails to pass traffic right away.
> >
> > And finally, when simply doing an "ifconfig NTB_IF down" followed by
> > an "ifconfig NTB_IF up" on one controller, they sometimes get into the
> > state where they no longer pass traffic.
> >
> > All three conditions seem to be related, and the symptom is the same
> > (they have the UP RUNNING flags with IP's configured but I'm unable to
> > ping between the two controllers). Usually when this happens, one side
> > will show the drop/error counts for the NTB virtual Ethernet interface
> > increasing (never ending).
> >
> > In the example below, both controllers booted up, and I unloaded the
> > modules then reloaded with 'dyndbg=+p' but immediately was not able to
> > pass any traffic on those interfaces.
>
>
> I've copied Kelvin who does most of the work on the switchtec driver. I
> suspect this is a bug in that driver.
>
> I also know that there have been some fixes in this area in the OOT
> version of that module (which will probably make it upstream in due course).
>
> See this PR:
>
> https://github.com/Microsemi/switchtec-kernel/pull/107
>
> I wouldn't be surprised if this is the same issue you are seeing. The
> patches there should apply pretty easily to the upstream kernel if you
> prefer.

Thanks Logan. I used the driver source from
"https://github.com/Microsemi/switchtec-kernel/tree/master" and the
problem still persists. On one side tx_ring_full continues to climb
and the errors/dropped counts continue to rise for the virtual
Ethernet interface:

# cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats

NTB QP stats:

rx_bytes - 0
rx_pkts - 0
rx_memcpy - 0
rx_async - 0
rx_ring_empty - 8
rx_err_no_buf - 0
rx_err_oflow - 0
rx_err_ver - 0
rx_buff - 0x00000000e7a16187
rx_index - 0
rx_max_entry - 31
rx_alloc_entry - 100

tx_bytes - 0
tx_pkts - 0
tx_memcpy - 0
tx_async - 0
tx_ring_full - 23809328
tx_err_no_buf - 0
tx_mw - 0x0000000029c0837d
tx_index (H) - 0
RRI (T) - 0
tx_max_entry - 31
free tx - 31

Using TX DMA - Yes
Using RX DMA - Yes
QP Link - Up

# ifconfig priv0
priv0     Link encap:Ethernet  HWaddr 32:15:21:03:C0:31
          inet addr:10.17.21.197  Bcast:10.17.21.199  Mask:255.255.255.252
          inet6 addr: fe80::3015:21ff:fe03:c031/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65510  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:321933488 dropped:321933488 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1188 (1.1 KiB)  TX bytes:90 (90.0 B)
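
For anyone wanting to check for the same symptom (tx_ring_full climbing
while "tx_index (H)" and "RRI (T)" both stay at 0), here is a minimal
sketch that samples the qp stats file twice. The helper names qp_stat and
qp_tx_stuck are made up for illustration and are not part of any driver or
tool; the only assumption about the file is the "name - value" layout shown
in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from an ntb_transport
# qp stats file (lines of the form "name - value").
qp_stat() {
    # $1 = stats file, $2 = field name, e.g. "tx_ring_full" or "RRI (T)"
    awk -F' - ' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Hypothetical helper: print "stuck" when tx_ring_full grows between two
# samples while tx_index (H) and RRI (T) are both 0, otherwise print "ok".
qp_tx_stuck() {
    # $1 = stats file, $2 = seconds between samples (default 1)
    before=$(qp_stat "$1" "tx_ring_full")
    sleep "${2:-1}"
    after=$(qp_stat "$1" "tx_ring_full")
    tx_head=$(qp_stat "$1" "tx_index (H)")
    tx_tail=$(qp_stat "$1" "RRI (T)")
    if [ "${after:-0}" -gt "${before:-0}" ] \
        && [ "${tx_head:-0}" -eq 0 ] && [ "${tx_tail:-0}" -eq 0 ]; then
        echo stuck
    else
        echo ok
    fi
}
```

Example use on this system (needs root and debugfs mounted):
qp_tx_stuck '/sys/kernel/debug/ntb_transport/0000:3b:00.1/qp0/stats' 5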

Is it possible there are some special NTB/BIOS settings needed for
this hardware?


Thanks,

Marc


>
> Logan
>
>
>
> > Node/controller 1:
> > [44101.188104] switchtec switchtec0: Management device registered.
> > [44101.224290] switchtec switchtec1: Management device registered.
> > [44101.224879] switchtec: loaded.
> > [44108.104617] switchtec switchtec0: Partition ID 0 of 3
> > [44108.104785] switchtec switchtec0: MWs: 2 direct, 32 lut
> > [44108.104912] switchtec switchtec0: Peer MWs: 2 direct, 32 lut
> > [44108.157019] switchtec switchtec0: Requester ID 00:00.0 -> BB:01.0
> > [44108.157054] switchtec switchtec0: Requester ID 3A:00.0 -> BB:01.1
> > [44108.208906] switchtec switchtec0: Using crosslink configuration
> > [44108.208981] switchtec switchtec0: Crosslink BAR0 addr: 0
> > [44108.209092] switchtec switchtec0: Crosslink BAR2 addr: 1000000000
> > [44108.209167] switchtec switchtec0: Crosslink BAR4 addr: 2000000000
> > [44108.468954] switchtec switchtec0: Requester ID 00:01.0 -> BB:01.4
> > [44108.468986] switchtec switchtec0: Requester ID 00:01.1 -> BB:01.5
> > [44108.521033] switchtec switchtec0: dbs: shift 0/0, mask 0fffffffffffffff
> > [44108.625136] switchtec switchtec0: Shared MW Ready
> > [44108.625163] switchtec switchtec0: irqs - event: 2, db: 0, msgs: 1
> > [44108.626511] switchtec switchtec0: NTB device registered
> > [44140.260589] switchtec switchtec0: stuser_create: 00000000df6a1e87
> > [44140.260594] switchtec switchtec0: switchtec_dev_open: 00000000df6a1e87
> > [44140.260743] switchtec switchtec0: stuser_free: 00000000df6a1e87
> > [44150.277982] switchtec switchtec0: stuser_create: 00000000c4701c39
> > [44150.277987] switchtec switchtec0: switchtec_dev_open: 00000000c4701c39
> > [44150.278138] switchtec switchtec0: stuser_free: 00000000c4701c39
> > [44231.225116] switchtec switchtec0: message: 0 00000001
> > [44232.799538] Software Queue-Pair Transport over NTB, version 4
> > [44232.800780] switchtec switchtec0: enabling link
> > [44232.800893] switchtec 0000:3b:00.1: Remote version = 0
> > [44232.800903] switchtec switchtec0: ntb link up
> > [44232.801001] switchtec switchtec0: message: 0 00000003
> > [44232.812419] switchtec 0000:3b:00.1: Remote version = 4
> > [44232.812430] switchtec 0000:3b:00.1: Remote max number of qps = 2
> > [44232.812439] switchtec 0000:3b:00.1: Remote number of mws = 2
> > [44232.812449] switchtec 0000:3b:00.1: Remote MW0 size = 0x200000
> > [44232.813114] switchtec switchtec0: MW 0: part 0 addr
> > 0x0000001071600000 size 0x0000000000200000
> > [44232.916884] switchtec 0000:3b:00.1: Remote MW1 size = 0x200000
> > [44232.917452] switchtec switchtec0: MW 1: part 0 addr
> > 0x0000001014e00000 size 0x0000000000200000
> > [44257.513987] switchtec 0000:3b:00.1: Using DMA memcpy for TX
> > [44257.513992] switchtec 0000:3b:00.1: Using DMA memcpy for RX
> > [44257.517993] switchtec 0000:3b:00.1: NTB Transport QP 0 created
> > [44257.524071] switchtec 0000:3b:00.1: eth0 created
> > [44257.533847] ntb_netdev ntb_netdev0 priv0: renamed from eth0
> > ...
> > [44344.794970] switchtec switchtec0: stuser_create: 00000000b79c4946
> > [44344.794975] switchtec switchtec0: switchtec_dev_open: 00000000b79c4946
> > [44344.795158] switchtec switchtec0: stuser_free: 00000000b79c4946
> > [44344.871358] switchtec switchtec0: stuser_create: 000000009a68785f
> > [44344.871363] switchtec switchtec0: switchtec_dev_open: 000000009a68785f
> > [44344.871542] switchtec switchtec0: stuser_free: 000000009a68785f
> > [44350.386529] switchtec switchtec0: stuser_create: 00000000a3644dc9
> > [44350.386535] switchtec switchtec0: switchtec_dev_open: 00000000a3644dc9
> > [44350.386721] switchtec switchtec0: stuser_free: 00000000a3644dc9
> > [44350.416802] switchtec switchtec0: stuser_create: 00000000f2cc280a
> > [44350.416807] switchtec switchtec0: switchtec_dev_open: 00000000f2cc280a
> > [44350.416968] switchtec switchtec0: stuser_free: 00000000f2cc280a
> > [44470.486355] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.496863] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.504156] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.514905] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.527490] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.537896] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.548884] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.559902] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.570890] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.581890] switchtec 0000:3b:00.1: Remote QP link status = 0
> > [44470.591242] switchtec switchtec0: stuser_create: 00000000309d4e4c
> > [44470.591248] switchtec switchtec0: switchtec_dev_open: 00000000309d4e4c
> > [44470.591412] switchtec switchtec0: stuser_free: 00000000309d4e4c
> > [44470.625330] switchtec switchtec0: stuser_create: 000000007e7c8c1c
> > [44470.625335] switchtec switchtec0: switchtec_dev_open: 000000007e7c8c1c
> > [44470.625511] switchtec switchtec0: stuser_free: 000000007e7c8c1c
> > [44474.332907] switchtec 0000:3b:00.1: qp 0: Link Up
> > [44474.332925] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> > [44474.332965] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.332977] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44474.332985] switchtec 0000:3b:00.1: done flag not set
> > [44474.333020] IPv6: ADDRCONF(NETDEV_CHANGE): priv0: link becomes ready
> > [44474.355200] switchtec switchtec0: doorbell
> > [44474.355250] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.355263] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 3
> > [44474.355272] switchtec 0000:3b:00.1: link down flag set
> > [44474.355320] switchtec 0000:3b:00.1: qp 0: Link Cleanup
> > [44474.355337] ntb_netdev ntb_netdev0 priv0: Event 0, Link 0
> > [44474.355372] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.355382] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44474.355390] switchtec 0000:3b:00.1: done flag not set
> > [44474.360458] switchtec switchtec0: doorbell
> > [44474.365910] switchtec 0000:3b:00.1: qp 0: Link Up
> > [44474.365923] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> > [44474.365961] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.365971] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 1
> > [44474.365984] switchtec 0000:3b:00.1: RX OK index 0 ver 0 size 90
> > into buf size 65524
> > [44474.365998] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90
> > byte payload received
> > [44474.366072] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44474.366079] switchtec 0000:3b:00.1: done flag not set
> > [44474.366192] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.366195] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44474.366197] switchtec 0000:3b:00.1: done flag not set
> > [44474.939439] switchtec switchtec0: doorbell
> > [44474.939454] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.939458] switchtec 0000:3b:00.1: qp 0: RX ver 1 len 90 flags 1
> > [44474.939462] switchtec 0000:3b:00.1: RX OK index 1 ver 1 size 90
> > into buf size 65524
> > [44474.939467] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90
> > byte payload received
> > [44474.939493] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44474.939495] switchtec 0000:3b:00.1: done flag not set
> > [44474.939566] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44474.939569] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44474.939572] switchtec 0000:3b:00.1: done flag not set
> > [44475.387462] switchtec switchtec0: doorbell
> > [44475.387475] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44475.387478] switchtec 0000:3b:00.1: qp 0: RX ver 2 len 86 flags 1
> > [44475.387482] switchtec 0000:3b:00.1: RX OK index 2 ver 2 size 86
> > into buf size 65524
> > [44475.387486] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 86
> > byte payload received
> > [44475.387505] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44475.387507] switchtec 0000:3b:00.1: done flag not set
> > [44475.387561] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44475.387563] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44475.387566] switchtec 0000:3b:00.1: done flag not set
> > [44476.411777] switchtec switchtec0: doorbell
> > [44476.411792] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44476.411796] switchtec 0000:3b:00.1: qp 0: RX ver 3 len 90 flags 1
> > [44476.411800] switchtec 0000:3b:00.1: RX OK index 3 ver 3 size 90
> > into buf size 65524
> > [44476.411807] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90
> > byte payload received
> > [44476.411835] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44476.411837] switchtec 0000:3b:00.1: done flag not set
> > [44476.411907] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44476.411910] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44476.411912] switchtec 0000:3b:00.1: done flag not set
> > [44476.412087] switchtec switchtec0: doorbell
> > [44476.412091] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44476.412094] switchtec 0000:3b:00.1: qp 0: RX ver 4 len 70 flags 1
> > [44476.412098] switchtec 0000:3b:00.1: RX OK index 4 ver 4 size 70
> > into buf size 65524
> > [44476.412102] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70
> > byte payload received
> > [44476.412120] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44476.412122] switchtec 0000:3b:00.1: done flag not set
> > [44476.412198] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44476.412201] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44476.412203] switchtec 0000:3b:00.1: done flag not set
> > [44477.307472] switchtec switchtec0: doorbell
> > [44477.307487] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44477.307490] switchtec 0000:3b:00.1: qp 0: RX ver 5 len 90 flags 1
> > [44477.307494] switchtec 0000:3b:00.1: RX OK index 5 ver 5 size 90
> > into buf size 65524
> > [44477.307501] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90
> > byte payload received
> > [44477.307525] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44477.307528] switchtec 0000:3b:00.1: done flag not set
> > [44477.307689] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44477.307692] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44477.307694] switchtec 0000:3b:00.1: done flag not set
> > [44481.147576] switchtec switchtec0: doorbell
> > [44481.147594] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44481.147597] switchtec 0000:3b:00.1: qp 0: RX ver 6 len 70 flags 1
> > [44481.147601] switchtec 0000:3b:00.1: RX OK index 6 ver 6 size 70
> > into buf size 65524
> > [44481.147607] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70
> > byte payload received
> > [44481.147631] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44481.147634] switchtec 0000:3b:00.1: done flag not set
> > [44481.147712] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44481.147715] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44481.147718] switchtec 0000:3b:00.1: done flag not set
> > [44489.851482] switchtec switchtec0: doorbell
> > [44489.851498] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44489.851502] switchtec 0000:3b:00.1: qp 0: RX ver 7 len 70 flags 1
> > [44489.851506] switchtec 0000:3b:00.1: RX OK index 7 ver 7 size 70
> > into buf size 65524
> > [44489.851511] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70
> > byte payload received
> > [44489.851536] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44489.851538] switchtec 0000:3b:00.1: done flag not set
> > [44489.851616] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44489.851619] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44489.851621] switchtec 0000:3b:00.1: done flag not set
> > [44493.354390] switchtec switchtec0: stuser_create: 00000000098d03cb
> > [44493.354395] switchtec switchtec0: switchtec_dev_open: 00000000098d03cb
> > [44493.354560] switchtec switchtec0: stuser_free: 00000000098d03cb
> > [44493.377810] switchtec switchtec0: stuser_create: 00000000cac0e0f9
> > [44493.377815] switchtec switchtec0: switchtec_dev_open: 00000000cac0e0f9
> > [44493.378029] switchtec switchtec0: stuser_free: 00000000cac0e0f9
> > [44493.396554] switchtec switchtec0: stuser_create: 000000008fe7cb2b
> > [44493.396559] switchtec switchtec0: switchtec_dev_open: 000000008fe7cb2b
> > [44493.396719] switchtec switchtec0: stuser_free: 000000008fe7cb2b
> > [44493.417150] switchtec switchtec0: stuser_create: 00000000e67348bf
> > [44493.417154] switchtec switchtec0: switchtec_dev_open: 00000000e67348bf
> > [44493.417323] switchtec switchtec0: stuser_free: 00000000e67348bf
> > [44493.438786] switchtec switchtec0: stuser_create: 000000007384dd1d
> > [44493.438791] switchtec switchtec0: switchtec_dev_open: 000000007384dd1d
> > [44493.438969] switchtec switchtec0: stuser_free: 000000007384dd1d
> > [44493.459467] switchtec switchtec0: stuser_create: 00000000af970cd7
> > [44493.459471] switchtec switchtec0: switchtec_dev_open: 00000000af970cd7
> > [44493.459637] switchtec switchtec0: stuser_free: 00000000af970cd7
> > [44493.480517] switchtec switchtec0: stuser_create: 00000000abdf1426
> > [44493.480522] switchtec switchtec0: switchtec_dev_open: 00000000abdf1426
> > [44493.480672] switchtec switchtec0: stuser_free: 00000000abdf1426
> > [44493.501712] switchtec switchtec0: stuser_create: 00000000e4221771
> > [44493.501716] switchtec switchtec0: switchtec_dev_open: 00000000e4221771
> > [44493.501882] switchtec switchtec0: stuser_free: 00000000e4221771
> > [44493.528249] switchtec switchtec0: stuser_create: 00000000c9d7a68c
> > [44493.528253] switchtec switchtec0: switchtec_dev_open: 00000000c9d7a68c
> > [44493.528414] switchtec switchtec0: stuser_free: 00000000c9d7a68c
> > [44493.555529] switchtec switchtec0: stuser_create: 00000000e415ce5d
> > [44493.555533] switchtec switchtec0: switchtec_dev_open: 00000000e415ce5d
> > [44493.555699] switchtec switchtec0: stuser_free: 00000000e415ce5d
> > [44493.577118] switchtec switchtec0: stuser_create: 000000003bee7779
> > [44493.577122] switchtec switchtec0: switchtec_dev_open: 000000003bee7779
> > [44493.577276] switchtec switchtec0: stuser_free: 000000003bee7779
> > [44493.598661] switchtec switchtec0: stuser_create: 00000000e76bc00f
> > [44493.598668] switchtec switchtec0: switchtec_dev_open: 00000000e76bc00f
> > [44493.598942] switchtec switchtec0: stuser_free: 00000000e76bc00f
> > [44493.619690] switchtec switchtec0: stuser_create: 00000000e527a94d
> > [44493.619694] switchtec switchtec0: switchtec_dev_open: 00000000e527a94d
> > [44493.619841] switchtec switchtec0: stuser_free: 00000000e527a94d
> > [44493.640966] switchtec switchtec0: stuser_create: 000000004978b777
> > [44493.640970] switchtec switchtec0: switchtec_dev_open: 000000004978b777
> > [44493.641134] switchtec switchtec0: stuser_free: 000000004978b777
> > [44493.660708] switchtec switchtec0: stuser_create: 00000000061d84e0
> > [44493.660712] switchtec switchtec0: switchtec_dev_open: 00000000061d84e0
> > [44493.660914] switchtec switchtec0: stuser_free: 00000000061d84e0
> > [44493.679806] switchtec switchtec0: stuser_create: 00000000ab4b5fd5
> > [44493.679811] switchtec switchtec0: switchtec_dev_open: 00000000ab4b5fd5
> > [44493.680007] switchtec switchtec0: stuser_free: 00000000ab4b5fd5
> > [44493.701799] switchtec switchtec0: stuser_create: 00000000943768fe
> > [44493.701804] switchtec switchtec0: switchtec_dev_open: 00000000943768fe
> > [44493.701978] switchtec switchtec0: stuser_free: 00000000943768fe
> > [44493.722465] switchtec switchtec0: stuser_create: 00000000727838fa
> > [44493.722469] switchtec switchtec0: switchtec_dev_open: 00000000727838fa
> > [44493.722633] switchtec switchtec0: stuser_free: 00000000727838fa
> > [44493.744238] switchtec switchtec0: stuser_create: 000000008eb49672
> > [44493.744242] switchtec switchtec0: switchtec_dev_open: 000000008eb49672
> > [44493.744395] switchtec switchtec0: stuser_free: 000000008eb49672
> > [44493.765595] switchtec switchtec0: stuser_create: 00000000ec4dcadf
> > [44493.765599] switchtec switchtec0: switchtec_dev_open: 00000000ec4dcadf
> > [44493.765766] switchtec switchtec0: stuser_free: 00000000ec4dcadf
> > [44493.786396] switchtec switchtec0: stuser_create: 00000000b664b5c8
> > [44493.786401] switchtec switchtec0: switchtec_dev_open: 00000000b664b5c8
> > [44493.786554] switchtec switchtec0: stuser_free: 00000000b664b5c8
> > [44493.808189] switchtec switchtec0: stuser_create: 000000000b620cc8
> > [44493.808194] switchtec switchtec0: switchtec_dev_open: 000000000b620cc8
> > [44493.808359] switchtec switchtec0: stuser_free: 000000000b620cc8
> > [44493.826971] switchtec switchtec0: stuser_create: 00000000237afec0
> > [44493.826976] switchtec switchtec0: switchtec_dev_open: 00000000237afec0
> > [44493.827128] switchtec switchtec0: stuser_free: 00000000237afec0
> > [44493.847269] switchtec switchtec0: stuser_create: 00000000766c005b
> > [44493.847273] switchtec switchtec0: switchtec_dev_open: 00000000766c005b
> > [44493.847474] switchtec switchtec0: stuser_free: 00000000766c005b
> > [44500.851786] switchtec switchtec0: stuser_create: 000000009a68785f
> > [44500.851792] switchtec switchtec0: switchtec_dev_open: 000000009a68785f
> > [44500.851993] switchtec switchtec0: stuser_free: 000000009a68785f
> > [44507.259767] switchtec switchtec0: doorbell
> > [44507.259785] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44507.259790] switchtec 0000:3b:00.1: qp 0: RX ver 8 len 70 flags 1
> > [44507.259795] switchtec 0000:3b:00.1: RX OK index 8 ver 8 size 70
> > into buf size 65524
> > [44507.259801] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 70
> > byte payload received
> > [44507.259828] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44507.259830] switchtec 0000:3b:00.1: done flag not set
> > [44507.259894] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44507.259897] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44507.259899] switchtec 0000:3b:00.1: done flag not set
> >
> > # cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
> > NTB QP stats:
> >
> > rx_bytes - 1244
> > rx_pkts - 20
> > rx_memcpy - 20
> > rx_async - 0
> > rx_ring_empty - 41
> > rx_err_no_buf - 0
> > rx_err_oflow - 0
> > rx_err_ver - 0
> > rx_buff - 0x00000000fd3649f3
> > rx_index - 20
> > rx_max_entry - 31
> > rx_alloc_entry - 100
> >
> > tx_bytes - 0
> > tx_pkts - 0
> > tx_memcpy - 0
> > tx_async - 0
> > tx_ring_full - 212083112
> > tx_err_no_buf - 0
> > tx_mw - 0x0000000029c0837d
> > tx_index (H) - 0
> > RRI (T) - 0
> > tx_max_entry - 31
> > free tx - 31
> >
> > Using TX DMA - Yes
> > Using RX DMA - Yes
> > QP Link - Up
> >
> > # ifconfig priv0
> > priv0     Link encap:Ethernet  HWaddr 1A:A5:C6:CD:F2:51
> >           inet addr:10.17.21.197  Bcast:10.17.21.199  Mask:255.255.255.252
> >           inet6 addr: fe80::18a5:c6ff:fecd:f251/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:26 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:1 errors:276181510 dropped:276181510 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:1496 (1.4 KiB)  TX bytes:90 (90.0 B)
> >
> >
> > Node/controller 2:
> > [43808.457199] switchtec switchtec0: Management device registered.
> > [43808.492076] switchtec switchtec1: Management device registered.
> > [43808.492647] switchtec: loaded.
> > [43815.711975] switchtec switchtec0: Partition ID 0 of 3
> > [43815.712140] switchtec switchtec0: MWs: 2 direct, 32 lut
> > [43815.712261] switchtec switchtec0: Peer MWs: 2 direct, 32 lut
> > [43815.764006] switchtec switchtec0: Requester ID 00:00.0 -> BB:01.0
> > [43815.764033] switchtec switchtec0: Requester ID 3A:00.0 -> BB:01.1
> > [43815.815913] switchtec switchtec0: Using crosslink configuration
> > [43815.815978] switchtec switchtec0: Crosslink BAR0 addr: 0
> > [43815.816091] switchtec switchtec0: Crosslink BAR2 addr: 1000000000
> > [43815.816155] switchtec switchtec0: Crosslink BAR4 addr: 2000000000
> > [43816.075968] switchtec switchtec0: Requester ID 00:01.0 -> BB:01.4
> > [43816.076000] switchtec switchtec0: Requester ID 00:01.1 -> BB:01.5
> > [43816.128185] switchtec switchtec0: dbs: shift 0/0, mask 0fffffffffffffff
> > [43816.232166] switchtec switchtec0: Shared MW Ready
> > [43816.232193] switchtec switchtec0: irqs - event: 2, db: 0, msgs: 1
> > [43816.233520] switchtec switchtec0: NTB device registered
> > [43817.032047] switchtec switchtec0: stuser_create: 0000000070070c46
> > [43817.032052] switchtec switchtec0: switchtec_dev_open: 0000000070070c46
> > [43817.032212] switchtec switchtec0: stuser_free: 0000000070070c46
> > [43855.950121] switchtec switchtec0: message: 0 00000004
> > [43952.022182] switchtec switchtec0: stuser_create: 00000000154869b1
> > [43952.022188] switchtec switchtec0: switchtec_dev_open: 00000000154869b1
> > [43952.022422] switchtec switchtec0: stuser_free: 00000000154869b1
> > [43952.095507] switchtec switchtec0: stuser_create: 00000000760ad51f
> > [43952.095512] switchtec switchtec0: switchtec_dev_open: 00000000760ad51f
> > [43952.095681] switchtec switchtec0: stuser_free: 00000000760ad51f
> > [43978.547393] Software Queue-Pair Transport over NTB, version 4
> > [43978.548191] switchtec switchtec0: enabling link
> > [43980.123918] switchtec switchtec0: message: 0 00000001
> > [43980.124031] switchtec switchtec0: message: 0 00000003
> > [43980.124110] switchtec switchtec0: ntb link up
> > [43980.124232] switchtec 0000:3b:00.1: Remote version = 4
> > [43980.124242] switchtec 0000:3b:00.1: Remote max number of qps = 2
> > [43980.124251] switchtec 0000:3b:00.1: Remote number of mws = 2
> > [43980.124261] switchtec 0000:3b:00.1: Remote MW0 size = 0x200000
> > [43980.124847] switchtec switchtec0: MW 0: part 0 addr
> > 0x00000010d2200000 size 0x0000000000200000
> > [43980.227931] switchtec 0000:3b:00.1: Remote MW1 size = 0x200000
> > [43980.228583] switchtec switchtec0: MW 1: part 0 addr
> > 0x00000010d1800000 size 0x0000000000200000
> > [43983.250163] switchtec switchtec0: stuser_create: 00000000992004c6
> > [43983.250168] switchtec switchtec0: switchtec_dev_open: 00000000992004c6
> > [43983.250326] switchtec switchtec0: stuser_free: 00000000992004c6
> > [43987.894388] switchtec switchtec0: stuser_create: 00000000ed0f26a7
> > [43987.894394] switchtec switchtec0: switchtec_dev_open: 00000000ed0f26a7
> > [43987.894556] switchtec switchtec0: stuser_free: 00000000ed0f26a7
> > [44003.236857] switchtec 0000:3b:00.1: Using DMA memcpy for TX
> > [44003.236862] switchtec 0000:3b:00.1: Using DMA memcpy for RX
> > [44003.239410] switchtec 0000:3b:00.1: NTB Transport QP 0 created
> > [44003.243226] switchtec 0000:3b:00.1: eth0 created
> > [44003.248706] ntb_netdev ntb_netdev0 priv0: renamed from eth0
> > ...
> > [44104.815744] switchtec switchtec0: stuser_create: 000000007ed96477
> > [44104.815751] switchtec switchtec0: switchtec_dev_open: 000000007ed96477
> > [44104.816022] switchtec switchtec0: stuser_free: 000000007ed96477
> > [44104.836171] switchtec switchtec0: stuser_create: 00000000199ff872
> > [44104.836175] switchtec switchtec0: switchtec_dev_open: 00000000199ff872
> > [44104.836345] switchtec switchtec0: stuser_free: 00000000199ff872
> > [44104.858108] switchtec switchtec0: stuser_create: 00000000d04b93c8
> > [44104.858113] switchtec switchtec0: switchtec_dev_open: 00000000d04b93c8
> > [44104.858264] switchtec switchtec0: stuser_free: 00000000d04b93c8
> > [44104.879259] switchtec switchtec0: stuser_create: 00000000b2930931
> > [44104.879263] switchtec switchtec0: switchtec_dev_open: 00000000b2930931
> > [44104.879425] switchtec switchtec0: stuser_free: 00000000b2930931
> > [44104.898693] switchtec switchtec0: stuser_create: 000000009f08a557
> > [44104.898697] switchtec switchtec0: switchtec_dev_open: 000000009f08a557
> > [44104.898856] switchtec switchtec0: stuser_free: 000000009f08a557
> > [44104.920456] switchtec switchtec0: stuser_create: 0000000092c59794
> > [44104.920461] switchtec switchtec0: switchtec_dev_open: 0000000092c59794
> > [44104.920626] switchtec switchtec0: stuser_free: 0000000092c59794
> > [44104.939554] switchtec switchtec0: stuser_create: 00000000778eabf3
> > [44104.939559] switchtec switchtec0: switchtec_dev_open: 00000000778eabf3
> > [44104.939735] switchtec switchtec0: stuser_free: 00000000778eabf3
> > [44104.961929] switchtec switchtec0: stuser_create: 0000000087ec0e77
> > [44104.961934] switchtec switchtec0: switchtec_dev_open: 0000000087ec0e77
> > [44104.962100] switchtec switchtec0: stuser_free: 0000000087ec0e77
> > [44104.983607] switchtec switchtec0: stuser_create: 000000001677deb0
> > [44104.983611] switchtec switchtec0: switchtec_dev_open: 000000001677deb0
> > [44104.983761] switchtec switchtec0: stuser_free: 000000001677deb0
> > [44105.004798] switchtec switchtec0: stuser_create: 00000000c505781c
> > [44105.004803] switchtec switchtec0: switchtec_dev_open: 00000000c505781c
> > [44105.004962] switchtec switchtec0: stuser_free: 00000000c505781c
> > [44105.024035] switchtec switchtec0: stuser_create: 00000000772c0769
> > [44105.024040] switchtec switchtec0: switchtec_dev_open: 00000000772c0769
> > [44105.024195] switchtec switchtec0: stuser_free: 00000000772c0769
> > [44105.045032] switchtec switchtec0: stuser_create: 00000000c48dc21b
> > [44105.045037] switchtec switchtec0: switchtec_dev_open: 00000000c48dc21b
> > [44105.045189] switchtec switchtec0: stuser_free: 00000000c48dc21b
> > [44105.066256] switchtec switchtec0: stuser_create: 000000004436beee
> > [44105.066261] switchtec switchtec0: switchtec_dev_open: 000000004436beee
> > [44105.066421] switchtec switchtec0: stuser_free: 000000004436beee
> > [44105.087705] switchtec switchtec0: stuser_create: 0000000052b38058
> > [44105.087710] switchtec switchtec0: switchtec_dev_open: 0000000052b38058
> > [44105.087920] switchtec switchtec0: stuser_free: 0000000052b38058
> > [44105.107527] switchtec switchtec0: stuser_create: 000000005b15e6d3
> > [44105.107532] switchtec switchtec0: switchtec_dev_open: 000000005b15e6d3
> > [44105.107718] switchtec switchtec0: stuser_free: 000000005b15e6d3
> > [44105.130832] switchtec switchtec0: stuser_create: 000000005379e8de
> > [44105.130836] switchtec switchtec0: switchtec_dev_open: 000000005379e8de
> > [44105.131069] switchtec switchtec0: stuser_free: 000000005379e8de
> > [44105.153557] switchtec switchtec0: stuser_create: 000000001671f313
> > [44105.153561] switchtec switchtec0: switchtec_dev_open: 000000001671f313
> > [44105.153711] switchtec switchtec0: stuser_free: 000000001671f313
> > [44105.175527] switchtec switchtec0: stuser_create: 00000000fdbb9319
> > [44105.175532] switchtec switchtec0: switchtec_dev_open: 00000000fdbb9319
> > [44105.175679] switchtec switchtec0: stuser_free: 00000000fdbb9319
> > [44105.196070] switchtec switchtec0: stuser_create: 000000002b86a7a3
> > [44105.196075] switchtec switchtec0: switchtec_dev_open: 000000002b86a7a3
> > [44105.196275] switchtec switchtec0: stuser_free: 000000002b86a7a3
> > [44105.218913] switchtec switchtec0: stuser_create: 0000000018a2de0b
> > [44105.218917] switchtec switchtec0: switchtec_dev_open: 0000000018a2de0b
> > [44105.219085] switchtec switchtec0: stuser_free: 0000000018a2de0b
> > [44105.240357] switchtec switchtec0: stuser_create: 00000000f712e033
> > [44105.240361] switchtec switchtec0: switchtec_dev_open: 00000000f712e033
> > [44105.240528] switchtec switchtec0: stuser_free: 00000000f712e033
> > [44105.261698] switchtec switchtec0: stuser_create: 0000000023550788
> > [44105.261702] switchtec switchtec0: switchtec_dev_open: 0000000023550788
> > [44105.261843] switchtec switchtec0: stuser_free: 0000000023550788
> > [44105.283558] switchtec switchtec0: stuser_create: 000000004724602a
> > [44105.283562] switchtec switchtec0: switchtec_dev_open: 000000004724602a
> > [44105.283714] switchtec switchtec0: stuser_free: 000000004724602a
> > [44105.305739] switchtec switchtec0: stuser_create: 00000000a925ed10
> > [44105.305744] switchtec switchtec0: switchtec_dev_open: 00000000a925ed10
> > [44105.305918] switchtec switchtec0: stuser_free: 00000000a925ed10
> > [44112.393253] switchtec switchtec0: stuser_create: 00000000fc673562
> > [44112.393259] switchtec switchtec0: switchtec_dev_open: 00000000fc673562
> > [44112.393435] switchtec switchtec0: stuser_free: 00000000fc673562
> > [44112.615032] switchtec switchtec0: stuser_create: 000000006fea71d7
> > [44112.615037] switchtec switchtec0: switchtec_dev_open: 000000006fea71d7
> > [44112.615193] switchtec switchtec0: stuser_free: 000000006fea71d7
> > [44112.644369] switchtec switchtec0: stuser_create: 00000000154869b1
> > [44112.644374] switchtec switchtec0: switchtec_dev_open: 00000000154869b1
> > [44112.644537] switchtec switchtec0: stuser_free: 00000000154869b1
> > [44221.644483] switchtec 0000:3b:00.1: Remote QP link status = 1
> > [44221.644488] switchtec 0000:3b:00.1: qp 0: Link Up
> > [44221.644493] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> > [44221.644504] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44221.644508] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44221.644510] switchtec 0000:3b:00.1: done flag not set
> > [44221.661578] switchtec switchtec0: doorbell
> > [44221.661586] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44221.661590] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 1
> > [44221.661594] switchtec 0000:3b:00.1: RX OK index 0 ver 0 size 90
> > into buf size 65524
> > [44221.661599] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 90
> > byte payload received
> > [44221.661661] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44221.661663] switchtec 0000:3b:00.1: done flag not set
> > [44221.661777] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44221.661780] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 0 flags 0
> > [44221.661782] switchtec 0000:3b:00.1: done flag not set
> > [44221.676760] switchtec 0000:3b:00.1: qp 0: Send Link Down
> > [44221.676781] switchtec 0000:3b:00.1: Remote QP link status = 1
> > [44221.676784] switchtec 0000:3b:00.1: qp 0: Link Up
> > [44221.676788] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
> > [44221.676800] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
> > [44221.676803] switchtec 0000:3b:00.1: qp 0: RX ver 0 len 90 flags 0
> > [44221.676806] switchtec 0000:3b:00.1: done flag not set
> > [44249.359502] switchtec switchtec0: stuser_create: 00000000992004c6
> > [44249.359508] switchtec switchtec0: switchtec_dev_open: 00000000992004c6
> > [44249.359716] switchtec switchtec0: stuser_free: 00000000992004c6
> > [44249.414933] switchtec switchtec0: stuser_create: 000000009c6b42ce
> > [44249.414937] switchtec switchtec0: switchtec_dev_open: 000000009c6b42ce
> > [44249.415128] switchtec switchtec0: stuser_free: 000000009c6b42ce
> >
> > # cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
> > NTB QP stats:
> >
> > rx_bytes - 0
> > rx_pkts - 0
> > rx_memcpy - 0
> > rx_async - 0
> > rx_ring_empty - 1
> > rx_err_no_buf - 0
> > rx_err_oflow - 0
> > rx_err_ver - 0
> > rx_buff - 0x00000000fbb8882b
> > rx_index - 0
> > rx_max_entry - 31
> > rx_alloc_entry - 100
> >
> > tx_bytes - 866
> > tx_pkts - 11
> > tx_memcpy - 11
> > tx_async - 0
> > tx_ring_full - 0
> > tx_err_no_buf - 0
> > tx_mw - 0x000000003c9d9493
> > tx_index (H) - 11
> > RRI (T) - 10
> > tx_max_entry - 31
> > free tx - 30
> >
> > Using TX DMA - Yes
> > Using RX DMA - Yes
> > QP Link - Up
> >
> > # ifconfig priv0
> > priv0     Link encap:Ethernet  HWaddr E6:BF:E8:6F:4F:A5
> >           inet addr:10.17.21.198  Bcast:10.17.21.199  Mask:255.255.255.252
> >           inet6 addr: fe80::e4bf:e8ff:fe6f:4fa5/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:1 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:90 (90.0 B)  TX bytes:1496 (1.4 KiB)
> >
> >
> > The information from the kernel debug sysfs attribute files seems to
> > be consistent when this condition occurs on one side:
> > - tx_ring_full is constantly increasing
> > - "tx_index (H)" and "RRI (T)" are both 0
> >
> > So perhaps data is filling the transmit buffer, but not being read
> > (received) by the other side?
> >
> > In the cases where both controllers/nodes boot up and the NTB virtual
> > Ethernet comes up/functions normally, I can usually induce the problem
> > by perform a series of interface down/up's using this loop from one
> > side (controller/node):
> > peer_ip=<IP_OF_PEER>; delay=5; iter=1; while true; do echo "iteration
> > ${iter}..."; echo down; ifconfig priv0 down; echo "sleep ${delay}";
> > sleep ${delay}; echo up; ifconfig priv0 up; sleep 2; ping -c 1 -w 1
> > ${peer_ip} || break; echo "sleep ${delay}"; sleep ${delay}; iter=$((
> > iter + 1 )); echo; done
> >
> > Sometimes it gets into this state after just one or a handful of
> > iterations, other times it can take 500+ iterations before it happens.
> >
> > Any help/tips debugging this would be greatly appreciated. And if the
> > 'netdev' mailing list is more appropriate for this inquiry, please
> > advise and I'll re-post there.
> >
> > Thanks for your time/consideration.
> >
> >
> > --Marc
> >


* Re: ntb_netdev Communication Failure Issue
  2022-02-17 15:22   ` Marc Smith
@ 2022-02-17 16:31     ` Eric Pilmore
  2022-02-17 16:49     ` Logan Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Pilmore @ 2022-02-17 16:31 UTC (permalink / raw)
  To: Marc Smith; +Cc: Logan Gunthorpe, ntb, Kelvin Cao, kelvincao

On Thu, Feb 17, 2022 at 7:22 AM Marc Smith <msmith626@gmail.com> wrote:
>
>
> Is it possible there are some special NTB/BIOS settings needed for
> this hardware?

I'm assuming you have the switch (NTB) properly configured so that the
respective partitions can talk to each other. The other part is
ensuring that the NT BAR did properly enumerate. If it is large, it
can become a challenge for some BIOSes to be able to enumerate it.  Do
a "sudo lspci -vvv -s <BDF>" with the BDF of the NT EP, and ensure
that the BARs (Regions 2,4) have MMIO space assigned.

When you try to run, does anything interesting show up in "dmesg", such as
DMAR errors?

Eric


* Re: ntb_netdev Communication Failure Issue
  2022-02-17 15:22   ` Marc Smith
  2022-02-17 16:31     ` Eric Pilmore
@ 2022-02-17 16:49     ` Logan Gunthorpe
  2022-03-09 14:35       ` Marc Smith
  1 sibling, 1 reply; 9+ messages in thread
From: Logan Gunthorpe @ 2022-02-17 16:49 UTC (permalink / raw)
  To: Marc Smith; +Cc: ntb, Kelvin Cao, kelvincao




On 2022-02-17 8:22 a.m., Marc Smith wrote:
> On Wed, Feb 16, 2022 at 12:14 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>>
>> Hi Marc,
>>
>>
>> On 2022-02-16 9:16 a.m., Marc Smith wrote:
>>> Hi,
>>>
>>> I'm using vanilla Linux 5.4.145 with a Celestica "cluster-in-a-box"
>>> system (two servers in a single chassis connected internally via PCIe
>>> switches). The PCIe switches are Microsemi Switchtec ('lspci' says
>>> "PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
>>> [11f8:8546]" but I believe this is not the current product/model name
>>> from Microsemi).
>>>
>>> I use the 'ntb_netdev' driver for the virtual Ethernet functionality
>>> across the NTB for IP communication between the two controllers. I've
>>> had a long standing issue where sometimes when we reboot a controller,
>>> we're never able to establish communication between the two
>>> controllers again -- usually requires rebooting both and/or power
>>> cycling the entire chassis.
>>>
>>> I've also noticed sometimes when they boot, the virtual Ethernet
>>> device fails to pass traffic right away.
>>>
>>> And finally, when simply doing an "ifconfig NTB_IF down" followed by
>>> an "ifconfig NTB_IF up" on one controller, they sometimes get into the
>>> state where they no longer pass traffic.
>>>
>>> All three conditions seem to be related, and the symptom is the same
>>> (they have the UP RUNNING flags with IP's configured but I'm unable to
>>> ping between the two controllers). Usually when this happens, one side
>>> will show the drop/error counts for the NTB virtual Ethernet interface
>>> increasing (never ending).
>>>
>>> In the example below, both controllers booted up, and I unloaded the
>>> modules then reloaded with 'dyndbg=+p' but immediately was not able to
>>> pass any traffic on those interfaces.
>>
>>
>> I've copied Kelvin who does most of the work on the switchtec driver. I
>> suspect this is a bug in that driver.
>>
>> I also know that there have been some fixes in this area in the OOT
>> version of that module (which will probably make it upstream in due course).
>>
>> See this PR:
>>
>> https://github.com/Microsemi/switchtec-kernel/pull/107
>>
>> I wouldn't be surprised if this is the same issue you are seeing. The
>> patches there should apply pretty easily to the upstream kernel if you
>> prefer.
> 
> Thanks Logan. I used the driver source from
> "https://github.com/Microsemi/switchtec-kernel/tree/master" and the
> problem still persists. On one side tx_ring_full continues to climb
> and the errors/dropped counts continue to rise for the virtual
> Ethernet interface:
> 
> # cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
> 
> NTB QP stats:
> 
> rx_bytes - 0
> rx_pkts - 0
> rx_memcpy - 0
> rx_async - 0
> rx_ring_empty - 8
> rx_err_no_buf - 0
> rx_err_oflow - 0
> rx_err_ver - 0
> rx_buff - 0x00000000e7a16187
> rx_index - 0
> rx_max_entry - 31
> rx_alloc_entry - 100
> 
> tx_bytes - 0
> tx_pkts - 0
> tx_memcpy - 0
> tx_async - 0
> tx_ring_full - 23809328
> tx_err_no_buf - 0
> tx_mw - 0x0000000029c0837d
> tx_index (H) - 0
> RRI (T) - 0
> tx_max_entry - 31
> free tx - 31
> 
> Using TX DMA - Yes
> Using RX DMA - Yes
> QP Link - Up
> 
> # ifconfig priv0
> priv0     Link encap:Ethernet  HWaddr 32:15:21:03:C0:31
>           inet addr:10.17.21.197  Bcast:10.17.21.199  Mask:255.255.255.252
>           inet6 addr: fe80::3015:21ff:fe03:c031/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:65510  Metric:1
>           RX packets:18 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1 errors:321933488 dropped:321933488 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1188 (1.1 KiB)  TX bytes:90 (90.0 B)
> 
> Is it possible there are some special NTB/BIOS settings needed for
> this hardware?

No, I don't think there's any BIOS settings needed. It could be a
problem with the switch configuration, but I'm not entirely sure what it
would be. Crosslink configurations are also super tricky.

The fact that it comes up some of the time suggests there's a race
condition or something along those lines while the driver sets up the
switch. So this may take some effort to debug.

It's hard to say, but I suspect the doorbells on the receive side are
not getting through. Can you confirm that when it is broken you don't
get any debug messages like "ntb_transport_rxc_db: doorbell 0 received"?

Another thing that you could try is to dump the registers for both
switches when it's working and broken. You can use the switchtec user
tool for this. 'switchtec gas dump' will dump the entire register space
in hex. The NTB registers to take a look at start at 0x10000. If we can
notice a difference between broken and working we can narrow down what
is not being set up correctly.

Logan


* Re: ntb_netdev Communication Failure Issue
  2022-02-17 16:49     ` Logan Gunthorpe
@ 2022-03-09 14:35       ` Marc Smith
  2022-03-09 16:52         ` Logan Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Smith @ 2022-03-09 14:35 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: ntb, Kelvin Cao, kelvincao

On Thu, Feb 17, 2022 at 11:50 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
>
> On 2022-02-17 8:22 a.m., Marc Smith wrote:
> > On Wed, Feb 16, 2022 at 12:14 PM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> Hi Marc,
> >>
> >>
> >> On 2022-02-16 9:16 a.m., Marc Smith wrote:
> >>> Hi,
> >>>
> >>> I'm using vanilla Linux 5.4.145 with a Celestica "cluster-in-a-box"
> >>> system (two servers in a single chassis connected internally via PCIe
> >>> switches). The PCIe switches are Microsemi Switchtec ('lspci' says
> >>> "PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
> >>> [11f8:8546]" but I believe this is not the current product/model name
> >>> from Microsemi).
> >>>
> >>> I use the 'ntb_netdev' driver for the virtual Ethernet functionality
> >>> across the NTB for IP communication between the two controllers. I've
> >>> had a long standing issue where sometimes when we reboot a controller,
> >>> we're never able to establish communication between the two
> >>> controllers again -- usually requires rebooting both and/or power
> >>> cycling the entire chassis.
> >>>
> >>> I've also noticed sometimes when they boot, the virtual Ethernet
> >>> device fails to pass traffic right away.
> >>>
> >>> And finally, when simply doing an "ifconfig NTB_IF down" followed by
> >>> an "ifconfig NTB_IF up" on one controller, they sometimes get into the
> >>> state where they no longer pass traffic.
> >>>
> >>> All three conditions seem to be related, and the symptom is the same
> >>> (they have the UP RUNNING flags with IP's configured but I'm unable to
> >>> ping between the two controllers). Usually when this happens, one side
> >>> will show the drop/error counts for the NTB virtual Ethernet interface
> >>> increasing (never ending).
> >>>
> >>> In the example below, both controllers booted up, and I unloaded the
> >>> modules then reloaded with 'dyndbg=+p' but immediately was not able to
> >>> pass any traffic on those interfaces.
> >>
> >>
> >> I've copied Kelvin who does most of the work on the switchtec driver. I
> >> suspect this is a bug in that driver.
> >>
> >> I also know that there have been some fixes in this area in the OOT
> >> version of that module (which will probably make it upstream in due course).
> >>
> >> See this PR:
> >>
> >> https://github.com/Microsemi/switchtec-kernel/pull/107
> >>
> >> I wouldn't be surprised if this is the same issue you are seeing. The
> >> patches there should apply pretty easily to the upstream kernel if you
> >> prefer.
> >
> > Thanks Logan. I used the driver source from
> > "https://github.com/Microsemi/switchtec-kernel/tree/master" and the
> > problem still persists. On one side tx_ring_full continues to climb
> > and the errors/dropped counts continue to rise for the virtual
> > Ethernet interface:
> >
> > # cat /sys/kernel/debug/ntb_transport/0000\:3b\:00.1/qp0/stats
> >
> > NTB QP stats:
> >
> > rx_bytes - 0
> > rx_pkts - 0
> > rx_memcpy - 0
> > rx_async - 0
> > rx_ring_empty - 8
> > rx_err_no_buf - 0
> > rx_err_oflow - 0
> > rx_err_ver - 0
> > rx_buff - 0x00000000e7a16187
> > rx_index - 0
> > rx_max_entry - 31
> > rx_alloc_entry - 100
> >
> > tx_bytes - 0
> > tx_pkts - 0
> > tx_memcpy - 0
> > tx_async - 0
> > tx_ring_full - 23809328
> > tx_err_no_buf - 0
> > tx_mw - 0x0000000029c0837d
> > tx_index (H) - 0
> > RRI (T) - 0
> > tx_max_entry - 31
> > free tx - 31
> >
> > Using TX DMA - Yes
> > Using RX DMA - Yes
> > QP Link - Up
> >
> > # ifconfig priv0
> > priv0     Link encap:Ethernet  HWaddr 32:15:21:03:C0:31
> >           inet addr:10.17.21.197  Bcast:10.17.21.199  Mask:255.255.255.252
> >           inet6 addr: fe80::3015:21ff:fe03:c031/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:65510  Metric:1
> >           RX packets:18 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:1 errors:321933488 dropped:321933488 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:1188 (1.1 KiB)  TX bytes:90 (90.0 B)
> >
> > Is it possible there are some special NTB/BIOS settings needed for
> > this hardware?

[root@localhost ~]# lspci -vvv -s 0000:3b:00.1
3b:00.1 Bridge: PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
...
Region 2: Memory at 1affc8000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at 1affc4000000 (64-bit, prefetchable) [size=64M]
...

[root@localhost ~]# lspci -vvv -s 0000:3b:00.1
3b:00.1 Bridge: PMC-Sierra Inc. PM8546 B-FEIP PSX 96xG3 PCIe Storage Switch
...
Region 2: Memory at 1affc8000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at 1affc4000000 (64-bit, prefetchable) [size=64M]
...

No DMAR errors in kernel log buffer on either side.
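
For repeat checks after each reboot, the BAR test Eric suggested can be
scripted. This is a minimal sketch in Python, assuming the "lspci -vvv"
text has been captured to a string; the helper names bar_assignments and
missing_bars are hypothetical, and the only thing it looks at is whether
"Memory at" is followed by a hex address for Regions 2 and 4.

```python
import re

# Matches lines such as:
#   Region 2: Memory at 1affc8000000 (64-bit, prefetchable) [size=64M]
REGION_RE = re.compile(r"Region (\d+): Memory at (\S+)")
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def bar_assignments(lspci_text):
    """Map region number -> address string, or None if no hex address is shown."""
    regions = {}
    for match in REGION_RE.finditer(lspci_text):
        addr = match.group(2)
        regions[int(match.group(1))] = addr if HEX_RE.fullmatch(addr) else None
    return regions


def missing_bars(lspci_text, wanted=(2, 4)):
    """Return the wanted regions that are absent or have no address assigned."""
    found = bar_assignments(lspci_text)
    return [region for region in wanted if found.get(region) is None]


if __name__ == "__main__":
    import sys
    # Assumed usage: lspci -vvv -s 0000:3b:00.1 | python3 check_bars.py
    print("missing regions:", missing_bars(sys.stdin.read()))
```

With the output above, both regions carry an address, so nothing is
reported as missing on either side.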


>
> No, I don't think there's any BIOS settings needed. It could be a
> problem with the switch configuration, but I'm not entirely sure what it
> would be. Crosslink configurations are also super tricky.
>
> The fact that it comes up some of the time suggest there's a race
> condition or something along those lines while the driver sets up the
> switch. So this may take some effort to debug.
>
> It's hard to say, but I suspect the doorbells on the receive side are
> not getting through. Can you confirm that when it is broken you don't
> get any debug messages like "ntb_transport_rxc_db: doorbell 0 received".

I *do* see these messages occasionally even when it's not "broken"
(still passes traffic):
...
[ 1445.383417] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[ 1445.383437] switchtec 0000:3b:00.1: qp 0: RX ver 42 len 98 flags 1
[ 1445.383449] switchtec 0000:3b:00.1: RX OK index 11 ver 42 size 98 into buf size 65524
[ 1445.383463] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 98 byte payload received
[ 1445.383567] switchtec 0000:3b:00.1: qp 0: RX ver 12 len 42 flags 0
[ 1445.383569] switchtec 0000:3b:00.1: done flag not set
[ 1445.383655] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[ 1445.383658] switchtec 0000:3b:00.1: qp 0: RX ver 12 len 42 flags 0
[ 1445.383660] switchtec 0000:3b:00.1: done flag not set
...

And then finally when it's in the broken state (not passing traffic,
and interface error/drop constantly increasing on one side), I see
these final entries...

Node "1":
[24911.478753] switchtec 0000:3b:00.1: Remote QP link status = 1
[24911.478758] switchtec 0000:3b:00.1: qp 0: Link Up
[24911.478762] ntb_netdev ntb_netdev0 priv0: Event 1, Link 1
[24911.478776] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[24911.478780] switchtec 0000:3b:00.1: qp 0: RX ver 22909 len 98 flags 0
[24911.478782] switchtec 0000:3b:00.1: done flag not set

Node "2":
[73817.158844] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[73817.158847] switchtec 0000:3b:00.1: qp 0: RX ver 48 len 42 flags 1
[73817.158850] switchtec 0000:3b:00.1: RX OK index 17 ver 48 size 42 into buf size 65524
[73817.158854] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 42 byte payload received
[73817.158882] switchtec 0000:3b:00.1: qp 0: RX ver 18 len 70 flags 0
[73817.158884] switchtec 0000:3b:00.1: done flag not set
[73817.158943] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[73817.158946] switchtec 0000:3b:00.1: qp 0: RX ver 18 len 70 flags 0
[73817.158948] switchtec 0000:3b:00.1: done flag not set

On node "2" is where the drop/error count for the interface increases
in this case, and if I run a "ping" between the two in another
terminal, the following excerpts repeat continually:
...
[73881.175207] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[73881.175211] switchtec 0000:3b:00.1: qp 0: RX ver 96 len 42 flags 1
[73881.175214] switchtec 0000:3b:00.1: RX OK index 3 ver 96 size 42 into buf size 65524
[73881.175218] ntb_netdev ntb_netdev0 priv0: ntb_netdev_rx_handler: 42 byte payload received
[73881.175245] switchtec 0000:3b:00.1: qp 0: RX ver 66 len 42 flags 0
[73881.175247] switchtec 0000:3b:00.1: done flag not set
[73881.175304] switchtec 0000:3b:00.1: ntb_transport_rxc_db: doorbell 0 received
[73881.175307] switchtec 0000:3b:00.1: qp 0: RX ver 66 len 42 flags 0
[73881.175309] switchtec 0000:3b:00.1: done flag not set
...
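
To compare the two nodes over a longer capture, the excerpts can be
tallied instead of read by eye. This is a rough sketch in Python,
assuming the debug output was saved as text (for example from "dmesg");
the function name summarize is hypothetical. It treats bit 0 of "flags"
as the done flag, which is an assumption based only on how "flags 1" is
followed by "RX OK" and "flags 0" by "done flag not set" in the excerpts.

```python
import re
from collections import Counter

# Matches e.g. "qp 0: RX ver 96 len 42 flags 1"
RX_RE = re.compile(r"qp (\d+): RX ver (\d+) len (\d+) flags (\d+)")


def summarize(log_text):
    """Count doorbells, completed receives and 'done flag not set' events."""
    counts = Counter()
    for line in log_text.splitlines():
        if "ntb_transport_rxc_db: doorbell" in line:
            counts["doorbells"] += 1
        elif "done flag not set" in line:
            counts["not_done"] += 1
        elif "RX OK" in line:
            counts["rx_ok"] += 1
        match = RX_RE.search(line)
        if match:
            # Assumption: bit 0 of the flags field is the done flag.
            key = "flags_set" if int(match.group(4)) & 1 else "flags_clear"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys
    for name, value in sorted(summarize(sys.stdin.read()).items()):
        print(f"{name:12s} {value}")
```

On a healthy link one would expect rx_ok to keep growing with traffic; a
side where doorbells keeps growing while rx_ok stays flat matches the
broken state described here.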


>
> Another thing that you could try is to dump the registers for both
> switches when it's working and broken. You can use the switchtec user
> tool for this. 'switchtec gas dump' will dump the entire register space
> in hex. The NTB registers to take a look at start at 0x10000. If we can
> notice a difference between broken and working we can narrow down what
> is not being setup correctly.

How many bytes to compare starting at 0x10000? The beginning looks like this:

Working:
0010000 0003 0000 0005 0000 0000 0000 3a00 0000
0010010 0000 0000 0000 0000 0000 0000 0000 0000
0010020 0001 0000 0004 0000 0000 0000 0000 0000
0010030 0000 0000 0000 0000 0000 0000 0000 0000
0010040 0001 0000 0001 0000 0000 0000 0000 0000
0010050 0000 0000 0000 0000 0000 0000 0000 0000
*

Broken:
0010000 0003 0000 0005 0000 0000 0000 3a00 0000
0010010 0000 0000 0000 0000 0000 0000 0000 0000
0010020 0001 0000 0004 0000 0000 0000 0000 0000
0010030 0000 0000 0000 0000 0000 0000 0000 0000
0010040 0001 0000 0001 0000 0000 0000 0000 0000
0010050 0000 0000 0000 0000 0000 0000 0000 0000
*

No difference between the two above, and actually, comparing the
entire 4 MiB of GAS dump taken while working/functional versus broken
state shows no differences:
# cmp switchtec0_gas_dump_after_ifup_working.1 switchtec0_gas_dump_broken.1
# echo $?
0
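
Since cmp only says whether the files differ, a row-by-row comparison is
handy for dumps that do differ (for example when taken at another point
in the down/up cycle). This is a minimal sketch in Python, assuming the
dumps are saved as text in the hexdump-style layout shown above: a hex
offset, then 16-bit words, with "*" standing for repeated rows. The
names parse_hexdump and diff_dumps are hypothetical.

```python
def parse_hexdump(text):
    """Return {offset: tuple_of_words}, expanding '*' repeat markers."""
    rows = {}
    prev_off, prev_words, repeat = None, None, False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "*":
            repeat = True          # rows up to the next offset repeat the previous row
            continue
        off = int(parts[0], 16)    # offsets are hexadecimal, e.g. 0010000 == 0x10000
        if repeat and prev_off is not None:
            for filled in range(prev_off + 16, off, 16):
                rows[filled] = prev_words
        repeat = False
        if len(parts) > 1:         # a bare trailing offset only marks the end
            rows[off] = tuple(parts[1:])
            prev_off, prev_words = off, rows[off]
    return rows


def diff_dumps(a_text, b_text, start=0x10000):
    """List (offset, row_a, row_b) for rows at or after start that differ."""
    a, b = parse_hexdump(a_text), parse_hexdump(b_text)
    return [
        (off, a.get(off), b.get(off))
        for off in sorted(set(a) | set(b))
        if off >= start and a.get(off) != b.get(off)
    ]


if __name__ == "__main__":
    import sys
    # Assumed usage: python3 gasdiff.py working.txt broken.txt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for off, row_a, row_b in diff_dumps(fa.read(), fb.read()):
            print(f"{off:07x}: {row_a} != {row_b}")
```

The start offset defaults to 0x10000 because that is where Logan said
the NTB registers begin; pass start=0 to compare the whole dump.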


--Marc


>
> Logan


* Re: ntb_netdev Communication Failure Issue
  2022-03-09 14:35       ` Marc Smith
@ 2022-03-09 16:52         ` Logan Gunthorpe
  2022-03-09 18:26           ` Marc Smith
  0 siblings, 1 reply; 9+ messages in thread
From: Logan Gunthorpe @ 2022-03-09 16:52 UTC (permalink / raw)
  To: Marc Smith; +Cc: ntb, Kelvin Cao, kelvincao



On 2022-03-09 07:35, Marc Smith wrote:
> No difference between the two above, and actually, comparing the
> entire 4 MiB of GAS dump taken while working/functional versus broken
> state shows no differences:
> # cmp switchtec0_gas_dump_after_ifup_working.1 switchtec0_gas_dump_broken.1

Well I'm stumped. It looks like the receive side is getting the
doorbell, but when it reads the memory window the done flag is not set,
so it doesn't process any packets. The transmit side then fills up the
memory window and breaks.

So it sounds like the read side is not reading the correct data from the
memory window. Like it's cached or something (which it shouldn't be). But
I have no idea what could cause that.
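
To illustrate that failure mode, here is a toy model in Python. It is
not the ntb_transport code, only a sketch of the behaviour described
above: the sender marks slots done, the receiver drains a slot only if
it can see the done flag, and the sender counts a ring-full event when
it catches up with the receiver. All names (ToyRing, receiver_sees_writes
and the counters) are hypothetical.

```python
class ToyRing:
    """Single-producer ring where the consumer may not see the producer's writes."""

    def __init__(self, entries, receiver_sees_writes=True):
        self.entries = entries
        self.done = [False] * entries      # done flag per slot, as written by the sender
        self.head = 0                      # next slot the sender writes
        self.tail = 0                      # next slot the receiver reads
        self.receiver_sees_writes = receiver_sees_writes
        self.tx_pkts = 0
        self.tx_ring_full = 0
        self.rx_pkts = 0
        self.rx_not_done = 0

    def send(self):
        nxt = (self.head + 1) % self.entries
        if nxt == self.tail:               # one slot stays empty to tell full from empty
            self.tx_ring_full += 1
            return False
        self.done[self.head] = True
        self.head = nxt
        self.tx_pkts += 1
        return True

    def doorbell(self):
        """Receiver side: drain every slot whose done flag is visible."""
        while self.done[self.tail] and self.receiver_sees_writes:
            self.done[self.tail] = False
            self.tail = (self.tail + 1) % self.entries
            self.rx_pkts += 1
        self.rx_not_done += 1              # stopped on a slot without a visible done flag


if __name__ == "__main__":
    for visible in (True, False):
        ring = ToyRing(entries=32, receiver_sees_writes=visible)
        for _ in range(100):
            ring.send()
            ring.doorbell()
        print(visible, ring.tx_pkts, ring.tx_ring_full, ring.rx_pkts)
```

When the receiver cannot see the writes, the model delivers nothing and
every send after the ring fills is counted as ring-full, which is the
same shape as the rising tx_ring_full and drop counters in the reports.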

Logan


* Re: ntb_netdev Communication Failure Issue
  2022-03-09 16:52         ` Logan Gunthorpe
@ 2022-03-09 18:26           ` Marc Smith
  2022-03-09 18:31             ` Logan Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Smith @ 2022-03-09 18:26 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: ntb, Kelvin Cao, kelvincao

On Wed, Mar 9, 2022 at 11:52 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2022-03-09 07:35, Marc Smith wrote:
> > No difference between the two above, and actually, comparing the
> > entire 4 MiB of GAS dump taken while working/functional versus broken
> > state shows no differences:
> > # cmp switchtec0_gas_dump_after_ifup_working.1 switchtec0_gas_dump_broken.1
>
> Well I'm stumped. It looks like the receive side is getting the
> doorbell, but when it reads the memory window the done flag is not set,
> so it doesn't process any packets. The transmit side then fills up the
> memory window and breaks.
>
> So it sounds like the read side is not reading the correct data from the
> memory window. Like it's cache or something (which it shouldn't be). But
> I have no idea what could cause that.

Anything possibly to blame in the PCIe switch firmware that could
cause this? We're working with the vendor to ensure we're on the latest
firmware, but that's still TBD at the moment.

--Marc


>
> Logan


* Re: ntb_netdev Communication Failure Issue
  2022-03-09 18:26           ` Marc Smith
@ 2022-03-09 18:31             ` Logan Gunthorpe
  0 siblings, 0 replies; 9+ messages in thread
From: Logan Gunthorpe @ 2022-03-09 18:31 UTC (permalink / raw)
  To: Marc Smith; +Cc: ntb, Kelvin Cao, kelvincao



On 2022-03-09 11:26, Marc Smith wrote:
> On Wed, Mar 9, 2022 at 11:52 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>>
>>
>>
>> On 2022-03-09 07:35, Marc Smith wrote:
>>> No difference between the two above, and actually, comparing the
>>> entire 4 MiB of GAS dump taken while working/functional versus broken
>>> state shows no differences:
>>> # cmp switchtec0_gas_dump_after_ifup_working.1 switchtec0_gas_dump_broken.1
>>
>> Well I'm stumped. It looks like the receive side is getting the
>> doorbell, but when it reads the memory window the done flag is not set,
>> so it doesn't process any packets. The transmit side then fills up the
>> memory window and breaks.
>>
>> So it sounds like the read side is not reading the correct data from the
>> memory window. Like it's cache or something (which it shouldn't be). But
>> I have no idea what could cause that.
> 
> Anything possibly to blame in the PCIe switch firmware that could
> cause this? Working with the vendor to ensure we're at the least, but
> still TBD at the moment.

I can't say anything for certain. It's certainly worth getting the latest
firmware. But as far as I understand, the memory windows (if that is
indeed the problem) are mostly hardware related, so if they are set up
correctly by the firmware they shouldn't stop working if the firmware is
right.

So I'm skeptical a firmware fix would solve the issue, but I could
definitely be wrong about that.

Logan

