[Bug 1086] Significant TX packet drops with Mellanox NIC (mlx5 PMD)
From: bugzilla @ 2022-09-28 13:41 UTC
  To: dev

https://bugs.dpdk.org/show_bug.cgi?id=1086

            Bug ID: 1086
           Summary: Significant TX packet drops with Mellanox NIC (mlx5
                    PMD)
           Product: DPDK
           Version: 21.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: critical
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: anton@vaa.su
  Target Milestone: ---

Created attachment 222
  --> https://bugs.dpdk.org/attachment.cgi?id=222&action=edit
testpmd-fec28ca0e3.log.txt

Given 2 servers with 25G Mellanox 2-port NICs:

# dpdk-devbind.py -s
Network devices using kernel driver
===================================
0000:3b:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f0np0 drv=mlx5_core
unused=vfio-pci 
0000:3b:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f1np1 drv=mlx5_core
unused=vfio-pci

Servers are connected directly.


The first server is used as a packet generator, running TRex v2.99 in stateless
mode:
./t-rex-64 -c 16 -i
./trex-console
trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps


The second one runs dpdk-testpmd:
OS: Debian GNU/Linux 10 (buster)
uname -r: 4.19.0-21-amd64
ofed_info: MLNX_OFED_LINUX-5.7-1.0.2.0
gcc version 8.3.0 (Debian 8.3.0-6)

With DPDK v21.08 compiled and testpmd started this way:

dpdk-testpmd -l 1-17 -n 4 --log-level=debug -- --nb-ports=2 --nb-cores=16
--portmask=0x3 --rxq=8 --txq=8

it handles roughly 17 Mpps per port:

trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps

TRex Port Statistics
   port    |         0         |         1         |       total       
-----------+-------------------+-------------------+------------------
owner      |              root |              root |                   
link       |                UP |                UP |                   
state      |      TRANSMITTING |      TRANSMITTING |                   
speed      |           25 Gb/s |           25 Gb/s |                   
CPU util.  |            27.76% |            27.76% |                   
--         |                   |                   |                   
Tx bps L2  |          8.7 Gbps |         8.73 Gbps |        17.43 Gbps 
Tx bps L1  |        11.42 Gbps |        11.46 Gbps |        22.88 Gbps 
Tx pps     |           17 Mpps |        17.05 Mpps |        34.05 Mpps 
Line Util. |            45.7 % |           45.83 % |                   
---        |                   |                   |                   
Rx bps     |          8.7 Gbps |         8.73 Gbps |        17.43 Gbps 
Rx pps     |           17 Mpps |        17.05 Mpps |        34.05 Mpps 
----       |                   |                   |                   
opackets   |         290928398 |         291050836 |         581979234 
ipackets   |         290885740 |         291093159 |         581978899 
obytes     |       18619417472 |       18627254464 |       37246671936 
ibytes     |       18616688080 |       18629962836 |       37246650916 
tx-pkts    |      290.93 Mpkts |      291.05 Mpkts |      581.98 Mpkts 
rx-pkts    |      290.89 Mpkts |      291.09 Mpkts |      581.98 Mpkts 
tx-bytes   |          18.62 GB |          18.63 GB |          37.25 GB 
rx-bytes   |          18.62 GB |          18.63 GB |          37.25 GB 
-----      |                   |                   |                   
oerrors    |                 0 |                 0 |                 0 
ierrors    |                 0 |                 0 |                 0


But after switching to DPDK v21.11, throughput becomes much worse:

TRex Port Statistics
   port    |         0         |         1         |       total       
-----------+-------------------+-------------------+------------------
owner      |              root |              root |                   
link       |                UP |                UP |                   
state      |      TRANSMITTING |      TRANSMITTING |                   
speed      |           25 Gb/s |           25 Gb/s |                   
CPU util.  |            26.06% |            26.06% |                   
--         |                   |                   |                   
Tx bps L2  |          8.7 Gbps |         8.72 Gbps |        17.42 Gbps 
Tx bps L1  |        11.42 Gbps |        11.45 Gbps |        22.86 Gbps 
Tx pps     |        16.99 Mpps |        17.04 Mpps |        34.02 Mpps 
Line Util. |           45.66 % |           45.79 % |                   
---        |                   |                   |                   
Rx bps     |         3.75 Gbps |         3.76 Gbps |          7.5 Gbps 
Rx pps     |         7.32 Mpps |         7.34 Mpps |        14.66 Mpps 
----       |                   |                   |                   
opackets   |         190538147 |         190707494 |         381245641 
ipackets   |          82174700 |          82260152 |         164434852 
obytes     |       12194441408 |       12205280936 |       24399722344 
ibytes     |        5259181520 |        5264649728 |       10523831248 
tx-pkts    |      190.54 Mpkts |      190.71 Mpkts |      381.25 Mpkts 
rx-pkts    |       82.17 Mpkts |       82.26 Mpkts |      164.43 Mpkts 
tx-bytes   |          12.19 GB |          12.21 GB |           24.4 GB 
rx-bytes   |           5.26 GB |           5.26 GB |          10.52 GB 
-----      |                   |                   |                   
oerrors    |                 0 |                 0 |                 0 
ierrors    |                 0 |                 0 |                 0

It handles only ~7.3 Mpps per port instead of ~17 Mpps. testpmd reports a
large number of TX drops:
  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1101378001     RX-dropped: 0             RX-total: 1101378001
  TX-packets: 1016776861     TX-dropped: 84576754      TX-total: 1101353615
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 1101353615     RX-dropped: 0             RX-total: 1101353615
  TX-packets: 1016804108     TX-dropped: 84573893      TX-total: 1101378001
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 2202731616     RX-dropped: 0             RX-total: 2202731616
  TX-packets: 2033580969     TX-dropped: 169150647     TX-total: 2202731616
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
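
As a rough cross-check, the loss can be computed directly from the counters
quoted above. The Python sketch below is illustrative only: the function names
are hypothetical, and it assumes the TRex and testpmd counters may cover
different time windows, so the two percentages are not expected to match.

```python
def loss_percent(sent: int, received: int) -> float:
    """Percentage of sent packets that were not received back."""
    return 100.0 * (sent - received) / sent


def drop_percent(dropped: int, total: int) -> float:
    """Percentage of TX attempts that were counted as dropped."""
    return 100.0 * dropped / total


# TRex counters from the v21.11 run: port -> (opackets, ipackets).
trex = {0: (190538147, 82174700), 1: (190707494, 82260152)}

# testpmd forward statistics: port -> (TX-dropped, TX-total).
testpmd = {0: (84576754, 1101353615), 1: (84573893, 1101378001)}

for port, (sent, received) in trex.items():
    print(f"TRex port {port}: {loss_percent(sent, received):.1f}% not received")

for port, (dropped, total) in testpmd.items():
    print(f"testpmd port {port}: {drop_percent(dropped, total):.1f}% TX-dropped")
```
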


Using git bisect, I found the commit (between 21.08 and 21.11) that introduced
this regression:
https://github.com/DPDK/dpdk/commit/fec28ca0e3a93143829f3b41a28a8da933f28499

I also profiled it with Intel VTune 2021.3.0 (-collect hotspots and
-collect memory-access), comparing two revisions:
1. 690b2a88c2 (GOOD)
2. fec28ca0e3 (BAD)
I can try to share the corresponding profiling results if that helps.
Unfortunately, I cannot attach them here (the VTune data is too large).

