* Mellanox ConnectX-5 crashes and mbuf leak
@ 2017-09-26  9:23 Martin Weiser
  2017-09-26 10:32 ` Olga Shern
  2017-10-05 21:46 ` Yongseok Koh
  0 siblings, 2 replies; 8+ messages in thread
From: Martin Weiser @ 2017-09-26  9:23 UTC (permalink / raw)
  To: Adrien Mazarguil, Nelio Laranjeiro; +Cc: dev

[-- Attachment #1: Type: text/plain, Size: 5100 bytes --]

Hi,

we are currently testing the Mellanox ConnectX-5 100G NIC with DPDK 17.08
as well as dpdk-next-net and are experiencing mbuf leaks as well as crashes
(and in some instances even kernel panics in a mlx5 module) under certain
load conditions.

We initially saw these issues only in our own DPDK-based application and
it took some effort to reproduce this
in one of the DPDK example applications. However with the attached patch
to the load-balancer example we can
reproduce the issues reliably.

The patch may look weird at first but I will explain why I made these
changes:

* the sleep introduced in the worker threads simulates heavy processing,
  which causes the software rx rings to fill up under load. If the rings
  are large enough (I increased the ring size with the load-balancer
  command line option, as you can see in the example call further down),
  the mbuf pool may run empty, and I believe this leads to a malfunction
  in the mlx5 driver. As soon as this happens the NIC stops forwarding
  traffic, probably because the driver cannot allocate mbufs for the
  packets received by the NIC. Unfortunately, most of the mbufs then never
  return to the mbuf pool, so even after the traffic stops the pool
  remains almost empty and the application will not forward traffic even
  at a very low rate.

* using the mbuf reference count on top of the situation described above
  is what makes the mlx5 DPDK driver crash almost immediately under load.
  In our application we rely on this feature to forward the packet quickly
  while still sending it to a worker thread for analysis, and to finally
  free the packet when the analysis is done. Here I simulated this by
  increasing the mbuf reference count immediately after receiving the mbuf
  from the driver and then calling rte_pktmbuf_free in the worker thread,
  which should only decrement the reference count again and not actually
  free the mbuf (see the sketch below).
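
For reference, here is a minimal sketch of the refcount pattern described
above (the function names and the ring are illustrative only, not the
actual load-balancer code; rte_pktmbuf_free() only returns an mbuf to its
pool once its reference count drops to zero):

    #include <rte_mbuf.h>
    #include <rte_ring.h>

    /* I/O rx lcore: take an extra reference before handing the mbuf to a
     * worker, then forward the packet on the fast path. The tx path later
     * calls rte_pktmbuf_free() and drops one of the two references. */
    static void
    io_rx_handle(struct rte_mbuf *mbuf, struct rte_ring *to_worker)
    {
            rte_pktmbuf_refcnt_update(mbuf, 1);   /* refcnt 1 -> 2 */
            rte_ring_enqueue(to_worker, mbuf);    /* worker analyzes it */
    }

    /* Worker lcore: analysis done, release the extra reference. This only
     * decrements the refcnt (2 -> 1); the mbuf is actually freed by
     * whoever drops the last reference. */
    static void
    worker_done(struct rte_mbuf *mbuf)
    {
            rte_pktmbuf_free(mbuf);
    }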

We executed the patched load-balancer application with the following
command line:

    ./build/load_balancer -l 3-7 -n 4 -- --rx "(0,0,3),(1,0,3)" \
        --tx "(0,3),(1,3)" --w "4" --lpm "16.0.0.0/8=>0; 48.0.0.0/8=>1;" \
        --pos-lb 29 --rsz "1024, 32768, 1024, 1024"

Then we generated traffic using the t-rex traffic generator and the sfr
test case. On our machine the issues start
to happen when the traffic exceeds ~6 Gbps but this may vary depending
on how powerful the test machine is (by
the way we were able to reproduce this on different types of hardware).
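
As an aside, the leak is easy to observe by periodically logging the fill
level of the mbuf pool; a minimal sketch (assuming a pointer to the
application's mbuf pool is at hand, here simply called "pool"):

    #include <stdio.h>
    #include <rte_mempool.h>

    /* Log how many mbufs are left in the pool. Once the leak has occurred,
     * the available count stays close to zero even after traffic stops. */
    static void
    log_pool_level(const struct rte_mempool *pool)
    {
            printf("mbufs available: %u, in use: %u\n",
                   rte_mempool_avail_count(pool),
                   rte_mempool_in_use_count(pool));
    }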

A typical stacktrace looks like this:

    Thread 1 "load_balancer" received signal SIGSEGV, Segmentation fault.
    0x0000000000614475 in _mm_storeu_si128 (__B=..., __P=<optimized
out>) at /usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h:716
    716      __builtin_ia32_storedqu ((char *)__P, (__v16qi)__B);
    (gdb) bt
    #0  0x0000000000614475 in _mm_storeu_si128 (__B=..., __P=<optimized
out>) at /usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h:716
    #1  rxq_cq_decompress_v (elts=0x7fff3732bef0, cq=0x7ffff7f99380,
rxq=0x7fff3732a980) at
/root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:679
    #2  rxq_burst_v (pkts_n=<optimized out>, pkts=0xa7c7b0 <app+432944>,
rxq=0x7fff3732a980) at
/root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:1242
    #3  mlx5_rx_burst_vec (dpdk_rxq=0x7fff3732a980, pkts=<optimized
out>, pkts_n=<optimized out>) at
/root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:1277
    #4  0x000000000043c11d in rte_eth_rx_burst (nb_pkts=3599,
rx_pkts=0xa7c7b0 <app+432944>, queue_id=0, port_id=0 '\000')
    at
/root/dpdk-next-net//x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2781
    #5  app_lcore_io_rx (lp=lp@entry=0xa7c700 <app+432768>,
n_workers=n_workers@entry=1, bsz_rd=bsz_rd@entry=144,
bsz_wr=bsz_wr@entry=144, pos_lb=pos_lb@entry=29 '\035')
    at /root/dpdk-next-net/examples/load_balancer/runtime.c:198
    #6  0x0000000000447dc0 in app_lcore_main_loop_io () at
/root/dpdk-next-net/examples/load_balancer/runtime.c:485
    #7  app_lcore_main_loop (arg=<optimized out>) at
/root/dpdk-next-net/examples/load_balancer/runtime.c:669
    #8  0x0000000000495e8b in rte_eal_mp_remote_launch ()
    #9  0x0000000000441e0d in main (argc=<optimized out>,
argv=<optimized out>) at
/root/dpdk-next-net/examples/load_balancer/main.c:99

The crash does not always happen at exactly the same spot, but in our tests
it was always in the same function.
In a few instances, instead of an application crash, the system froze
completely with what appeared to be a kernel panic. The last output looked
like a crash in the interrupt handler of an mlx5 module, but unfortunately
I cannot provide the exact output right now.

All tests were performed under Ubuntu 16.04 server running a
4.4.0-96-generic kernel, and the latest Mellanox OFED
(MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64) was used.

Any help with this issue is greatly appreciated.

Best regards,
Martin


[-- Attachment #2: test.patch --]
[-- Type: text/plain, Size: 1443 bytes --]

diff --git a/config/common_base b/config/common_base
index 439f3cc..12b71e9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -220,7 +220,7 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
 #
-CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_PMD=y
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index e54b785..d448100 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -41,6 +41,7 @@
 #include <stdarg.h>
 #include <errno.h>
 #include <getopt.h>
+#include <unistd.h>

 #include <rte_common.h>
 #include <rte_byteorder.h>
@@ -133,6 +134,8 @@ app_lcore_io_rx_buffer_to_send (
 	uint32_t pos;
 	int ret;

+	rte_pktmbuf_refcnt_update(mbuf, 1);
+
 	pos = lp->rx.mbuf_out[worker].n_mbufs;
 	lp->rx.mbuf_out[worker].array[pos ++] = mbuf;
 	if (likely(pos < bsz)) {
@@ -521,6 +524,8 @@ app_lcore_worker(
 		continue;
 #endif

+		usleep(20);
+
 		APP_WORKER_PREFETCH1(rte_pktmbuf_mtod(lp->mbuf_in.array[0], unsigned char *));
 		APP_WORKER_PREFETCH0(lp->mbuf_in.array[1]);

@@ -530,6 +535,8 @@ app_lcore_worker(
 			uint32_t ipv4_dst, pos;
 			uint32_t port;

+			rte_pktmbuf_free(lp->mbuf_in.array[j]);
+
 			if (likely(j < bsz_rd - 1)) {
 				APP_WORKER_PREFETCH1(rte_pktmbuf_mtod(lp->mbuf_in.array[j+1], unsigned char *));
 			}


* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-09-26  9:23 Mellanox ConnectX-5 crashes and mbuf leak Martin Weiser
@ 2017-09-26 10:32 ` Olga Shern
  2017-10-05 21:46 ` Yongseok Koh
  1 sibling, 0 replies; 8+ messages in thread
From: Olga Shern @ 2017-09-26 10:32 UTC (permalink / raw)
  To: Martin Weiser, Adrien Mazarguil, Nélio Laranjeiro
  Cc: dev, Raslan Darawsheh

Hi Martin, 

We will look into this issue and will try to reproduce.
Will update you as soon as we have any news.

Can you please send the kernel crash stack that you are seeing?

Best Regards,
Olga

________________________________________________________________
Olga Shern 
SW Director DPDK 
Mellanox Technologies, Raanana Israel




* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-09-26  9:23 Mellanox ConnectX-5 crashes and mbuf leak Martin Weiser
  2017-09-26 10:32 ` Olga Shern
@ 2017-10-05 21:46 ` Yongseok Koh
  2017-10-06 14:10   ` Martin Weiser
  1 sibling, 1 reply; 8+ messages in thread
From: Yongseok Koh @ 2017-10-05 21:46 UTC (permalink / raw)
  To: Martin Weiser; +Cc: Adrien Mazarguil, Nélio Laranjeiro, dev

Hi, Martin

Thanks for your thorough and valuable reporting. We could reproduce it. I found
a bug and fixed it. Please refer to the patch [1] I sent to the mailing list.
This might not be automatically applicable to v17.08 as I rebased it on top of
Nelio's flow cleanup patch. But as this is a simple patch, you can easily apply
it manually.

Thanks,
Yongseok

[1] http://dpdk.org/dev/patchwork/patch/29781


* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-10-05 21:46 ` Yongseok Koh
@ 2017-10-06 14:10   ` Martin Weiser
  2017-10-06 22:31     ` Yongseok Koh
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Weiser @ 2017-10-06 14:10 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Adrien Mazarguil, Nélio Laranjeiro, dev

Hi Yongseok,

unfortunately, in a quick test using testpmd and ~20 Gb/s of traffic with
your patch, traffic forwarding always stops completely after a few seconds.

I wanted to test this with the current master of dpdk-next-net, but after
"net/mlx5: support upstream rdma-core" it no longer compiles against
MLNX_OFED_LINUX-4.1-1.0.2.0.
So I used the last commit before that (v17.08-306-gf214841) and applied
your patch, leading to the result described above.
Apart from your patch no other modifications were made, and without the
patch testpmd forwards the traffic without a problem (in this
configuration mbufs should never run out, so this test was never affected
by the original issue).

For this test I simply used testpmd with the following command line:
"testpmd -c 0xfe -- -i" and issued the "start" command. As traffic
generator I used t-rex with the sfr traffic profile.

Best regards,
Martin




* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-10-06 14:10   ` Martin Weiser
@ 2017-10-06 22:31     ` Yongseok Koh
  2017-10-07 22:19       ` Yongseok Koh
  0 siblings, 1 reply; 8+ messages in thread
From: Yongseok Koh @ 2017-10-06 22:31 UTC (permalink / raw)
  To: Martin Weiser; +Cc: Adrien Mazarguil, Nélio Laranjeiro, dev, Ferruh Yigit

Hi, Martin

Even though I had done quite serious tests before sending out the patch,
I figured out that a deadlock could happen if the Rx queue size is smaller;
it is 128 by default in testpmd, while I usually use 256.

I've fixed the bug and submitted a new patch [1], which actually reverts the
previous patch. So, you can apply the attached patch and disregard the old one.

I have also done extensive tests on this new patch, but please let me know
your test results.

[1]
"net/mlx5: fix deadlock due to buffered slots in Rx SW ring"
at http://dpdk.org/dev/patchwork/patch/29847


Thanks,
Yongseok


diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.c b/drivers/net/mlx5/mlx5_rxtx_vec_sse.c
index aff3359..9d37954 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.c
@@ -549,7 +549,7 @@ rxq_replenish_bulk_mbuf(struct rxq *rxq, uint16_t n)
 {
        const uint16_t q_n = 1 << rxq->elts_n;
        const uint16_t q_mask = q_n - 1;
-       const uint16_t elts_idx = rxq->rq_ci & q_mask;
+       uint16_t elts_idx = rxq->rq_ci & q_mask;
        struct rte_mbuf **elts = &(*rxq->elts)[elts_idx];
        volatile struct mlx5_wqe_data_seg *wq = &(*rxq->wqes)[elts_idx];
        unsigned int i;
@@ -567,6 +567,11 @@ rxq_replenish_bulk_mbuf(struct rxq *rxq, uint16_t n)
                wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr +
                                              RTE_PKTMBUF_HEADROOM);
        rxq->rq_ci += n;
+       /* Prevent overflowing into consumed mbufs. */
+       elts_idx = rxq->rq_ci & q_mask;
+       for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; i += 2)
+               _mm_storeu_si128((__m128i *)&(*rxq->elts)[elts_idx + i],
+                                _mm_set1_epi64x((uintptr_t)&rxq->fake_mbuf));
        rte_wmb();
        *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci);
 }




* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-10-06 22:31     ` Yongseok Koh
@ 2017-10-07 22:19       ` Yongseok Koh
  2017-10-10  8:10         ` Martin Weiser
  0 siblings, 1 reply; 8+ messages in thread
From: Yongseok Koh @ 2017-10-07 22:19 UTC (permalink / raw)
  To: Martin Weiser; +Cc: Adrien Mazarguil, Nélio Laranjeiro, dev, Ferruh Yigit

> On Oct 6, 2017, at 3:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> Hi, Martin
> 
> Even though I had done quite serious tests before sending out the patch,
> I figured out deadlock could happen if the Rx queue size is smaller. It is 128
> by default in testpmd while I usually use 256.
> 
> I've fixed the bug and submitted a new patch [1], which actually reverts the
> previous patch. So, you can apply the attached with disregarding the old one.
> 
> And I have also done extensive tests for this new patch but please let me know
> your test results.
> 
> [1]
> "net/mlx5: fix deadlock due to buffered slots in Rx SW ring"
> at http://dpdk.org/dev/patchwork/patch/29847

Hi Martin

I've submitted v2 of the patch [1]. I just replaced vector insns with regular
statements.  This is just for ease of maintenance because I'm about to add
vectorized PMD for ARM NEON.  In terms of functionality and performance it is
identical.

Please proceed your testing with this and let me know the result.

[1] [dpdk-dev,v2] net/mlx5: fix deadlock due to buffered slots in Rx SW ring,
    which is at http://dpdk.org/dev/patchwork/patch/29879/
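
(For illustration only: the scalar equivalent of the _mm_storeu_si128() fill
in the previous patch would look roughly like the loop below; this is a
sketch, not necessarily the exact code of the v2 patch.)

    /* Sketch: fill the next MLX5_VPMD_DESCS_PER_LOOP SW ring slots with
     * the fake mbuf using plain stores instead of vector intrinsics, so
     * the vectorized Rx loop never dereferences a consumed slot. */
    for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i)
            (*rxq->elts)[elts_idx + i] = &rxq->fake_mbuf;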

Thanks,
Yongseok


* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-10-07 22:19       ` Yongseok Koh
@ 2017-10-10  8:10         ` Martin Weiser
  2017-10-10 14:28           ` Yongseok Koh
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Weiser @ 2017-10-10  8:10 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Adrien Mazarguil, Nélio Laranjeiro, dev, Ferruh Yigit

Hi Yongseok,

I can confirm that this patch fixes the crashes and freezing in my tests
so far.

We still see an issue where, once mbufs run low and reference counts are
used together with freeing of mbufs in the processing lcores, we suddenly
lose a large number of mbufs that never return to the pool. However, I can
also reproduce this with ixgbe, so it is not specific to the mlx5 driver
but rather an issue with the current dpdk-next-net state. I will write up
a separate mail with details on how to reproduce it.

Thank you for your support!

Best regards,
Martin



* Re: Mellanox ConnectX-5 crashes and mbuf leak
  2017-10-10  8:10         ` Martin Weiser
@ 2017-10-10 14:28           ` Yongseok Koh
  0 siblings, 0 replies; 8+ messages in thread
From: Yongseok Koh @ 2017-10-10 14:28 UTC (permalink / raw)
  To: Martin Weiser; +Cc: Adrien Mazarguil, Nélio Laranjeiro, dev, Ferruh Yigit

Glad to hear that helped!
Then the patch will get merged soon.

Thanks,
Yongseok


end of thread, other threads:[~2017-10-10 14:28 UTC | newest]

Thread overview: 8+ messages
2017-09-26  9:23 Mellanox ConnectX-5 crashes and mbuf leak Martin Weiser
2017-09-26 10:32 ` Olga Shern
2017-10-05 21:46 ` Yongseok Koh
2017-10-06 14:10   ` Martin Weiser
2017-10-06 22:31     ` Yongseok Koh
2017-10-07 22:19       ` Yongseok Koh
2017-10-10  8:10         ` Martin Weiser
2017-10-10 14:28           ` Yongseok Koh
