* Problems with multicast delivery to application layer after kernel 4.4.0-128
@ 2020-11-12 11:52 Vladimir Rybnikov
From: Vladimir Rybnikov @ 2020-11-12 11:52 UTC (permalink / raw)
  To: netdev; +Cc: Vladimir Rybnikov

Dear Linux kernel networking experts,


I work at DESY (Hamburg, Germany) and am responsible for data acquisition (DAQ) from different accelerators and experiments.


Every DAQ collects data over the network; UDP multicast is used to transfer the data. Every data source has a multicast sender (~200 instances). A Dell PowerEdge R730xd server (the DAQ server: 256 GB RAM, 40 cores) receives the data. The DAQ server has several 10Gb network adapters in different sub-nets to receive multicast from the senders sitting in the corresponding sub-nets.

Every sender pushes data via a UDP socket bound to a multicast address. The data is sent every 100 ms (10 Hz).

The size of the data can vary from a few bytes up to several MB.
The data is split into 32KB messages sent via the UDP socket.
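
Roughly, a sender does something like the following minimal sketch (group address, port and payload here are placeholders, not the real values):

/* sender sketch: one 32KB UDP datagram to a multicast group every 100 ms */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_SIZE (32 * 1024)            /* data is split into 32KB messages */

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    unsigned char ttl = 1;              /* keep multicast within the sub-net */
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    struct sockaddr_in group = {0};
    group.sin_family = AF_INET;
    group.sin_port = htons(5000);                        /* placeholder port  */
    inet_pton(AF_INET, "239.1.1.1", &group.sin_addr);    /* placeholder group */

    char buf[MSG_SIZE];
    memset(buf, 0, sizeof(buf));

    for (;;) {
        /* a real sender fills buf with detector data before sending */
        sendto(fd, buf, sizeof(buf), 0,
               (struct sockaddr *)&group, sizeof(group));
        usleep(100 * 1000);             /* 10 Hz event rate */
    }
    close(fd);
    return 0;
}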

A multi-threaded fast collector runs on the DAQ server to receive the data.
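
Each receiver thread of the collector essentially does the following (again a sketch with placeholder addresses; the real collector handles ~200 senders over several interfaces):

/* receiver sketch: join a multicast group and read 32KB datagrams */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_SIZE (32 * 1024)

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    /* ask for a large per-socket receive buffer (capped by net.core.rmem_max) */
    int rcvbuf = 64 * 1024 * 1024;
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    int reuse = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5000);                         /* placeholder port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* join the group on the interface facing the senders */
    struct ip_mreq mreq = {0};
    inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr);    /* placeholder group    */
    inet_pton(AF_INET, "192.168.1.10", &mreq.imr_interface); /* placeholder local IP */
    setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

    char buf[MSG_SIZE];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n < 0)
            break;
        /* the real collector hands buf off to a processing thread here */
    }
    close(fd);
    return 0;
}

The SO_RCVBUF request is limited by net.core.rmem_max, which is why rmem_max is raised in the sysctl list further below.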


We had found kernel parameter values that minimize packet losses on all layers of the network stack, and used them successfully up to kernel 4.4.0-128.


For about a year now, after trying to switch to other kernels (4.4.0-xxx [sorry, I cannot say what xxx was, but later than 128], 4.6, ...), we have had a problem that looks like losses at the application layer.


In my current test two network interfaces are used; the multicast input rate is ~140 MB/s on each interface.



I'm currently testing kernel 5.6.0-1032-oem.
Previously kernel 5.4.0-52-generic was tested, with the same results.

The signature is the following:
1) no Rx losses on the adapters
2) no InErrors, RcvbufErrors or InCsumErrors counted in /proc/net/snmp
3) no counts in any column but column 0 in /proc/net/softnet_stat
4) dropwatch shows "xxx drops in at ip_defrag+171 ..."

The losses show up in bursts from time to time; one way to cross-check item 4 against the IP reassembly counters is sketched below.
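
Since a 32KB UDP datagram is carried as roughly two dozen IP fragments on a standard 1500-byte MTU, a single lost fragment costs the whole datagram in ip_defrag. A small helper like this sketch can watch the Reasm* counters (ReasmReqds/ReasmOKs/ReasmFails) in /proc/net/snmp, which should grow together with the ip_defrag drops reported by dropwatch:

/* sketch: print the IP reassembly counters from /proc/net/snmp */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/snmp", "r");
    if (!f)
        return 1;

    char hdr[1024], val[1024];
    /* /proc/net/snmp has pairs of lines: "Ip: <names>" then "Ip: <values>" */
    while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
        if (strncmp(hdr, "Ip:", 3) != 0)
            continue;
        char *hsave, *vsave;
        char *hname = strtok_r(hdr, " \n", &hsave);
        char *vname = strtok_r(val, " \n", &vsave);
        while ((hname = strtok_r(NULL, " \n", &hsave)) &&
               (vname = strtok_r(NULL, " \n", &vsave))) {
            if (strncmp(hname, "Reasm", 5) == 0)
                printf("%s = %s\n", hname, vname);
        }
        break;
    }
    fclose(f);
    return 0;
}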


Putting it in one line: multicast packets are seen by the network adapters, but the application layer from time to time does not get them simultaneously from all senders.

Here are the currently used values of the sysctl parameters I am aware of that could
influence the level of losses:

net.core.optmem_max = 40960
net.core.rmem_default = 16777216
net.core.rmem_max = 67108864
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.igmp_max_memberships = 512
net.ipv4.udp_mem = 262144    327680    393216
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 4096

net.core.netdev_budget = 100000
net.core.netdev_max_backlog = 100000
net.ipv4.ipfrag_high_thresh = 33554432
net.ipv4.ipfrag_low_thresh = 16777216

All other parameters are left unchanged, as they come with the kernel distribution.

We plan to switch to Ubuntu 20.04 next year, and therefore kernel 5.4(6) is going to be used.


I hope that this problem is solvable at the kernel level.


Many thanks in advance and best regards,

Vladimir

-- 
/*********************************************************************\
* Dr. Vladimir Rybnikov      Phone : [49] (40) 8998 4846              *
* FLA/MCS4                   Fax   : [49] (40) 8998 4448              *
* Geb. 55a/35                e-mail: Vladimir.Rybnikov@desy.de        *
*                            WWW   : http://www.desy.de/~rybnikov/    *
*                                                                     *
* Notkestr.85, DESY                                                   *
* D-22607 Hamburg, Germany                                            *
\*********************************************************************/

