* UDP rx packet loss in a cgroup with a memory limit
@ 2022-08-16 18:52 Gražvydas Ignotas
From: Gražvydas Ignotas @ 2022-08-16 18:52 UTC (permalink / raw)
  To: cgroups

Hello,

I'm not sure if it's supposed to be like this, but I'm seeing it on
various hardware combinations/VMs and Debian kernel versions, as well
as on a self-compiled vanilla 5.19.1 that I just tried. It looks like
this only happens on cgroup v2:

Debian11/bullseye (cgroup v2), distro kernel: yes
Debian11/bullseye (cgroup v2), vanilla 5.19.1: yes
Debian10/buster (cgroup v1), bpo kernel: no
Debian10/buster (cgroup v2)*, bpo kernel: yes
* - booted with 'systemd.unified_cgroup_hierarchy=1' to enable cgroup v2

Basically, when there is git activity in the container with a memory
limit, other processes in the same container start to suffer (very)
occasional network issues (mostly DNS lookup failures). Neither git's
nor the other processes' memory usage seems to be anywhere near the
limit. The packet drops show up as the "Udp InErrors" counter
increasing in /proc/net/snmp, as well as the "drops" counter
increasing in /proc/net/udp. Some other random details about this:
- stopping git (its disk activity?) makes the packet loss stop
- tcpdump (run in the container itself) shows the packet arriving
correctly, without errors, but the process times out waiting for a
response
- if memory limit is removed the problem disappears
- if memory limit is set to host's RAM size, the problem disappears
- reducing dirty_ratio, dirty_background_ratio doesn't help

My recipe to reproduce:
- install kubernetes on a host machine with Debian11 and 32GB RAM
- create a debian9 container with 'resources: limits: memory: "8G"'
- in the container:

# run this:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux
while git checkout linux-2.6.32.y && git checkout linux-5.19.y; do true; done
# at the same time in the same container:
while sleep .1; do host <remotehost>. > /dev/null; awk '/^Udp: [0-9]/{print $4}' /proc/net/snmp; done
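For reference, the InErrors counter can also be pulled out with a small parser instead of awk. This is only an illustrative helper (the function name and the sample data are mine, not part of the recipe); it follows the /proc/net/snmp layout of a header line followed by a value line per protocol:

```python
def snmp_field(snmp_text, proto, field):
    """Return a named counter from /proc/net/snmp-style text.

    Each protocol contributes two lines: a header naming the columns
    and a line with the corresponding values, both prefixed "Proto:".
    """
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith(proto + ":")]
    header, values = rows[0][1:], rows[1][1:]
    return int(values[header.index(field)])

# Hypothetical sample in the /proc/net/snmp format:
sample = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1024 2 17 980 17 0\n"
)
print(snmp_field(sample, "Udp", "InErrors"))  # 17
```

Polling this against the real /proc/net/snmp in a loop is equivalent to the awk one-liner above.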

The packet drop counter should start increasing after some time. The
effect is much stronger if the git repository is bigger and has
different multi-gigabyte files in those branches. Can something be
done to avoid this packet loss?

Gražvydas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: UDP rx packet loss in a cgroup with a memory limit
@ 2022-08-17 16:50 Gražvydas Ignotas
From: Gražvydas Ignotas @ 2022-08-17 16:50 UTC (permalink / raw)
  To: cgroups, Johannes Weiner

On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> Basically, when there is git activity in the container with a memory
> limit, other processes in the same container start to suffer (very)
> occasional network issues (mostly DNS lookup failures).

ok I've traced this and it's failing in try_charge_memcg(), which
doesn't seem to be trying too hard because it's called from irq
context.

Here is the backtrace:
 <IRQ>
 ? fib_validate_source+0xb4/0x100
 ? ip_route_input_slow+0xa11/0xb70
 mem_cgroup_charge_skmem+0x4b/0xf0
 __sk_mem_raise_allocated+0x17f/0x3e0
 __udp_enqueue_schedule_skb+0x220/0x270
 udp_queue_rcv_one_skb+0x330/0x5e0
 udp_unicast_rcv_skb+0x75/0x90
 __udp4_lib_rcv+0x1ba/0xca0
 ? ip_rcv_finish_core.constprop.0+0x63/0x490
 ip_protocol_deliver_rcu+0xd6/0x230
 ip_local_deliver_finish+0x73/0xa0
 __netif_receive_skb_one_core+0x8b/0xa0
 process_backlog+0x8e/0x120
 __napi_poll+0x2c/0x160
 net_rx_action+0x2a2/0x360
 ? rebalance_domains+0xeb/0x3b0
 __do_softirq+0xeb/0x2eb
 __irq_exit_rcu+0xb9/0x110
 sysvec_apic_timer_interrupt+0xa2/0xd0
 </IRQ>

Calling mem_cgroup_print_oom_meminfo() in such a case reveals:

memory: usage 7812476kB, limit 7812500kB, failcnt 775198
swap: usage 0kB, limit 0kB, failcnt 0
Memory cgroup stats for
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
anon 348016640
file 7502163968
kernel 146997248
kernel_stack 327680
pagetables 2224128
percpu 0
sock 4096
vmalloc 0
shmem 0
zswap 0
zswapped 0
file_mapped 112041984
file_dirty 1181028352
file_writeback 2686976
swapcached 0
anon_thp 44040192
file_thp 0
shmem_thp 0
inactive_anon 350756864
active_anon 36864
inactive_file 3614003200
active_file 3888070656
unevictable 0
slab_reclaimable 143692600
slab_unreclaimable 545120
slab 144237720
workingset_refault_anon 0
workingset_refault_file 2318
workingset_activate_anon 0
workingset_activate_file 2318
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 334152
pgmajfault 1238
pgrefill 3400
pgscan 819608
pgsteal 791005
pgactivate 949122
pgdeactivate 1694
pglazyfree 0
pglazyfreed 0
zswpin 0
zswpout 0
thp_fault_alloc 709
thp_collapse_alloc 0

So it basically renders UDP inoperable because of disk cache. I hope
this is not the intended behavior. Naturally booting with
cgroup.memory=nosocket solves this issue.
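To make the traced failure concrete, here is a toy model (my own sketch, not kernel code; the numbers are only shaped like the meminfo dump above): with usage pinned at the limit by reclaimable page cache, a charge attempt that may not block fails outright, while a blocking caller would simply reclaim cache and succeed:

```python
class Memcg:
    """Toy memcg: usage = reclaimable page cache + everything else."""

    def __init__(self, limit, file_cache, other):
        self.limit = limit
        self.file_cache = file_cache  # reclaimable disk cache
        self.other = other            # anon, slab, ... (not reclaimed here)

    @property
    def usage(self):
        return self.file_cache + self.other

    def try_charge(self, pages, may_reclaim):
        """Rough model of try_charge_memcg(); may_reclaim=False ~ GFP_NOWAIT."""
        if self.usage + pages > self.limit:
            if not may_reclaim:
                return False  # irq context: cannot wait for reclaim, give up
            # blocking context: drop enough clean cache to make room
            need = self.usage + pages - self.limit
            self.file_cache -= min(need, self.file_cache)
            if self.usage + pages > self.limit:
                return False
        self.other += pages  # account the charge
        return True

# Shaped like the dump above: ~7.8G limit, usage pinned there by cache
memcg = Memcg(limit=7812500, file_cache=7326332, other=486168)
print(memcg.try_charge(16, may_reclaim=False))  # False: the skb charge fails
print(memcg.try_charge(16, may_reclaim=True))   # True: reclaim makes room
```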

Gražvydas


* Re: UDP rx packet loss in a cgroup with a memory limit
@ 2022-08-17 17:13 Johannes Weiner
From: Johannes Weiner @ 2022-08-17 17:13 UTC (permalink / raw)
  To: Gražvydas Ignotas
  Cc: Wei Wang, Shakeel Butt, Michal Hocko, Roman Gushchin, linux-mm, cgroups

On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> > Basically, when there is git activity in the container with a memory
> > limit, other processes in the same container start to suffer (very)
> > occasional network issues (mostly DNS lookup failures).
> 
> ok I've traced this and it's failing in try_charge_memcg(), which
> doesn't seem to be trying too hard because it's called from irq
> context.
> 
> Here is the backtrace:
>  <IRQ>
>  ? fib_validate_source+0xb4/0x100
>  ? ip_route_input_slow+0xa11/0xb70
>  mem_cgroup_charge_skmem+0x4b/0xf0
>  __sk_mem_raise_allocated+0x17f/0x3e0
>  __udp_enqueue_schedule_skb+0x220/0x270
>  udp_queue_rcv_one_skb+0x330/0x5e0
>  udp_unicast_rcv_skb+0x75/0x90
>  __udp4_lib_rcv+0x1ba/0xca0
>  ? ip_rcv_finish_core.constprop.0+0x63/0x490
>  ip_protocol_deliver_rcu+0xd6/0x230
>  ip_local_deliver_finish+0x73/0xa0
>  __netif_receive_skb_one_core+0x8b/0xa0
>  process_backlog+0x8e/0x120
>  __napi_poll+0x2c/0x160
>  net_rx_action+0x2a2/0x360
>  ? rebalance_domains+0xeb/0x3b0
>  __do_softirq+0xeb/0x2eb
>  __irq_exit_rcu+0xb9/0x110
>  sysvec_apic_timer_interrupt+0xa2/0xd0
>  </IRQ>
> 
> Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
> 
> memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> swap: usage 0kB, limit 0kB, failcnt 0
> Memory cgroup stats for
> /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> anon 348016640
> file 7502163968
> kernel 146997248
> kernel_stack 327680
> pagetables 2224128
> percpu 0
> sock 4096
> vmalloc 0
> shmem 0
> zswap 0
> zswapped 0
> file_mapped 112041984
> file_dirty 1181028352
> file_writeback 2686976
> swapcached 0
> anon_thp 44040192
> file_thp 0
> shmem_thp 0
> inactive_anon 350756864
> active_anon 36864
> inactive_file 3614003200
> active_file 3888070656
> unevictable 0
> slab_reclaimable 143692600
> slab_unreclaimable 545120
> slab 144237720
> workingset_refault_anon 0
> workingset_refault_file 2318
> workingset_activate_anon 0
> workingset_activate_file 2318
> workingset_restore_anon 0
> workingset_restore_file 0
> workingset_nodereclaim 0
> pgfault 334152
> pgmajfault 1238
> pgrefill 3400
> pgscan 819608
> pgsteal 791005
> pgactivate 949122
> pgdeactivate 1694
> pglazyfree 0
> pglazyfreed 0
> zswpin 0
> zswpout 0
> thp_fault_alloc 709
> thp_collapse_alloc 0
> 
> So it basically renders UDP inoperable because of disk cache. I hope
> this is not the intended behavior. Naturally booting with
> cgroup.memory=nosocket solves this issue.

This is most likely a regression caused by this patch:

commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
Author: Wei Wang <weiwan@google.com>
Date:   Tue Aug 17 12:40:03 2021 -0700

    net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
    
    Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
    to give more control to the networking stack and enable it to change
    memcg charging behavior. In the future, the networking stack may decide
    to avoid oom-kills when fallbacks are more appropriate.
    
    One behavior change in mem_cgroup_charge_skmem() by this patch is to
    avoid force charging by default and let the caller decide when and if
    force charging is needed through the presence or absence of
    __GFP_NOFAIL.
    
    Signed-off-by: Wei Wang <weiwan@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

We never used to fail these allocations. Cgroups don't have a
kswapd-style watermark reclaimer, so the network relied on
force-charging and leaving reclaim to allocations that can block.
Now it seems network packets could just fail indefinitely.

The changelog is a bit terse given how drastic the behavior change
is. Wei, Shakeel, can you fill in why this was changed? Can we revert
this for the time being?
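A sketch of the suspected behavior change (a toy model of the commit message above, not the actual kernel code; flag values and function shapes are illustrative): before the patch, mem_cgroup_charge_skmem() force-charged internally when the plain charge failed, so the memory was always accounted; after it, a plain GFP_NOWAIT charge near the limit just fails:

```python
GFP_NOWAIT   = 0x1  # illustrative flag bits, not the kernel's values
__GFP_NOFAIL = 0x2

def try_charge_memcg(state, pages, gfp):
    # charge succeeds within the limit, or unconditionally with NOFAIL
    if state["usage"] + pages <= state["limit"] or gfp & __GFP_NOFAIL:
        state["usage"] += pages
        return True
    return False  # GFP_NOWAIT at the limit: give up

def charge_skmem_before(state, pages):
    # pre-4b1327be: fall back to force charging, report failure
    if try_charge_memcg(state, pages, GFP_NOWAIT):
        return True
    try_charge_memcg(state, pages, GFP_NOWAIT | __GFP_NOFAIL)
    return False

def charge_skmem_after(state, pages, gfp=GFP_NOWAIT):
    # post-4b1327be: the caller decides; plain GFP_NOWAIT just fails
    return try_charge_memcg(state, pages, gfp)

s1 = {"usage": 1000, "limit": 1000}
print(charge_skmem_before(s1, 4), s1["usage"])  # False 1004: charged anyway
s2 = {"usage": 1000, "limit": 1000}
print(charge_skmem_after(s2, 4), s2["usage"])   # False 1000: nothing charged
```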



* Re: UDP rx packet loss in a cgroup with a memory limit
@ 2022-08-17 17:37 Shakeel Butt
From: Shakeel Butt @ 2022-08-17 17:37 UTC (permalink / raw)
  To: Johannes Weiner, Eric Dumazet, netdev
  Cc: Gražvydas Ignotas, Wei Wang, Michal Hocko, Roman Gushchin,
	Linux MM, Cgroups

+ Eric and netdev

On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> > On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> > > Basically, when there is git activity in the container with a memory
> > > limit, other processes in the same container start to suffer (very)
> > > occasional network issues (mostly DNS lookup failures).
> >
> > ok I've traced this and it's failing in try_charge_memcg(), which
> > doesn't seem to be trying too hard because it's called from irq
> > context.
> >
> > Here is the backtrace:
> >  <IRQ>
> >  ? fib_validate_source+0xb4/0x100
> >  ? ip_route_input_slow+0xa11/0xb70
> >  mem_cgroup_charge_skmem+0x4b/0xf0
> >  __sk_mem_raise_allocated+0x17f/0x3e0
> >  __udp_enqueue_schedule_skb+0x220/0x270
> >  udp_queue_rcv_one_skb+0x330/0x5e0
> >  udp_unicast_rcv_skb+0x75/0x90
> >  __udp4_lib_rcv+0x1ba/0xca0
> >  ? ip_rcv_finish_core.constprop.0+0x63/0x490
> >  ip_protocol_deliver_rcu+0xd6/0x230
> >  ip_local_deliver_finish+0x73/0xa0
> >  __netif_receive_skb_one_core+0x8b/0xa0
> >  process_backlog+0x8e/0x120
> >  __napi_poll+0x2c/0x160
> >  net_rx_action+0x2a2/0x360
> >  ? rebalance_domains+0xeb/0x3b0
> >  __do_softirq+0xeb/0x2eb
> >  __irq_exit_rcu+0xb9/0x110
> >  sysvec_apic_timer_interrupt+0xa2/0xd0
> >  </IRQ>
> >
> > Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
> >
> > memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> > swap: usage 0kB, limit 0kB, failcnt 0
> > Memory cgroup stats for
> > /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> > anon 348016640
> > file 7502163968
> > kernel 146997248
> > kernel_stack 327680
> > pagetables 2224128
> > percpu 0
> > sock 4096
> > vmalloc 0
> > shmem 0
> > zswap 0
> > zswapped 0
> > file_mapped 112041984
> > file_dirty 1181028352
> > file_writeback 2686976
> > swapcached 0
> > anon_thp 44040192
> > file_thp 0
> > shmem_thp 0
> > inactive_anon 350756864
> > active_anon 36864
> > inactive_file 3614003200
> > active_file 3888070656
> > unevictable 0
> > slab_reclaimable 143692600
> > slab_unreclaimable 545120
> > slab 144237720
> > workingset_refault_anon 0
> > workingset_refault_file 2318
> > workingset_activate_anon 0
> > workingset_activate_file 2318
> > workingset_restore_anon 0
> > workingset_restore_file 0
> > workingset_nodereclaim 0
> > pgfault 334152
> > pgmajfault 1238
> > pgrefill 3400
> > pgscan 819608
> > pgsteal 791005
> > pgactivate 949122
> > pgdeactivate 1694
> > pglazyfree 0
> > pglazyfreed 0
> > zswpin 0
> > zswpout 0
> > thp_fault_alloc 709
> > thp_collapse_alloc 0
> >
> > So it basically renders UDP inoperable because of disk cache. I hope
> > this is not the intended behavior. Naturally booting with
> > cgroup.memory=nosocket solves this issue.
>
> This is most likely a regression caused by this patch:
>
> commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> Author: Wei Wang <weiwan@google.com>
> Date:   Tue Aug 17 12:40:03 2021 -0700
>
>     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
>
>     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
>     to give more control to the networking stack and enable it to change
>     memcg charging behavior. In the future, the networking stack may decide
>     to avoid oom-kills when fallbacks are more appropriate.
>
>     One behavior change in mem_cgroup_charge_skmem() by this patch is to
>     avoid force charging by default and let the caller decide when and if
>     force charging is needed through the presence or absence of
>     __GFP_NOFAIL.
>
>     Signed-off-by: Wei Wang <weiwan@google.com>
>     Reviewed-by: Shakeel Butt <shakeelb@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> We never used to fail these allocations. Cgroups don't have a
> kswapd-style watermark reclaimer, so the network relied on
> force-charging and leaving reclaim to allocations that can block.
> Now it seems network packets could just fail indefinitely.
>
> The changelog is a bit terse given how drastic the behavior change
> is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> this for the time being?

Does reverting the patch fix the issue? I don't think it will, however.

Please note that we still have the force charging as before this
patch. Previously, when mem_cgroup_charge_skmem() force-charged, it
returned false and __sk_mem_raise_allocated() took the
suppress_allocation code path. Based on some heuristics, it would
either allow the allocation or uncharge and return failure.

The given patch has not changed any heuristic. It has only changed
when forced charging happens. After the patch, the initial call to
mem_cgroup_charge_skmem() can fail; we then take the
suppress_allocation code path and, if the heuristics allow, we force
charge with __GFP_NOFAIL.


* Re: UDP rx packet loss in a cgroup with a memory limit
@ 2022-08-17 18:16 Wei Wang
From: Wei Wang @ 2022-08-17 18:16 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Eric Dumazet, netdev, Gražvydas Ignotas,
	Michal Hocko, Roman Gushchin, Linux MM, Cgroups

On Wed, Aug 17, 2022 at 10:37 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> + Eric and netdev
>
> On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> > > On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> > > > Basically, when there is git activity in the container with a memory
> > > > limit, other processes in the same container start to suffer (very)
> > > > occasional network issues (mostly DNS lookup failures).
> > >
> > > ok I've traced this and it's failing in try_charge_memcg(), which
> > > doesn't seem to be trying too hard because it's called from irq
> > > context.
> > >
> > > Here is the backtrace:
> > >  <IRQ>
> > >  ? fib_validate_source+0xb4/0x100
> > >  ? ip_route_input_slow+0xa11/0xb70
> > >  mem_cgroup_charge_skmem+0x4b/0xf0
> > >  __sk_mem_raise_allocated+0x17f/0x3e0
> > >  __udp_enqueue_schedule_skb+0x220/0x270
> > >  udp_queue_rcv_one_skb+0x330/0x5e0
> > >  udp_unicast_rcv_skb+0x75/0x90
> > >  __udp4_lib_rcv+0x1ba/0xca0
> > >  ? ip_rcv_finish_core.constprop.0+0x63/0x490
> > >  ip_protocol_deliver_rcu+0xd6/0x230
> > >  ip_local_deliver_finish+0x73/0xa0
> > >  __netif_receive_skb_one_core+0x8b/0xa0
> > >  process_backlog+0x8e/0x120
> > >  __napi_poll+0x2c/0x160
> > >  net_rx_action+0x2a2/0x360
> > >  ? rebalance_domains+0xeb/0x3b0
> > >  __do_softirq+0xeb/0x2eb
> > >  __irq_exit_rcu+0xb9/0x110
> > >  sysvec_apic_timer_interrupt+0xa2/0xd0
> > >  </IRQ>
> > >
> > > Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
> > >
> > > memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> > > swap: usage 0kB, limit 0kB, failcnt 0
> > > Memory cgroup stats for
> > > /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> > > anon 348016640
> > > file 7502163968
> > > kernel 146997248
> > > kernel_stack 327680
> > > pagetables 2224128
> > > percpu 0
> > > sock 4096
> > > vmalloc 0
> > > shmem 0
> > > zswap 0
> > > zswapped 0
> > > file_mapped 112041984
> > > file_dirty 1181028352
> > > file_writeback 2686976
> > > swapcached 0
> > > anon_thp 44040192
> > > file_thp 0
> > > shmem_thp 0
> > > inactive_anon 350756864
> > > active_anon 36864
> > > inactive_file 3614003200
> > > active_file 3888070656
> > > unevictable 0
> > > slab_reclaimable 143692600
> > > slab_unreclaimable 545120
> > > slab 144237720
> > > workingset_refault_anon 0
> > > workingset_refault_file 2318
> > > workingset_activate_anon 0
> > > workingset_activate_file 2318
> > > workingset_restore_anon 0
> > > workingset_restore_file 0
> > > workingset_nodereclaim 0
> > > pgfault 334152
> > > pgmajfault 1238
> > > pgrefill 3400
> > > pgscan 819608
> > > pgsteal 791005
> > > pgactivate 949122
> > > pgdeactivate 1694
> > > pglazyfree 0
> > > pglazyfreed 0
> > > zswpin 0
> > > zswpout 0
> > > thp_fault_alloc 709
> > > thp_collapse_alloc 0
> > >
> > > So it basically renders UDP inoperable because of disk cache. I hope
> > > this is not the intended behavior. Naturally booting with
> > > cgroup.memory=nosocket solves this issue.
> >
> > This is most likely a regression caused by this patch:
> >
> > commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> > Author: Wei Wang <weiwan@google.com>
> > Date:   Tue Aug 17 12:40:03 2021 -0700
> >
> >     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
> >
> >     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
> >     to give more control to the networking stack and enable it to change
> >     memcg charging behavior. In the future, the networking stack may decide
> >     to avoid oom-kills when fallbacks are more appropriate.
> >
> >     One behavior change in mem_cgroup_charge_skmem() by this patch is to
> >     avoid force charging by default and let the caller decide when and if
> >     force charging is needed through the presence or absence of
> >     __GFP_NOFAIL.
> >
> >     Signed-off-by: Wei Wang <weiwan@google.com>
> >     Reviewed-by: Shakeel Butt <shakeelb@google.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > We never used to fail these allocations. Cgroups don't have a
> > kswapd-style watermark reclaimer, so the network relied on
> > force-charging and leaving reclaim to allocations that can block.
> > Now it seems network packets could just fail indefinitely.
> >
> > The changelog is a bit terse given how drastic the behavior change
> > is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> > this for the time being?
>
> Does reverting the patch fix the issue? However I don't think it will.
>
> Please note that we still have the force charging as before this
> patch. Previously when mem_cgroup_charge_skmem() force charges, it
> returns false and __sk_mem_raise_allocated takes suppress_allocation
> code path. Based on some heuristics, it may allow it or it may
> uncharge and return failure.

The force charging logic in __sk_mem_raise_allocated() only gets
considered on the tx path for STREAM sockets, so it does not take
effect on the UDP receive path. And that logic is NOT altered by the
above patch.

So, specifically for the UDP receive path, what happens in
__sk_mem_raise_allocated() BEFORE the above patch is:
- mem_cgroup_charge_skmem() gets called:
    - try_charge() with GFP_NOWAIT gets called and fails
    - try_charge() gets called again with __GFP_NOFAIL (force charge)
    - return false
- goto suppress_allocation:
    - mem_cgroup_uncharge_skmem() gets called
- return 0 (which means failure)

AFTER the above patch, what happens in __sk_mem_raise_allocated() is:
- mem_cgroup_charge_skmem() gets called:
    - try_charge() with GFP_NOWAIT gets called and failed
    - return false
- goto suppress_allocation:
    - we no longer call mem_cgroup_uncharge_skmem()
- return 0

So I agree with Shakeel that this change shouldn't alter the behavior
of the above call path in such a situation.
But do let us know if reverting this change has any effect on your test.
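The two step lists above can be condensed into a toy model (mine, not kernel code) showing that, for a UDP rx charge that fails, the end result is a failed allocation both before and after the patch; only the intermediate force charge/uncharge pair disappears:

```python
def raise_allocated(charge_ok, patched, kind="RECV", sock_type="DGRAM"):
    """Toy model of __sk_mem_raise_allocated() for a memcg skmem charge.

    Returns (result, events): result 1 means success, 0 means failure;
    events records the intermediate accounting steps.
    """
    events = []
    if charge_ok:
        return 1, events
    if not patched:
        # old mem_cgroup_charge_skmem(): already force-charged internally
        events.append("force_charge")
    if kind == "SEND" and sock_type == "STREAM":
        # the suppress_allocation rescue heuristics only apply here
        # (details elided in this sketch)
        return 1, events
    if not patched:
        events.append("uncharge")  # undo the force charge before failing
    return 0, events

# With the memcg at its limit, UDP rx fails either way:
print(raise_allocated(False, patched=False))  # (0, ['force_charge', 'uncharge'])
print(raise_allocated(False, patched=True))   # (0, [])
```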

>
> The given patch has not changed any heuristic. It has only changed
> when forced charging happens. After the patch, the initial call
> mem_cgroup_charge_skmem() can fail and we take suppress_allocation
> code path and if heuristics allow, we force charge with __GFP_NOFAIL.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: UDP rx packet loss in a cgroup with a memory limit
@ 2022-08-17 18:16           ` Wei Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Wei Wang @ 2022-08-17 18:16 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Eric Dumazet, netdev, Gražvydas Ignotas,
	Michal Hocko, Roman Gushchin, Linux MM, Cgroups

On Wed, Aug 17, 2022 at 10:37 AM Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> + Eric and netdev
>
> On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> >
> > On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> > > On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> > > > Basically, when there is git activity in the container with a memory
> > > > limit, other processes in the same container start to suffer (very)
> > > > occasional network issues (mostly DNS lookup failures).
> > >
> > > ok I've traced this and it's failing in try_charge_memcg(), which
> > > doesn't seem to be trying too hard because it's called from irq
> > > context.
> > >
> > > Here is the backtrace:
> > >  <IRQ>
> > >  ? fib_validate_source+0xb4/0x100
> > >  ? ip_route_input_slow+0xa11/0xb70
> > >  mem_cgroup_charge_skmem+0x4b/0xf0
> > >  __sk_mem_raise_allocated+0x17f/0x3e0
> > >  __udp_enqueue_schedule_skb+0x220/0x270
> > >  udp_queue_rcv_one_skb+0x330/0x5e0
> > >  udp_unicast_rcv_skb+0x75/0x90
> > >  __udp4_lib_rcv+0x1ba/0xca0
> > >  ? ip_rcv_finish_core.constprop.0+0x63/0x490
> > >  ip_protocol_deliver_rcu+0xd6/0x230
> > >  ip_local_deliver_finish+0x73/0xa0
> > >  __netif_receive_skb_one_core+0x8b/0xa0
> > >  process_backlog+0x8e/0x120
> > >  __napi_poll+0x2c/0x160
> > >  net_rx_action+0x2a2/0x360
> > >  ? rebalance_domains+0xeb/0x3b0
> > >  __do_softirq+0xeb/0x2eb
> > >  __irq_exit_rcu+0xb9/0x110
> > >  sysvec_apic_timer_interrupt+0xa2/0xd0
> > >  </IRQ>
> > >
> > > Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
> > >
> > > memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> > > swap: usage 0kB, limit 0kB, failcnt 0
> > > Memory cgroup stats for
> > > /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> > > anon 348016640
> > > file 7502163968
> > > kernel 146997248
> > > kernel_stack 327680
> > > pagetables 2224128
> > > percpu 0
> > > sock 4096
> > > vmalloc 0
> > > shmem 0
> > > zswap 0
> > > zswapped 0
> > > file_mapped 112041984
> > > file_dirty 1181028352
> > > file_writeback 2686976
> > > swapcached 0
> > > anon_thp 44040192
> > > file_thp 0
> > > shmem_thp 0
> > > inactive_anon 350756864
> > > active_anon 36864
> > > inactive_file 3614003200
> > > active_file 3888070656
> > > unevictable 0
> > > slab_reclaimable 143692600
> > > slab_unreclaimable 545120
> > > slab 144237720
> > > workingset_refault_anon 0
> > > workingset_refault_file 2318
> > > workingset_activate_anon 0
> > > workingset_activate_file 2318
> > > workingset_restore_anon 0
> > > workingset_restore_file 0
> > > workingset_nodereclaim 0
> > > pgfault 334152
> > > pgmajfault 1238
> > > pgrefill 3400
> > > pgscan 819608
> > > pgsteal 791005
> > > pgactivate 949122
> > > pgdeactivate 1694
> > > pglazyfree 0
> > > pglazyfreed 0
> > > zswpin 0
> > > zswpout 0
> > > thp_fault_alloc 709
> > > thp_collapse_alloc 0
> > >
> > > So it basically renders UDP inoperable because of disk cache. I hope
> > > this is not the intended behavior. Naturally booting with
> > > cgroup.memory=nosocket solves this issue.
> >
> > This is most likely a regression caused by this patch:
> >
> > commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> > Author: Wei Wang <weiwan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Date:   Tue Aug 17 12:40:03 2021 -0700
> >
> >     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
> >
> >     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
> >     to give more control to the networking stack and enable it to change
> >     memcg charging behavior. In the future, the networking stack may decide
> >     to avoid oom-kills when fallbacks are more appropriate.
> >
> >     One behavior change in mem_cgroup_charge_skmem() by this patch is to
> >     avoid force charging by default and let the caller decide when and if
> >     force charging is needed through the presence or absence of
> >     __GFP_NOFAIL.
> >
> >     Signed-off-by: Wei Wang <weiwan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> >     Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> >     Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> >
> > We never used to fail these allocations. Cgroups don't have a
> > kswapd-style watermark reclaimer, so the network relied on
> > force-charging and leaving reclaim to allocations that can block.
> > Now it seems network packets could just fail indefinitely.
> >
> > The changelog is a bit terse given how drastic the behavior change
> > is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> > this for the time being?
>
> Does reverting the patch fix the issue? However I don't think it will.
>
> Please note that we still have the force charging as before this
> patch. Previously when mem_cgroup_charge_skmem() force charges, it
> returns false and __sk_mem_raise_allocated takes suppress_allocation
> code path. Based on some heuristics, it may allow it or it may
> uncharge and return failure.

The force-charging logic in __sk_mem_raise_allocated() is only
considered on the tx path for STREAM sockets, so it probably does not
take effect on the UDP path. And that logic is NOT altered by the
above patch.
So specifically for the UDP receive path, what happens in
__sk_mem_raise_allocated() BEFORE the above patch is:
- mem_cgroup_charge_skmem() gets called:
    - try_charge() with GFP_NOWAIT is called and fails
    - try_charge() with __GFP_NOFAIL force-charges and succeeds
    - return false
- goto suppress_allocation:
    - mem_cgroup_uncharge_skmem() gets called
- return 0 (which means failure)

AFTER the above patch, what happens in __sk_mem_raise_allocated() is:
- mem_cgroup_charge_skmem() gets called:
    - try_charge() with GFP_NOWAIT is called and fails
    - return false
- goto suppress_allocation:
    - mem_cgroup_uncharge_skmem() is no longer called
- return 0

So I agree with Shakeel that this change shouldn't alter the behavior
of this call path in such a situation.
But do let us know if reverting this change has any effect on your test.

>
> The given patch has not changed any heuristic. It has only changed
> when forced charging happens. After the path the initial call
> mem_cgroup_charge_skmem() can fail and we take suppress_allocation
> code path and if heuristics allow, we force charge with __GFP_NOFAIL.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: UDP rx packet loss in a cgroup with a memory limit
  2022-08-17 18:16           ` Wei Wang
  (?)
@ 2022-08-17 20:12           ` Gražvydas Ignotas
  2022-10-13  4:36               ` Shakeel Butt
  -1 siblings, 1 reply; 13+ messages in thread
From: Gražvydas Ignotas @ 2022-08-17 20:12 UTC (permalink / raw)
  To: Wei Wang
  Cc: Shakeel Butt, Johannes Weiner, Eric Dumazet, netdev,
	Michal Hocko, Roman Gushchin, Linux MM, Cgroups

On Wed, Aug 17, 2022 at 9:16 PM Wei Wang <weiwan@google.com> wrote:
>
> On Wed, Aug 17, 2022 at 10:37 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > + Eric and netdev
> >
> > On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > This is most likely a regression caused by this patch:
> > >
> > > commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> > > Author: Wei Wang <weiwan@google.com>
> > > Date:   Tue Aug 17 12:40:03 2021 -0700
> > >
> > >     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
> > >
> > >     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
> > >     to give more control to the networking stack and enable it to change
> > >     memcg charging behavior. In the future, the networking stack may decide
> > >     to avoid oom-kills when fallbacks are more appropriate.
> > >
> > >     One behavior change in mem_cgroup_charge_skmem() by this patch is to
> > >     avoid force charging by default and let the caller decide when and if
> > >     force charging is needed through the presence or absence of
> > >     __GFP_NOFAIL.
> > >
> > >     Signed-off-by: Wei Wang <weiwan@google.com>
> > >     Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > >
> > > We never used to fail these allocations. Cgroups don't have a
> > > kswapd-style watermark reclaimer, so the network relied on
> > > force-charging and leaving reclaim to allocations that can block.
> > > Now it seems network packets could just fail indefinitely.
> > >
> > > The changelog is a bit terse given how drastic the behavior change
> > > is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> > > this for the time being?
> >
> > Does reverting the patch fix the issue? However I don't think it will.
> >
> > Please note that we still have the force charging as before this
> > patch. Previously when mem_cgroup_charge_skmem() force charges, it
> > returns false and __sk_mem_raise_allocated takes suppress_allocation
> > code path. Based on some heuristics, it may allow it or it may
> > uncharge and return failure.
>
> The force charging logic in __sk_mem_raise_allocated only gets
> considered on tx path for STREAM socket. So it probably does not take
> effect on UDP path. And, that logic is NOT being altered in the above
> patch.
> So specifically for UDP receive path, what happens in
> __sk_mem_raise_allocated() BEFORE the above patch is:
> - mem_cgroup_charge_skmem() gets called:
>     - try_charge() with GFP_NOWAIT gets called and  failed
>     - try_charge() with __GFP_NOFAIL
>     - return false
> - goto suppress_allocation:
>     - mem_cgroup_uncharge_skmem() gets called
> - return 0 (which means failure)
>
> AFTER the above patch, what happens in __sk_mem_raise_allocated() is:
> - mem_cgroup_charge_skmem() gets called:
>     - try_charge() with GFP_NOWAIT gets called and failed
>     - return false
> - goto suppress_allocation:
>     - We no longer calls mem_cgroup_uncharge_skmem()
> - return 0
>
> So I agree with Shakeel, that this change shouldn't alter the behavior
> of the above call path in such a situation.
> But do let us know if reverting this change has any effect on your test.

The problem is still there (the kernel didn't compile after the
revert; I had to adjust another seemingly unrelated call site). It's
hard to tell whether it's better or worse since it happens so
randomly.

>
> >
> > The given patch has not changed any heuristic. It has only changed
> > when forced charging happens. After the path the initial call
> > mem_cgroup_charge_skmem() can fail and we take suppress_allocation
> > code path and if heuristics allow, we force charge with __GFP_NOFAIL.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: UDP rx packet loss in a cgroup with a memory limit
  2022-08-17 20:12           ` Gražvydas Ignotas
@ 2022-10-13  4:36               ` Shakeel Butt
  0 siblings, 0 replies; 13+ messages in thread
From: Shakeel Butt @ 2022-10-13  4:36 UTC (permalink / raw)
  To: Gražvydas Ignotas
  Cc: Wei Wang, Johannes Weiner, Eric Dumazet, netdev, Michal Hocko,
	Roman Gushchin, Linux MM, Cgroups

On Wed, Aug 17, 2022 at 1:12 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
>
> On Wed, Aug 17, 2022 at 9:16 PM Wei Wang <weiwan@google.com> wrote:
> >
> > On Wed, Aug 17, 2022 at 10:37 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > + Eric and netdev
> > >
> > > On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > >
> > > > This is most likely a regression caused by this patch:
> > > >
> > > > commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> > > > Author: Wei Wang <weiwan@google.com>
> > > > Date:   Tue Aug 17 12:40:03 2021 -0700
> > > >
> > > >     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
> > > >
> > > >     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
> > > >     to give more control to the networking stack and enable it to change
> > > >     memcg charging behavior. In the future, the networking stack may decide
> > > >     to avoid oom-kills when fallbacks are more appropriate.
> > > >
> > > >     One behavior change in mem_cgroup_charge_skmem() by this patch is to
> > > >     avoid force charging by default and let the caller decide when and if
> > > >     force charging is needed through the presence or absence of
> > > >     __GFP_NOFAIL.
> > > >
> > > >     Signed-off-by: Wei Wang <weiwan@google.com>
> > > >     Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > > >
> > > > We never used to fail these allocations. Cgroups don't have a
> > > > kswapd-style watermark reclaimer, so the network relied on
> > > > force-charging and leaving reclaim to allocations that can block.
> > > > Now it seems network packets could just fail indefinitely.
> > > >
> > > > The changelog is a bit terse given how drastic the behavior change
> > > > is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> > > > this for the time being?
> > >
> > > Does reverting the patch fix the issue? However I don't think it will.
> > >
> > > Please note that we still have the force charging as before this
> > > patch. Previously when mem_cgroup_charge_skmem() force charges, it
> > > returns false and __sk_mem_raise_allocated takes suppress_allocation
> > > code path. Based on some heuristics, it may allow it or it may
> > > uncharge and return failure.
> >
> > The force charging logic in __sk_mem_raise_allocated only gets
> > considered on tx path for STREAM socket. So it probably does not take
> > effect on UDP path. And, that logic is NOT being altered in the above
> > patch.
> > So specifically for UDP receive path, what happens in
> > __sk_mem_raise_allocated() BEFORE the above patch is:
> > - mem_cgroup_charge_skmem() gets called:
> >     - try_charge() with GFP_NOWAIT gets called and  failed
> >     - try_charge() with __GFP_NOFAIL
> >     - return false
> > - goto suppress_allocation:
> >     - mem_cgroup_uncharge_skmem() gets called
> > - return 0 (which means failure)
> >
> > AFTER the above patch, what happens in __sk_mem_raise_allocated() is:
> > - mem_cgroup_charge_skmem() gets called:
> >     - try_charge() with GFP_NOWAIT gets called and failed
> >     - return false
> > - goto suppress_allocation:
> >     - We no longer calls mem_cgroup_uncharge_skmem()
> > - return 0
> >
> > So I agree with Shakeel, that this change shouldn't alter the behavior
> > of the above call path in such a situation.
> > But do let us know if reverting this change has any effect on your test.
>
> The problem is still there (the kernel wasn't compiling after revert,
> had to adjust another seemingly unrelated callsite). It's hard to tell
> if it's better or worse since it happens so randomly.
>

Hello everyone, we now have a better understanding of why the patch
pointed out by Johannes may have exposed this issue. See
https://lore.kernel.org/all/20221013041833.rhifxw4gqwk4ofi2@google.com/.

To summarize, the old code depended on a subtle interaction between
force-charging and the percpu charge caches, which this patch removed.
The fix I am proposing is for the network stack to be explicit about
its needs (i.e. use GFP_ATOMIC) instead of depending on subtle
behavior.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: UDP rx packet loss in a cgroup with a memory limit
@ 2022-10-13 14:22                 ` Johannes Weiner
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Weiner @ 2022-10-13 14:22 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Gražvydas Ignotas, Wei Wang, Eric Dumazet, netdev,
	Michal Hocko, Roman Gushchin, Linux MM, Cgroups

On Wed, Oct 12, 2022 at 09:36:34PM -0700, Shakeel Butt wrote:
> On Wed, Aug 17, 2022 at 1:12 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> >
> > On Wed, Aug 17, 2022 at 9:16 PM Wei Wang <weiwan@google.com> wrote:
> > >
> > > On Wed, Aug 17, 2022 at 10:37 AM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > + Eric and netdev
> > > >
> > > > On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > >
> > > > > This is most likely a regression caused by this patch:
> > > > >
> > > > > commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> > > > > Author: Wei Wang <weiwan@google.com>
> > > > > Date:   Tue Aug 17 12:40:03 2021 -0700
> > > > >
> > > > >     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
> > > > >
> > > > >     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
> > > > >     to give more control to the networking stack and enable it to change
> > > > >     memcg charging behavior. In the future, the networking stack may decide
> > > > >     to avoid oom-kills when fallbacks are more appropriate.
> > > > >
> > > > >     One behavior change in mem_cgroup_charge_skmem() by this patch is to
> > > > >     avoid force charging by default and let the caller decide when and if
> > > > >     force charging is needed through the presence or absence of
> > > > >     __GFP_NOFAIL.
> > > > >
> > > > >     Signed-off-by: Wei Wang <weiwan@google.com>
> > > > >     Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > > > >
> > > > > We never used to fail these allocations. Cgroups don't have a
> > > > > kswapd-style watermark reclaimer, so the network relied on
> > > > > force-charging and leaving reclaim to allocations that can block.
> > > > > Now it seems network packets could just fail indefinitely.
> > > > >
> > > > > The changelog is a bit terse given how drastic the behavior change
> > > > > is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> > > > > this for the time being?
> > > >
> > > > Does reverting the patch fix the issue? However I don't think it will.
> > > >
> > > > Please note that we still have the force charging as before this
> > > > patch. Previously when mem_cgroup_charge_skmem() force charges, it
> > > > returns false and __sk_mem_raise_allocated takes suppress_allocation
> > > > code path. Based on some heuristics, it may allow it or it may
> > > > uncharge and return failure.
> > >
> > > The force charging logic in __sk_mem_raise_allocated only gets
> > > considered on tx path for STREAM socket. So it probably does not take
> > > effect on UDP path. And, that logic is NOT being altered in the above
> > > patch.
> > > So specifically for UDP receive path, what happens in
> > > __sk_mem_raise_allocated() BEFORE the above patch is:
> > > - mem_cgroup_charge_skmem() gets called:
> > >     - try_charge() with GFP_NOWAIT gets called and  failed
> > >     - try_charge() with __GFP_NOFAIL
> > >     - return false
> > > - goto suppress_allocation:
> > >     - mem_cgroup_uncharge_skmem() gets called
> > > - return 0 (which means failure)
> > >
> > > AFTER the above patch, what happens in __sk_mem_raise_allocated() is:
> > > - mem_cgroup_charge_skmem() gets called:
> > >     - try_charge() with GFP_NOWAIT gets called and failed
> > >     - return false
> > > - goto suppress_allocation:
> > >     - We no longer calls mem_cgroup_uncharge_skmem()
> > > - return 0
> > >
> > > So I agree with Shakeel, that this change shouldn't alter the behavior
> > > of the above call path in such a situation.
> > > But do let us know if reverting this change has any effect on your test.
> >
> > The problem is still there (the kernel wasn't compiling after revert,
> > had to adjust another seemingly unrelated callsite). It's hard to tell
> > if it's better or worse since it happens so randomly.
> >
> 
> Hello everyone, we have a better understanding why the patch pointed
> out by Johannes might have exposed this issue. See
> https://lore.kernel.org/all/20221013041833.rhifxw4gqwk4ofi2@google.com/.

Wow, that's super subtle! Nice sleuthing.

> To summarize, the old code was depending on a subtle interaction of
> force-charge and percpu charge caches which this patch removed. The
> fix I am proposing is for the network stack to be explicit of its need
> (i.e. use GFP_ATOMIC) instead of depending on a subtle behavior.

That sounds good to me.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-10-13 14:22 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16 18:52 UDP rx packet loss in a cgroup with a memory limit Gražvydas Ignotas
     [not found] ` <CANOLnON11vzvVdyJfW+QJ36siWR4-s=HJ2aRKpRy7CP=aRPoSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-08-17 16:50   ` Gražvydas Ignotas
2022-08-17 17:13     ` Johannes Weiner
2022-08-17 17:13       ` Johannes Weiner
2022-08-17 17:37       ` Shakeel Butt
2022-08-17 17:37         ` Shakeel Butt
2022-08-17 18:16         ` Wei Wang
2022-08-17 18:16           ` Wei Wang
2022-08-17 20:12           ` Gražvydas Ignotas
2022-10-13  4:36             ` Shakeel Butt
2022-10-13  4:36               ` Shakeel Butt
2022-10-13 14:22               ` Johannes Weiner
2022-10-13 14:22                 ` Johannes Weiner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.