kernelnewbies.kernelnewbies.org archive mirror
From: abejideayodele@gmail.com (Àbéjídé Àyodélé)
To: kernelnewbies@lists.kernelnewbies.org
Subject: OOM in secondary cgroup leading to networking loss
Date: Sat, 28 Jul 2018 19:23:43 -0500
Message-ID: <CALy30=U9_tTC8K0TvbXaXpGG0aBV4J1SEcDXNLHOqsGF6qFVgw@mail.gmail.com>

Hi friends,

On one of our machines at work, we observed a sequence of events that starts
with an OOM in a secondary cgroup and ends with the bond interface being
down for up to 12 seconds. Below is an excerpt of dmesg from around the time
the bond interface went down:

[Wed Jul 25 19:20:45 2018] Call Trace:
[Wed Jul 25 19:20:45 2018]  <IRQ>
[Wed Jul 25 19:20:45 2018]  ? dev_deactivate_queue.constprop.29+0x60/0x60
[Wed Jul 25 19:20:45 2018]  call_timer_fn+0x30/0x120
[Wed Jul 25 19:20:45 2018]  run_timer_softirq+0x3c8/0x420
[Wed Jul 25 19:20:45 2018]  ? timerqueue_add+0x52/0x80
[Wed Jul 25 19:20:45 2018]  ? enqueue_hrtimer+0x37/0x80
[Wed Jul 25 19:20:45 2018]  ? recalibrate_cpu_khz+0x10/0x10
[Wed Jul 25 19:20:45 2018]  __do_softirq+0xde/0x2b3
[Wed Jul 25 19:20:45 2018]  irq_exit+0xae/0xb0
[Wed Jul 25 19:20:45 2018]  smp_apic_timer_interrupt+0x70/0x120
[Wed Jul 25 19:20:45 2018]  apic_timer_interrupt+0x7d/0x90
[Wed Jul 25 19:20:45 2018]  </IRQ>
[Wed Jul 25 19:20:45 2018] RIP: 0010:cpuidle_enter_state+0xa2/0x2e0
[Wed Jul 25 19:20:45 2018] RSP: 0018:ffffffff9c403eb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[Wed Jul 25 19:20:45 2018] RAX: ffff9075c0821a40 RBX: 0006cdb159b6000e RCX: 000000000000001f
[Wed Jul 25 19:20:45 2018] RDX: 0006cdb159b6000e RSI: ffed6f2696159a35 RDI: 0000000000000000
[Wed Jul 25 19:20:45 2018] RBP: ffffd3edc100b900 R08: 0000000000000f48 R09: 0000000000000cfe
[Wed Jul 25 19:20:45 2018] R10: ffffffff9c403e90 R11: 0000000000000f12 R12: 0000000000000003
[Wed Jul 25 19:20:45 2018] R13: ffffffff9c4b03d8 R14: 0000000000000000 R15: 0006cdb159783c8e
[Wed Jul 25 19:20:45 2018]  do_idle+0x181/0x1e0
[Wed Jul 25 19:20:45 2018]  cpu_startup_entry+0x19/0x20
[Wed Jul 25 19:20:45 2018]  start_kernel+0x400/0x408
[Wed Jul 25 19:20:45 2018]  secondary_startup_64+0xa5/0xb0
[Wed Jul 25 19:20:45 2018] Code: 63 8e 60 04 00 00 eb 8f 4c 89 f7 c6 05 79 c7 b8 00 01 e8 00 7c fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 f0 38 28 9c e8 c7 a3 b6 ff <0f> 0b eb bd 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56
[Wed Jul 25 19:20:45 2018] ---[ end trace 2ad2942fe3431402 ]---
[Wed Jul 25 19:20:45 2018] ixgbe 0000:19:00.1 eno2: initiating reset due to tx timeout
[Wed Jul 25 19:20:45 2018] ixgbe 0000:19:00.1 eno2: Reset adapter
[Wed Jul 25 19:20:48 2018] ixgbe 0000:19:00.0 eno1: initiating reset due to tx timeout
[Wed Jul 25 19:20:53 2018] ixgbe 0000:19:00.0 eno1: initiating reset due to tx timeout
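
For anyone triaging a longer log of this kind, here is a minimal Python sketch
that pulls the ixgbe "initiating reset due to tx timeout" events out of
`dmesg -T` style output and reports the time between the first and the last one.
It assumes each log entry sits on a single line, as dmesg prints it, and an
English/C locale for the weekday and month names. The function names are
hypothetical, and the result only brackets the reset messages; it is not a
measurement of how long the bond itself was down.

```python
import re
from datetime import datetime

# Matches lines such as:
# [Wed Jul 25 19:20:45 2018] ixgbe 0000:19:00.1 eno2: initiating reset due to tx timeout
RESET_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+ixgbe\s+\S+\s+(?P<iface>\S+):\s+"
    r"initiating reset due to tx\s+timeout"
)


def reset_events(lines):
    """Return (timestamp, interface) pairs for every tx-timeout reset line."""
    events = []
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            events.append((ts, m.group("iface")))
    return events


def reset_window_seconds(events):
    """Seconds between the earliest and latest reset event (0.0 if none)."""
    if not events:
        return 0.0
    times = [ts for ts, _ in events]
    return (max(times) - min(times)).total_seconds()


# The four ixgbe lines from the excerpt above, one entry per line.
sample = [
    "[Wed Jul 25 19:20:45 2018] ixgbe 0000:19:00.1 eno2: initiating reset due to tx timeout",
    "[Wed Jul 25 19:20:45 2018] ixgbe 0000:19:00.1 eno2: Reset adapter",
    "[Wed Jul 25 19:20:48 2018] ixgbe 0000:19:00.0 eno1: initiating reset due to tx timeout",
    "[Wed Jul 25 19:20:53 2018] ixgbe 0000:19:00.0 eno1: initiating reset due to tx timeout",
]

events = reset_events(sample)
for ts, iface in events:
    print(ts.isoformat(), iface)
print("window (s):", reset_window_seconds(events))
```

On the excerpt above this finds three reset events (one on eno2, two on eno1)
spanning 8 seconds. Running it over the full dmesg in the gist would show
whether further resets account for the rest of the outage.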

We have observed similar behavior on a different machine that was running a
4.15.11 kernel; the machine these logs come from runs a 4.14.52 kernel. More
detailed dmesg output can be found here:

https://gist.github.com/bjhaid/49a1c58742ef2458984339503290ef9a

I would appreciate any help in figuring out the cause of this issue and a fix
for it, or a pointer to the correct mailing list to post this to.

One extra bit of information: the machines are Kubernetes nodes running Docker.

Thanks!

Abejide Ayodele
It always seems impossible until it's done. --Nelson Mandela
