linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-kernel@vger.kernel.org
Cc: Daniel Wagner <daniel.wagner@siemens.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will.deacon@arm.com>,
	x86@kernel.org, Linus Torvalds <torvalds@linux-foundation.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: [Problem] Cache line starvation
Date: Fri, 21 Sep 2018 14:02:26 +0200	[thread overview]
Message-ID: <20180921120226.6xjgr4oiho22ex75@linutronix.de> (raw)

We reproducibly observe cache line starvation on a Core2Duo E6850 (2
cores), a i5-6400 SKL (4 cores) and on a NXP LS2044A ARM Cortex-A72 (4
cores).

The problem can be triggered with a v4.9-RT kernel by starting

    cyclictest -S -p98 -m  -i2000 -b 200

and as "load"

    stress-ng --ptrace 4

The reported maximal latency is usually less than 60us. If the problem
triggers then values around 400us, 800us or even more are reported. The
upperlimit is the -i parameter.

Reproduction with 4.9-RT is almost immediate on Core2Duo, ARM64 and SKL,
but it took 7.5 hours to trigger on v4.14-RT on the Core2Duo.

Instrumentation show always the picture:

CPU0                                         CPU1
=> do_syscall_64                              => do_syscall_64
=> SyS_ptrace                                   => syscall_slow_exit_work
=> ptrace_check_attach                          => ptrace_do_notify / rt_read_unlock 
=> wait_task_inactive                              rt_spin_lock_slowunlock()
   -> while task_running()                         __rt_mutex_unlock_common()
  /   check_task_state()                           mark_wakeup_next_waiter()
 |     raw_spin_lock_irq(&p->pi_lock);             raw_spin_lock(&current->pi_lock);
 |     .                                               .
 |     raw_spin_unlock_irq(&p->pi_lock);               .
  \  cpu_relax()                                       .
   -                                                   .
    *IRQ*                                          <lock acquired>

In the error case we observe that the while() loop is repeated more than
5000 times which indicates that the pi_lock can be acquired. CPU1 on the
other side does not make progress waiting for the same lock with interrupts
disabled.

This continues until an IRQ hits CPU0. Once CPU0 starts processing the IRQ
the other CPU is able to acquire pi_lock and the situation relaxes.

This matches Daniel Wagner's observations which he described in [0] on
v4.4-RT.

It took us weeks to hunt this down as it's really showing true Heisenbug
behaviour and instrumentation at the wrong place makes it vanish
immediately.

Other data points:

Replacing the single rep_nop() in cpu_relax() with two consecutive
rep_nop() makes it harder to trigger on the Core2Duo machine. Adding a
third one seems to resolve it completely, though no long term runs done
yet. Also our attempt of instrumenting the lock()/unlock() sequence more
deeply with rdtsc() timestamps made it go away. On SKL the identical
instrumentation hides it.

Peter suggested to do a clwb(&p->pi_lock); before the cpu_relax() in
wait_task_inactive() which on both the Core2Duo and the SKL gets runtime
patched to clflush(). That hides it as well.

On ARM64 we suspected something like this without having real proof a week
ago and added a delay loop to cpu_relax(). That "resolved" the issue. By
now we have traces on ARM64 which proof it to be the same issue.

Daniel reported that disabling ticket locks on 4.4 makes the problem go
away, but he hasn't run a long time test yet and as we saw with 4.14 it can
take quite a while.

While it is pretty reliable reproducible with an RT kernel, this is not an
RT kernel issue at all. Mainline has similar or even the same code
patterns, but it's harder to observe as there are no latency guarantees for
an observer like cyclictest. The fact that an interrupt makes it go away
makes it even harder because the rare stalls just go unnoticed because they
are canceled early.

[0] https://lkml.kernel.org/r/4d878d3e-bd1b-3d6e-bdde-4ce840191093@monom.org

Sebastian

             reply	other threads:[~2018-09-21 12:02 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-21 12:02 Sebastian Andrzej Siewior [this message]
2018-09-21 12:13 ` [Problem] Cache line starvation Thomas Gleixner
2018-09-21 12:50   ` Sebastian Andrzej Siewior
2018-09-21 12:20 ` Peter Zijlstra
2018-09-21 12:54   ` Thomas Gleixner
2018-10-03  7:51   ` Catalin Marinas
2018-10-03  8:07     ` Thomas Gleixner
2018-10-03  8:28       ` Peter Zijlstra
2018-10-03 10:43         ` Thomas Gleixner
2018-10-03  8:23     ` Peter Zijlstra
2018-09-26  7:34 ` Peter Zijlstra
2018-09-26  8:04   ` Thomas Gleixner
2018-09-26 12:53 ` Will Deacon
2018-09-27 14:25   ` Kurt Kanzenbach
2018-09-27 14:41     ` Kurt Kanzenbach
2018-09-27 14:47       ` Thomas Gleixner
2018-09-28  9:05         ` Kurt Kanzenbach
2018-09-28 15:26           ` Kurt Kanzenbach
2018-09-28 19:26         ` Sebastian Andrzej Siewior
2018-09-28 19:34           ` Thomas Gleixner
2018-10-02  6:31 ` Daniel Wagner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180921120226.6xjgr4oiho22ex75@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=daniel.wagner@siemens.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).