From: Austin Schuh <austin@peloton-tech.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	rt-users <linux-rt-users@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
Date: Thu, 26 Jun 2014 17:07:47 -0700
Message-ID: <CANGgnMa+qtgJ3wwg_h5Rynw5vEvZpQZ6PvaUfXNQ8+Y3Yu5U0g@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.10.1406270027300.5170@nanos>

On Thu, Jun 26, 2014 at 3:35 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 26 Jun 2014, Austin Schuh wrote:
>> On Wed, May 21, 2014 at 12:33 AM, Richard Weinberger
>> <richard.weinberger@gmail.com> wrote:
>> > CC'ing RT folks
>> >
>> > On Wed, May 21, 2014 at 8:23 AM, Austin Schuh <austin@peloton-tech.com> wrote:
>> >> On Tue, May 13, 2014 at 7:29 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> >>> Hi,
>> >>>
>> >>> I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
>> >>> patched kernel.  I have currently only triggered it using dpkg.  Dave
>> >>> Chinner on the XFS mailing list suggested that it was a rt-kernel
>> >>> workqueue issue as opposed to a XFS problem after looking at the
>> >>> kernel messages.
>>
>> I've got a 100% reproducible test case that doesn't involve a
>> filesystem.  I wrote a module that triggers the bug when the device is
>> written to, making it easy to enable tracing during the event and
>> capture everything.
>>
>> It looks like rw_semaphores don't trigger wq_worker_sleeping to run
>> when work goes to sleep on a rw_semaphore.  This only happens with the
>> RT patches, not with the mainline kernel.  I'm foreseeing a second
>> deadlock/bug coming into play shortly.  If a task holding the work
>> pool spinlock gets preempted, and we need to schedule more work from
>> another worker thread which was just blocked by a mutex, we'll then
>> end up trying to go to sleep on 2 locks at once.
>
> I remember vaguely, that I've seen and analyzed that quite some time
> ago. I can't page in all the gory details right now, but I have a look
> how the related code changed in the last couple of years tomorrow
> morning with an awake brain.
>
> Steven, you did some analysis on that IIRC, or was that just related
> to rw_locks?
>
> Thanks,
>
>         tglx

If I'm reading the rt patch correctly, wq_worker_sleeping was moved
out of __schedule to sched_submit_work.  It looks like that changes
the conditions under which wq_worker_sleeping is called.  It used to
be called whenever a task was going to sleep (I think).  It looks like
it is now called only if the task is going to sleep and is not
blocked on a PI mutex (I think).

If I try the following experiment

 static inline void sched_submit_work(struct task_struct *tsk)
 {
+   if (tsk->state && tsk->flags & PF_WQ_WORKER) {
+     wq_worker_sleeping(tsk);
+     return;
+   }

and then remove the call later in the function, I am able to pass my test.
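
To make the difference between the two conditions concrete, here is a
small user-space sketch.  It is only a toy model of the gating logic as
described above, not kernel code: struct toy_task, TOY_PF_WQ_WORKER, the
pi_blocked field, and both function names are made up for illustration,
and the "RT patch" predicate reflects my reading of the patch rather
than a verified copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for PF_WQ_WORKER; the value is arbitrary. */
#define TOY_PF_WQ_WORKER 0x1u

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct toy_task {
	long state;          /* 0 = runnable, nonzero = about to sleep */
	unsigned int flags;  /* TOY_PF_WQ_WORKER set for workqueue workers */
	bool pi_blocked;     /* assumed: true while blocked on a PI (rt) lock */
};

/*
 * Gating as described for the RT patch (an assumption, see lead-in):
 * the workqueue is notified only if the task is going to sleep, is not
 * blocked on a PI lock, and is a workqueue worker.
 */
static bool rt_patch_notifies(const struct toy_task *t)
{
	if (!t->state || t->pi_blocked)
		return false;
	return (t->flags & TOY_PF_WQ_WORKER) != 0;
}

/*
 * Gating in the experiment: any workqueue worker that is going to
 * sleep notifies the workqueue, PI-blocked or not.
 */
static bool experiment_notifies(const struct toy_task *t)
{
	return t->state && (t->flags & TOY_PF_WQ_WORKER);
}
```

The only input on which the two predicates disagree is a sleeping
worker with pi_blocked set, which is the case a worker blocking on a
rw_semaphore would fall into if rw_semaphores are PI locks under RT.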

Unfortunately, I then get a recursive pool spinlock BUG_ON after a
while (as I would expect), and it all blows up.

I'm not sure where to go from there.  Any changes to the worker pool to
try to fix that will be hard, or could affect latency significantly.

Austin
