From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>, Oleg Nesterov <oleg@redhat.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: regression 4.4: deadlock in with cgroup percpu_rwsem
Date: Tue, 19 Jan 2016 20:36:18 +0100
Message-ID: <569E9032.3070903@de.ibm.com>
In-Reply-To: <20160119095518.GC3528@osiris>

On 01/19/2016 10:55 AM, Heiko Carstens wrote:
> On Mon, Jan 18, 2016 at 07:48:16PM +0100, Christian Borntraeger wrote:
>> On 01/18/2016 07:32 PM, Peter Zijlstra wrote:
>>> On Fri, Jan 15, 2016 at 04:13:34PM +0100, Christian Borntraeger wrote:
>>>>> Yes, the deadlock is gone and the system is still running.
>>>>> After some time I had the following WARN in the logs, though.
>>>>> Not sure yet if that is related.
>>>>>
>>>>> [25331.763607] DEBUG_LOCKS_WARN_ON(lock->owner != current)
>>>>> [25331.763630] ------------[ cut here ]------------
>>>>> [25331.763634] WARNING: at kernel/locking/mutex-debug.c:80
>>>
>>>> I restarted the test with panic_on_warn. Hopefully I can get a dump to check
>>>> which mutex this was.
>>>
>>> Hard to reproduce warnings like this tend to point towards memory
>>> corruption. Someone stepped on the mutex value and tickles the sanity
>>> check.
>>>
>>> With lockdep and debugging enabled the mutex gets quite a bit bigger, so
>>> it gets more likely to be hit by 'random' corruption.
>>>
>>> The locking in seq_read() seems rather straight forward.
>>
>> I was able to reproduce. The dump shows a mutex that has an owner field, which
>> does not exists as a task so this all looks fishy. The good thing is, that I
>> can reproduce the issue within some hours. (exact same backtrace). Will add some
>> more debug data to get a handle where we come from.
> 
> Did the owner field show to something that still looks like a task_struct?

No, it's not a task_struct. Activating some more debug information did indeed
reveal several other issues (overwritten redzones etc.). Unfortunately I
only saw the broken things after the fact, so I do not know which code did that.
When I disabled the cgroup controllers in libvirt I was no longer able to trigger
the bugs. Still trying to narrow things down.

Christian

