From: Jens Axboe <axboe@kernel.dk>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	Nate Custer <nate@cpanel.net>,
	kvm@vger.kernel.org, linux-kernel <linux-kernel@vger.kernel.org>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: kvm deadlock
Date: Wed, 14 Dec 2011 17:03:54 +0100	[thread overview]
Message-ID: <4EE8C8EA.9070207@kernel.dk> (raw)
In-Reply-To: <4EE8A7ED.7060703@redhat.com>

On 2011-12-14 14:43, Avi Kivity wrote:
> On 12/14/2011 02:25 PM, Marcelo Tosatti wrote:
>> On Mon, Dec 05, 2011 at 04:48:16PM -0600, Nate Custer wrote:
>>> Hello,
>>>
>>> I am struggling with repeatable full hardware locks when running 8-12 KVM VMs. At some point before the hard lock I get an inconsistent lock state warning. An example of this can be found here:
>>>
>>> http://pastebin.com/8wKhgE2C
>>>
>>> After that the server continues to run for a while and then starts its death spiral. When it reaches that point it fails to log anything further to the disk, but by attaching a console I have been able to get a stack trace documenting the final implosion:
>>>
>>> http://pastebin.com/PbcN76bd
>>>
>>> All of the cores end up hung and the server stops responding to all input, including SysRq commands. 
>>>
>>> I have seen this behavior on two machines (dual E5606, running Fedora 16); both passed cpuburnin testing and memtest86 scans without error.
>>>
>>> I have reproduced the crash and stack traces with a Fedora debugging kernel (3.1.2-1) and with a vanilla 3.1.4 kernel.
>>
>> Busted hardware, apparently. Can you reproduce these issues with the
>> same workload on different hardware?
> 
> I don't think it's hardware related.  The second trace (in the first
> paste) was taken during swap, so GFP_FS is set.  The first one was not,
> so GFP_FS is clear.  Lockdep is worried about the following scenario:
> 
>   acpi_early_init() is called
>   calls pcpu_alloc(), which takes pcpu_alloc_mutex
>   eventually, calls kmalloc(), or some other allocation function
>   no memory, so swap
>   call try_to_free_pages()
>   submit_bio()
>   blk_throtl_bio()
>   blkio_alloc_blkg_stats()
>   alloc_percpu()
>   pcpu_alloc(), which takes pcpu_alloc_mutex
>   deadlock
> 
> It's a little unlikely that acpi_early_init() will OOM, but lockdep
> doesn't know that.  Other callers of pcpu_alloc() could trigger the same
> thing.
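
For illustration, here is a minimal sketch of the self-recursion lockdep
is flagging. The code is hypothetical (not the actual mm/percpu.c or
blk-throttle implementation); it only shows the shape of the problem: a
mutex held across a blocking allocation that direct reclaim can re-enter.

  #include <linux/mutex.h>
  #include <linux/slab.h>

  static DEFINE_MUTEX(pcpu_alloc_mutex);

  /* Sketch of pcpu_alloc()-like behaviour: the mutex is held across a
   * blocking GFP_KERNEL allocation. */
  static void *pcpu_alloc_sketch(size_t size)
  {
  	void *area;

  	mutex_lock(&pcpu_alloc_mutex);
  	/*
  	 * Under memory pressure this allocation may enter direct reclaim:
  	 *   try_to_free_pages() -> submit_bio() -> blk_throtl_bio()
  	 *     -> blkio_alloc_blkg_stats() -> alloc_percpu()
  	 *     -> pcpu_alloc(), which takes pcpu_alloc_mutex again
  	 * on the same thread: self-deadlock.
  	 */
  	area = kzalloc(size, GFP_KERNEL);
  	mutex_unlock(&pcpu_alloc_mutex);
  	return area;
  }
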
> 
> When lockdep says
> 
> [ 5839.924953] other info that might help us debug this:
> [ 5839.925396]  Possible unsafe locking scenario:
> [ 5839.925397]
> [ 5839.925840]        CPU0
> [ 5839.926063]        ----
> [ 5839.926287]   lock(pcpu_alloc_mutex);
> [ 5839.926533]   <Interrupt>
> [ 5839.926756]     lock(pcpu_alloc_mutex);
> [ 5839.926986]
> 
> It really means
> 
>    <swap, set GFP_FS>
> 
> GFP_FS simply marks the beginning of a nested, unrelated context that
> uses the same thread, just like an interrupt.  Kudos to lockdep for
> catching that.
> 
> I think the allocation in blkio_alloc_blkg_stats() should be moved out
> of the I/O path into some init function. Copying Jens.

That's completely buggy: you basically end up with a GFP_KERNEL
allocation from the IO submit path. Vivek, per_cpu data needs to be set
up at init time; you can't allocate it dynamically off the IO path.
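
A rough sketch of that direction, with hypothetical names rather than
the actual blk-cgroup code: allocate the per-cpu stats once, in process
context, when the group is created, so the bio submission path only
updates counters that already exist.

  #include <linux/percpu.h>
  #include <linux/types.h>

  /* Hypothetical stand-ins, not the real blk-cgroup structures. */
  struct blkg_stats_cpu {
  	u64 sectors;
  	u64 serviced[2];	/* assumed READ/WRITE indices */
  };

  struct blkg_sketch {
  	struct blkg_stats_cpu __percpu *stats_cpu;
  };

  /* Runs at group-creation time, in process context, where a blocking
   * GFP_KERNEL allocation and pcpu_alloc_mutex are safe. */
  static int blkg_init_stats(struct blkg_sketch *blkg)
  {
  	blkg->stats_cpu = alloc_percpu(struct blkg_stats_cpu);
  	return blkg->stats_cpu ? 0 : -ENOMEM;
  }

  /* The bio submission path only updates pre-allocated per-cpu
   * counters; it never allocates. */
  static void blkg_account_io(struct blkg_sketch *blkg,
  			    unsigned int bytes, int rw)
  {
  	struct blkg_stats_cpu *sc = get_cpu_ptr(blkg->stats_cpu);

  	sc->sectors += bytes >> 9;
  	sc->serviced[rw]++;
  	put_cpu_ptr(blkg->stats_cpu);
  }

The [RFT PATCH] later in this thread takes a related route, deferring
the per-cpu allocation to worker thread context, which likewise keeps
alloc_percpu() out of the bio submission path.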

-- 
Jens Axboe



Thread overview: 28+ messages
2011-12-05 22:48 kvm deadlock Nate Custer
2011-12-14 12:25 ` Marcelo Tosatti
2011-12-14 13:43   ` Avi Kivity
2011-12-14 14:00     ` Marcelo Tosatti
2011-12-14 14:02       ` Avi Kivity
2011-12-14 14:06         ` Marcelo Tosatti
2011-12-14 14:17           ` Nate Custer
2011-12-14 14:20             ` Marcelo Tosatti
2011-12-14 14:28             ` Avi Kivity
2011-12-14 14:27           ` Avi Kivity
2011-12-14 16:03     ` Jens Axboe [this message]
2011-12-14 17:03       ` Vivek Goyal
2011-12-14 17:09         ` Jens Axboe
2011-12-14 17:22           ` Vivek Goyal
2011-12-14 18:16             ` Tejun Heo
2011-12-14 18:41               ` Vivek Goyal
2011-12-14 23:06                 ` Vivek Goyal
2011-12-15 19:47       ` [RFT PATCH] blkio: alloc per cpu data from worker thread context( Re: kvm deadlock) Vivek Goyal
     [not found]         ` <E73DB38E-AFC5-445D-9E76-DE599B36A814@cpanel.net>
2011-12-16 20:29           ` Vivek Goyal
2011-12-18 21:25             ` Nate Custer
2011-12-19 13:40               ` Vivek Goyal
2011-12-19 17:27               ` Vivek Goyal
2011-12-19 17:35                 ` Tejun Heo
2011-12-19 18:27                   ` Vivek Goyal
2011-12-19 22:56                     ` Tejun Heo
2011-12-20 14:50                       ` Vivek Goyal
2011-12-20 20:45                         ` Tejun Heo
2011-12-20 12:49                     ` Jens Axboe
