Message-ID: <1342343673.28142.2.camel@marge.simpson.net>
Subject: Re: Deadlocks due to per-process plugging
From: Mike Galbraith
To: Thomas Gleixner
Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel@vger.kernel.org,
	Tejun Heo, Jens Axboe, mgalbraith@suse.com
Date: Sun, 15 Jul 2012 11:14:33 +0200
In-Reply-To:
References: <20120711133735.GA8122@quack.suse.cz>
	<20120711201601.GB9779@quack.suse.cz>
	<20120713123318.GB20361@quack.suse.cz>
	<20120713144622.GB28715@quack.suse.cz>

On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Jan Kara wrote:
> > On Fri 13-07-12 16:25:05, Thomas Gleixner wrote:
> > > So the patch below should allow the unplug to take place when blocked
> > > on mutexes etc.
> > Thanks for the patch! Mike will give it some testing.
> 
> I just found out that this patch will explode nicely when the unplug
> code runs into a contended lock. Then we try to block on that lock and
> make the rtmutex code unhappy as we are already blocked on something
> else.

Kinda like so?  My x3550 M3 just exploded.  Aw poo.

[ 6669.133081] Kernel panic - not syncing: rt_mutex_real_waiter(task->pi_blocked_on) lock: 0xffff880175dfd588 waiter: 0xffff880121fc2d58
[ 6669.133083]
[ 6669.133086] Pid: 28240, comm: bonnie++ Tainted: G N 3.0.35-rt56-rt #20
[ 6669.133088] Call Trace:
[ 6669.133102] [] dump_trace+0x82/0x2e0
[ 6669.133109] [] dump_stack+0x69/0x6f
[ 6669.133114] [] panic+0xa1/0x1e5
[ 6669.133121] [] task_blocks_on_rt_mutex+0x279/0x2c0
[ 6669.133127] [] rt_spin_lock_slowlock+0xb5/0x290
[ 6669.133134] [] blk_flush_plug_list+0x164/0x200
[ 6669.133139] [] schedule+0x5e/0xb0
[ 6669.133143] [] __rt_mutex_slowlock+0x4b/0xd0
[ 6669.133148] [] rt_mutex_slowlock+0xeb/0x210
[ 6669.133154] [] page_referenced_file+0x4e/0x190
[ 6669.133160] [] page_referenced+0x6a/0x230
[ 6669.133166] [] shrink_active_list+0x214/0x3d0
[ 6669.133170] [] shrink_list+0xd4/0x120
[ 6669.133176] [] shrink_zone+0x9c/0x1d0
[ 6669.133180] [] shrink_zones+0x7f/0x1f0
[ 6669.133185] [] do_try_to_free_pages+0x8d/0x370
[ 6669.133189] [] try_to_free_pages+0xea/0x210
[ 6669.133197] [] __alloc_pages_nodemask+0x5b3/0x9f0
[ 6669.133205] [] alloc_pages_current+0xc4/0x150
[ 6669.133211] [] find_or_create_page+0x46/0xb0
[ 6669.133217] [] alloc_extent_buffer+0x226/0x4b0
[ 6669.133225] [] readahead_tree_block+0x19/0x50
[ 6669.133231] [] reada_for_search+0x1cf/0x230
[ 6669.133237] [] read_block_for_search+0x18a/0x200
[ 6669.133242] [] btrfs_search_slot+0x25a/0x7e0
[ 6669.133248] [] btrfs_lookup_csum+0x74/0x180
[ 6669.133254] [] __btrfs_lookup_bio_sums+0x1bf/0x3b0
[ 6669.133260] [] btrfs_submit_bio_hook+0x158/0x1a0
[ 6669.133270] [] submit_one_bio+0x66/0xa0
[ 6669.133274] [] submit_extent_page+0x107/0x220
[ 6669.133278] [] __extent_read_full_page+0x4b9/0x6e0
[ 6669.133284] [] extent_readpages+0xbf/0x100
[ 6669.133289] [] __do_page_cache_readahead+0x1ae/0x250
[ 6669.133295] [] ra_submit+0x1c/0x30
[ 6669.133299] [] do_generic_file_read.clone.0+0x27b/0x450
[ 6669.133305] [] generic_file_aio_read+0x1fb/0x2a0
[ 6669.133313] [] do_sync_read+0xbf/0x100
[ 6669.133319] [] vfs_read+0xc3/0x180
[ 6669.133323] [] sys_read+0x51/0xa0
[ 6669.133329] [] system_call_fastpath+0x16/0x1b
[ 6669.133347] [<00007ff8b95bb370>] 0x7ff8b95bb36f
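Reading the trace bottom up: bonnie++ blocks in rt_mutex_slowlock()
(page_referenced() hitting a contended mapping lock), schedule() then
flushes the plug, the flush runs into a contended queue lock (a sleeping
lock on -rt), and task_blocks_on_rt_mutex() finds ->pi_blocked_on already
set, just as you predicted. In case it helps anyone following along,
here's a toy userspace model of the invariant that blows up; the function
and field names are borrowed from the kernel, everything else is made up:

	#include <assert.h>
	#include <stddef.h>
	#include <stdio.h>

	/*
	 * Toy model: on PREEMPT_RT a task records the single rtmutex
	 * waiter it is blocked on in ->pi_blocked_on.  Blocking on a
	 * second lock while that field is set is exactly what the
	 * panic above catches.
	 */
	struct rt_mutex_waiter { int dummy; };

	struct task_struct {
		struct rt_mutex_waiter *pi_blocked_on;
	};

	static struct task_struct current_task;	/* stand-in for "current" */

	/* Models task_blocks_on_rt_mutex(): one blocker per task, period. */
	static void task_blocks_on_rt_mutex(struct rt_mutex_waiter *waiter)
	{
		/* Plays the role of the rt_mutex_real_waiter() panic above. */
		assert(current_task.pi_blocked_on == NULL);
		current_task.pi_blocked_on = waiter;
	}

	/*
	 * Models blk_flush_plug_list() hitting a contended queue lock,
	 * which on -rt is a sleeping lock, so the flush itself blocks
	 * like any other rtmutex acquisition.
	 */
	static void blk_flush_plug_list(void)
	{
		struct rt_mutex_waiter queue_lock_waiter;

		task_blocks_on_rt_mutex(&queue_lock_waiter);	/* boom */
	}

	/* Models schedule() with the tsk_is_pi_blocked() bail-out removed. */
	static void schedule_without_pi_check(void)
	{
		blk_flush_plug_list();
	}

	int main(void)
	{
		struct rt_mutex_waiter mapping_lock_waiter;

		/* Step 1: block on a contended lock, recording the waiter... */
		task_blocks_on_rt_mutex(&mapping_lock_waiter);

		/* Step 2: ...then schedule away; the plug flush blocks a
		 * second time and trips the assertion, mirroring the panic. */
		schedule_without_pi_check();

		printf("unreached\n");
		return 0;
	}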
> So no, it's not a solution to the problem. Sigh.
> 
> Can you figure out on which lock the stuck thread which did not unplug
> due to tsk_is_pi_blocked was blocked?

I'll take a peek.

	-Mike
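P.S. For reference, the tsk_is_pi_blocked() bail-out we're talking about
sits in sched_submit_work(). Roughly this (a sketch from memory of the
mainline code of this era, so it may not match this exact -rt tree hunk
for hunk):

	static inline void sched_submit_work(struct task_struct *tsk)
	{
		if (!tsk->state || tsk_is_pi_blocked(tsk))
			return;
		/*
		 * If we are going to sleep and we have plugged IO queued,
		 * make sure to submit it to avoid deadlocks.
		 */
		if (blk_needs_flush_plug(tsk))
			blk_schedule_flush_plug(tsk);
	}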