From: bugzilla-daemon@bugzilla.kernel.org
To: linux-xfs@kernel.org
Subject: [Bug 201331] deadlock (XFS?)
Date: Fri, 05 Oct 2018 01:06:15 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=201331

--- Comment #5 from Dave Chinner (david@fromorbit.com) ---
On Thu, Oct 04, 2018 at 11:25:49PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201331
>
> --- Comment #4 from edo (edo.rus@gmail.com) ---
> I tested with 4.17 and 4.18 prebuilt Debian kernels, behavior is the same:
> Sep 30 16:01:23 storage10x10n1 kernel: [23683.218388] INFO: task
> kworker/u24:0:21848 blocked for more than 120 seconds.

I think we need to rename XFS to "The Messenger: Please don't shoot
me"... :)

From the xfs_info:

sunit=4096   swidth=32768 blks

Ok, that looks wrong - why do you have an MD RAID device with a 16MB
stripe unit and a 128MB stripe width?

Yup:

md3 : active raid6 sda4[0] sdj4[9] sdg4[6] sdd4[3] sdi4[8] sdf4[5] sde4[4] sdh4[7] sdb4[2] sdc4[1]
      77555695616 blocks super 1.2 level 6, 16384k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 9/73 pages [36KB], 65536KB chunk

You've configured your RAID6 device with a 16MB chunk size, which
gives the XFS su/sw noted above. Basically, you've RMW'd your RAID
device to death, because every write is a sub-stripe write.

> Workqueue: writeback wb_workfn (flush-9:3)
> Call Trace:
>  schedule+0x32/0x80
>  bitmap_startwrite+0x161/0x1e0 [md_mod]

MD blocks here when it has too many in-flight bitmap updates, and so
waits for IO to complete before starting another. This isn't XFS
filesystem IO - this is internal MD RAID consistency information that
it needs to write for crash recovery purposes. This will be a direct
result of the RAID device configuration....

>  add_stripe_bio+0x441/0x7d0 [raid456]
>  raid5_make_request+0x1ae/0xb10 [raid456]
>  md_handle_request+0x116/0x190 [md_mod]
>  md_make_request+0x65/0x160 [md_mod]
>  generic_make_request+0x1e7/0x410
>  submit_bio+0x6c/0x140
>  xfs_add_to_ioend+0x14c/0x280 [xfs]
>  xfs_do_writepage+0x2bb/0x680 [xfs]
>  write_cache_pages+0x1ed/0x430
>  xfs_vm_writepages+0x64/0xa0 [xfs]
>  do_writepages+0x1a/0x60
>  __writeback_single_inode+0x3d/0x320
>  writeback_sb_inodes+0x221/0x4b0
>  __writeback_inodes_wb+0x87/0xb0
>  wb_writeback+0x288/0x320
>  wb_workfn+0x37c/0x450

... and this is just the writeback path - your problem has nothing to
do with XFS.

Cheers,

Dave.
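
A quick check of the stripe arithmetic above, as a minimal Python
sketch. The su/sw and chunk figures come from the quoted xfs_info and
/proc/mdstat output; the 4 KiB filesystem block size is an assumption
(the XFS default), not something stated in the mail.

    # Stripe geometry implied by the quoted xfs_info / mdstat output.
    FS_BLOCK = 4096                   # bytes; assumed XFS block size
    sunit_blks = 4096                 # xfs_info: sunit=4096 blks
    swidth_blks = 32768               # xfs_info: swidth=32768 blks

    chunk = sunit_blks * FS_BLOCK     # 16 MiB - matches "16384k chunk"
    stripe = swidth_blks * FS_BLOCK   # 128 MiB full stripe
    data_disks = stripe // chunk      # 8 = 10-disk RAID6 minus 2 parity

    print(chunk >> 20, "MiB chunk")   # -> 16 MiB chunk
    print(stripe >> 20, "MiB stripe") # -> 128 MiB stripe
    print(data_disks, "data disks")   # -> 8 data disks

    # Any write smaller than the full 128 MiB stripe is a sub-stripe
    # write: RAID6 must read back the old data and P/Q parity,
    # recompute them, and write everything out again - the
    # read-modify-write (RMW) cycle described above.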
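
The bitmap_startwrite() stall is the same story seen from MD's side: a
writer that wants to dirty a new region of the write-intent bitmap must
wait for earlier bitmap IO to complete, so writeback sits in D state
long enough to trip the 120-second hung-task watchdog. A conceptual
sketch of that throttling pattern - not MD's actual implementation -
using a bounded count of in-flight updates:

    import threading
    import time

    MAX_INFLIGHT = 4              # arbitrary cap, purely illustrative
    slots = threading.Semaphore(MAX_INFLIGHT)

    def bitmap_startwrite():
        # Writers block here once MAX_INFLIGHT bitmap updates are
        # outstanding - the analogue of writeback stuck in
        # bitmap_startwrite() in the trace above.
        slots.acquire()

    def bitmap_endwrite():
        # IO completion frees a slot and wakes one blocked writer.
        slots.release()

    def writer(n):
        bitmap_startwrite()
        time.sleep(0.1)           # stand-in for the bitmap device IO
        bitmap_endwrite()

    threads = [threading.Thread(target=writer, args=(i,)) for i in range(16)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()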