Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup
From: Vlastimil Babka
Date: Thu, 2 Jun 2016 15:24:05 +0200
To: Sergey Senozhatsky, Andrew Morton, Ebru Akagunduz
Cc: Michal Hocko, "Kirill A. Shutemov", Stephen Rothwell, linux-mm@kvack.org, linux-next@vger.kernel.org, linux-kernel@vger.kernel.org, Rik van Riel, Andrea Arcangeli
Message-ID: <0c47a3a0-5530-b257-1c1f-28ed44ba97e6@suse.cz>
In-Reply-To: <20160602014835.GA635@swordfish>
References: <20160601131122.7dbb0a65@canb.auug.org.au> <20160602014835.GA635@swordfish>
X-Mailing-List: linux-kernel@vger.kernel.org

[+CC's]

On 06/02/2016 03:48 AM, Sergey Senozhatsky wrote:
> On (06/01/16 13:11), Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20160531:
>>
>> My fixes tree contains:
>>
>> of: silence warnings due to max() usage
>>
>> The arm tree gained a conflict against Linus' tree.
>>
>> Non-merge commits (relative to Linus' tree): 1100
>> 936 files changed, 38159 insertions(+), 17475 deletions(-)
>
> Hello,
>
> the cc1 process ended up in DN state during kernel -j4 compilation.
>
> ...
> [ 2856.323052] INFO: task cc1:4582 blocked for more than 21 seconds.
> [ 2856.323055] Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> [ 2856.323056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2856.323059] cc1 D ffff880057e9fd78 0 4582 4575 0x00000000
> [ 2856.323062] ffff880057e9fd78 ffff880057e08000 ffff880057e9fd90 ffff880057ea0000
> [ 2856.323065] ffff88005dc3dc68 ffffffff00000001 ffff880057e09500 ffff88005dc3dc80
> [ 2856.323067] ffff880057e9fd90 ffffffff81441e33 ffff88005dc3dc68 ffff880057e9fe00
> [ 2856.323068] Call Trace:
> [ 2856.323074] [] schedule+0x83/0x98
> [ 2856.323077] [] rwsem_down_write_failed+0x18e/0x1d3
> [ 2856.323080] [] ? unlock_page+0x2b/0x2d
> [ 2856.323083] [] call_rwsem_down_write_failed+0x17/0x30
> [ 2856.323084] [] ? call_rwsem_down_write_failed+0x17/0x30
> [ 2856.323086] [] down_write+0x1f/0x2e
> [ 2856.323089] [] __khugepaged_exit+0x104/0x11a
> [ 2856.323091] [] mmput+0x29/0xc5
> [ 2856.323093] [] do_exit+0x34c/0x894
> [ 2856.323095] [] ? __do_page_fault+0x2f7/0x399
> [ 2856.323097] [] do_group_exit+0x3c/0x98
> [ 2856.323099] [] SyS_exit_group+0xf/0xf
> [ 2856.323101] [] entry_SYSCALL_64_fastpath+0x13/0x8f
>
> [ 2877.322853] INFO: task cc1:4582 blocked for more than 21 seconds.
> [ 2877.322858] Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> [ 2877.322858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2877.322861] cc1 D ffff880057e9fd78 0 4582 4575 0x00000000
> [ 2877.322865] ffff880057e9fd78 ffff880057e08000 ffff880057e9fd90 ffff880057ea0000
> [ 2877.322867] ffff88005dc3dc68 ffffffff00000001 ffff880057e09500 ffff88005dc3dc80
> [ 2877.322867] ffff880057e9fd90 ffffffff81441e33 ffff88005dc3dc68 ffff880057e9fe00
> [ 2877.322870] Call Trace:
> [ 2877.322875] [] schedule+0x83/0x98
> [ 2877.322878] [] rwsem_down_write_failed+0x18e/0x1d3
> [ 2877.322881] [] ? unlock_page+0x2b/0x2d
> [ 2877.322884] [] call_rwsem_down_write_failed+0x17/0x30
> [ 2877.322885] [] ? call_rwsem_down_write_failed+0x17/0x30
> [ 2877.322887] [] down_write+0x1f/0x2e
> [ 2877.322890] [] __khugepaged_exit+0x104/0x11a
> [ 2877.322892] [] mmput+0x29/0xc5
> [ 2877.322894] [] do_exit+0x34c/0x894
> [ 2877.322896] [] ? __do_page_fault+0x2f7/0x399
> [ 2877.322898] [] do_group_exit+0x3c/0x98
> [ 2877.322900] [] SyS_exit_group+0xf/0xf
> [ 2877.322902] [] entry_SYSCALL_64_fastpath+0x13/0x8f

I think it's this patch:

http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem.patch

After the patch, some parts of collapse_huge_page() that used to run under down_write(mmap_sem) run under down_read() instead. But there's a "goto out", which continues via "goto out_up_write" to an up_write(mmap_sem), so the locking is imbalanced: one path seems to go through both up_read() and up_write(). I can imagine this causing a stuck down_write(), among other things?