All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	linux-mm@kvack.org, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup
Date: Thu, 2 Jun 2016 15:24:05 +0200	[thread overview]
Message-ID: <0c47a3a0-5530-b257-1c1f-28ed44ba97e6@suse.cz> (raw)
In-Reply-To: <20160602014835.GA635@swordfish>

[+CC's]

On 06/02/2016 03:48 AM, Sergey Senozhatsky wrote:
> On (06/01/16 13:11), Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20160531:
>>
>> My fixes tree contains:
>>
>>   of: silence warnings due to max() usage
>>
>> The arm tree gained a conflict against Linus' tree.
>>
>> Non-merge commits (relative to Linus' tree): 1100
>>  936 files changed, 38159 insertions(+), 17475 deletions(-)
>
> Hello,
>
> the cc1 process ended up in DN state during kernel -j4 compilation.
>
> ...
> [ 2856.323052] INFO: task cc1:4582 blocked for more than 21 seconds.
> [ 2856.323055]       Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> [ 2856.323056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2856.323059] cc1             D ffff880057e9fd78     0  4582   4575 0x00000000
> [ 2856.323062]  ffff880057e9fd78 ffff880057e08000 ffff880057e9fd90 ffff880057ea0000
> [ 2856.323065]  ffff88005dc3dc68 ffffffff00000001 ffff880057e09500 ffff88005dc3dc80
> [ 2856.323067]  ffff880057e9fd90 ffffffff81441e33 ffff88005dc3dc68 ffff880057e9fe00
> [ 2856.323068] Call Trace:
> [ 2856.323074]  [<ffffffff81441e33>] schedule+0x83/0x98
> [ 2856.323077]  [<ffffffff81443d9b>] rwsem_down_write_failed+0x18e/0x1d3
> [ 2856.323080]  [<ffffffff810a87cf>] ? unlock_page+0x2b/0x2d
> [ 2856.323083]  [<ffffffff811bdb77>] call_rwsem_down_write_failed+0x17/0x30
> [ 2856.323084]  [<ffffffff811bdb77>] ? call_rwsem_down_write_failed+0x17/0x30
> [ 2856.323086]  [<ffffffff81443630>] down_write+0x1f/0x2e
> [ 2856.323089]  [<ffffffff810ea4f3>] __khugepaged_exit+0x104/0x11a
> [ 2856.323091]  [<ffffffff8103702a>] mmput+0x29/0xc5
> [ 2856.323093]  [<ffffffff8103bbd8>] do_exit+0x34c/0x894
> [ 2856.323095]  [<ffffffff8102f9e0>] ? __do_page_fault+0x2f7/0x399
> [ 2856.323097]  [<ffffffff8103c188>] do_group_exit+0x3c/0x98
> [ 2856.323099]  [<ffffffff8103c1f3>] SyS_exit_group+0xf/0xf
> [ 2856.323101]  [<ffffffff81444cdb>] entry_SYSCALL_64_fastpath+0x13/0x8f
>
> [ 2877.322853] INFO: task cc1:4582 blocked for more than 21 seconds.
> [ 2877.322858]       Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> [ 2877.322858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2877.322861] cc1             D ffff880057e9fd78     0  4582   4575 0x00000000
> [ 2877.322865]  ffff880057e9fd78 ffff880057e08000 ffff880057e9fd90 ffff880057ea0000
> [ 2877.322867]  ffff88005dc3dc68 ffffffff00000001 ffff880057e09500 ffff88005dc3dc80
> [ 2877.322867]  ffff880057e9fd90 ffffffff81441e33 ffff88005dc3dc68 ffff880057e9fe00
> [ 2877.322870] Call Trace:
> [ 2877.322875]  [<ffffffff81441e33>] schedule+0x83/0x98
> [ 2877.322878]  [<ffffffff81443d9b>] rwsem_down_write_failed+0x18e/0x1d3
> [ 2877.322881]  [<ffffffff810a87cf>] ? unlock_page+0x2b/0x2d
> [ 2877.322884]  [<ffffffff811bdb77>] call_rwsem_down_write_failed+0x17/0x30
> [ 2877.322885]  [<ffffffff811bdb77>] ? call_rwsem_down_write_failed+0x17/0x30
> [ 2877.322887]  [<ffffffff81443630>] down_write+0x1f/0x2e
> [ 2877.322890]  [<ffffffff810ea4f3>] __khugepaged_exit+0x104/0x11a
> [ 2877.322892]  [<ffffffff8103702a>] mmput+0x29/0xc5
> [ 2877.322894]  [<ffffffff8103bbd8>] do_exit+0x34c/0x894
> [ 2877.322896]  [<ffffffff8102f9e0>] ? __do_page_fault+0x2f7/0x399
> [ 2877.322898]  [<ffffffff8103c188>] do_group_exit+0x3c/0x98
> [ 2877.322900]  [<ffffffff8103c1f3>] SyS_exit_group+0xf/0xf
> [ 2877.322902]  [<ffffffff81444cdb>] entry_SYSCALL_64_fastpath+0x13/0x8f

I think it's this patch:

http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem.patch

Some parts of the code in collapse_huge_page() that were under 
down_write(mmap_sem) are under down_read() after the patch. But there's 
"goto out" which continues via "goto out_up_write" which does 
up_write(mmap_sem) so there's an imbalance. One path seems to go via 
both up_read() and up_write(). I can imagine this can cause a stuck 
down_write() among other things?

WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	linux-mm@kvack.org, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup
Date: Thu, 2 Jun 2016 15:24:05 +0200	[thread overview]
Message-ID: <0c47a3a0-5530-b257-1c1f-28ed44ba97e6@suse.cz> (raw)
In-Reply-To: <20160602014835.GA635@swordfish>

[+CC's]

On 06/02/2016 03:48 AM, Sergey Senozhatsky wrote:
> On (06/01/16 13:11), Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20160531:
>>
>> My fixes tree contains:
>>
>>   of: silence warnings due to max() usage
>>
>> The arm tree gained a conflict against Linus' tree.
>>
>> Non-merge commits (relative to Linus' tree): 1100
>>  936 files changed, 38159 insertions(+), 17475 deletions(-)
>
> Hello,
>
> the cc1 process ended up in DN state during kernel -j4 compilation.
>
> ...
> [ 2856.323052] INFO: task cc1:4582 blocked for more than 21 seconds.
> [ 2856.323055]       Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> [ 2856.323056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2856.323059] cc1             D ffff880057e9fd78     0  4582   4575 0x00000000
> [ 2856.323062]  ffff880057e9fd78 ffff880057e08000 ffff880057e9fd90 ffff880057ea0000
> [ 2856.323065]  ffff88005dc3dc68 ffffffff00000001 ffff880057e09500 ffff88005dc3dc80
> [ 2856.323067]  ffff880057e9fd90 ffffffff81441e33 ffff88005dc3dc68 ffff880057e9fe00
> [ 2856.323068] Call Trace:
> [ 2856.323074]  [<ffffffff81441e33>] schedule+0x83/0x98
> [ 2856.323077]  [<ffffffff81443d9b>] rwsem_down_write_failed+0x18e/0x1d3
> [ 2856.323080]  [<ffffffff810a87cf>] ? unlock_page+0x2b/0x2d
> [ 2856.323083]  [<ffffffff811bdb77>] call_rwsem_down_write_failed+0x17/0x30
> [ 2856.323084]  [<ffffffff811bdb77>] ? call_rwsem_down_write_failed+0x17/0x30
> [ 2856.323086]  [<ffffffff81443630>] down_write+0x1f/0x2e
> [ 2856.323089]  [<ffffffff810ea4f3>] __khugepaged_exit+0x104/0x11a
> [ 2856.323091]  [<ffffffff8103702a>] mmput+0x29/0xc5
> [ 2856.323093]  [<ffffffff8103bbd8>] do_exit+0x34c/0x894
> [ 2856.323095]  [<ffffffff8102f9e0>] ? __do_page_fault+0x2f7/0x399
> [ 2856.323097]  [<ffffffff8103c188>] do_group_exit+0x3c/0x98
> [ 2856.323099]  [<ffffffff8103c1f3>] SyS_exit_group+0xf/0xf
> [ 2856.323101]  [<ffffffff81444cdb>] entry_SYSCALL_64_fastpath+0x13/0x8f
>
> [ 2877.322853] INFO: task cc1:4582 blocked for more than 21 seconds.
> [ 2877.322858]       Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> [ 2877.322858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2877.322861] cc1             D ffff880057e9fd78     0  4582   4575 0x00000000
> [ 2877.322865]  ffff880057e9fd78 ffff880057e08000 ffff880057e9fd90 ffff880057ea0000
> [ 2877.322867]  ffff88005dc3dc68 ffffffff00000001 ffff880057e09500 ffff88005dc3dc80
> [ 2877.322867]  ffff880057e9fd90 ffffffff81441e33 ffff88005dc3dc68 ffff880057e9fe00
> [ 2877.322870] Call Trace:
> [ 2877.322875]  [<ffffffff81441e33>] schedule+0x83/0x98
> [ 2877.322878]  [<ffffffff81443d9b>] rwsem_down_write_failed+0x18e/0x1d3
> [ 2877.322881]  [<ffffffff810a87cf>] ? unlock_page+0x2b/0x2d
> [ 2877.322884]  [<ffffffff811bdb77>] call_rwsem_down_write_failed+0x17/0x30
> [ 2877.322885]  [<ffffffff811bdb77>] ? call_rwsem_down_write_failed+0x17/0x30
> [ 2877.322887]  [<ffffffff81443630>] down_write+0x1f/0x2e
> [ 2877.322890]  [<ffffffff810ea4f3>] __khugepaged_exit+0x104/0x11a
> [ 2877.322892]  [<ffffffff8103702a>] mmput+0x29/0xc5
> [ 2877.322894]  [<ffffffff8103bbd8>] do_exit+0x34c/0x894
> [ 2877.322896]  [<ffffffff8102f9e0>] ? __do_page_fault+0x2f7/0x399
> [ 2877.322898]  [<ffffffff8103c188>] do_group_exit+0x3c/0x98
> [ 2877.322900]  [<ffffffff8103c1f3>] SyS_exit_group+0xf/0xf
> [ 2877.322902]  [<ffffffff81444cdb>] entry_SYSCALL_64_fastpath+0x13/0x8f

I think it's this patch:

http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem.patch

Some parts of the code in collapse_huge_page() that were under 
down_write(mmap_sem) are under down_read() after the patch. But there's 
"goto out" which continues via "goto out_up_write" which does 
up_write(mmap_sem) so there's an imbalance. One path seems to go via 
both up_read() and up_write(). I can imagine this can cause a stuck 
down_write() among other things?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-06-02 13:24 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-01  3:11 linux-next: Tree for Jun 1 Stephen Rothwell
2016-06-02  1:48 ` [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup Sergey Senozhatsky
2016-06-02  1:48   ` Sergey Senozhatsky
2016-06-02  9:21   ` Michal Hocko
2016-06-02  9:21     ` Michal Hocko
2016-06-02 12:08     ` Sergey Senozhatsky
2016-06-02 12:08       ` Sergey Senozhatsky
2016-06-02 12:21       ` Michal Hocko
2016-06-02 12:21         ` Michal Hocko
2016-06-03 13:51         ` Andrea Arcangeli
2016-06-03 13:51           ` Andrea Arcangeli
2016-06-03 14:46           ` Michal Hocko
2016-06-03 14:46             ` Michal Hocko
2016-06-03 15:10             ` Andrea Arcangeli
2016-06-03 15:10               ` Andrea Arcangeli
2016-06-07  7:34               ` Michal Hocko
2016-06-07  7:34                 ` Michal Hocko
2016-06-08  8:19               ` Vlastimil Babka
2016-06-08  8:19                 ` Vlastimil Babka
2016-06-03  7:15     ` Sergey Senozhatsky
2016-06-03  7:15       ` Sergey Senozhatsky
2016-06-03  7:25       ` Michal Hocko
2016-06-03  7:25         ` Michal Hocko
2016-06-03  8:43         ` Sergey Senozhatsky
2016-06-03  8:43           ` Sergey Senozhatsky
2016-06-03  9:55           ` Michal Hocko
2016-06-03  9:55             ` Michal Hocko
2016-06-03 10:05             ` Michal Hocko
2016-06-03 10:05               ` Michal Hocko
2016-06-03 13:38               ` Sergey Senozhatsky
2016-06-03 13:38                 ` Sergey Senozhatsky
2016-06-03 13:45                 ` Michal Hocko
2016-06-03 13:45                   ` Michal Hocko
2016-06-03 13:49                   ` Michal Hocko
2016-06-03 13:49                     ` Michal Hocko
2016-06-03 13:49                     ` Michal Hocko
2016-06-04  7:51                     ` Sergey Senozhatsky
2016-06-04  7:51                       ` Sergey Senozhatsky
2016-06-06  8:39                       ` Michal Hocko
2016-06-06  8:39                         ` Michal Hocko
2016-06-02 13:24   ` Vlastimil Babka [this message]
2016-06-02 13:24     ` Vlastimil Babka
2016-06-02 18:58     ` Ebru Akagunduz
2016-06-02 18:58       ` Ebru Akagunduz
2016-06-03  1:00       ` Sergey Senozhatsky
2016-06-03  1:00         ` Sergey Senozhatsky
2016-06-03  1:29         ` Sergey Senozhatsky
2016-06-03  1:29           ` Sergey Senozhatsky
2016-06-03  4:14           ` Sergey Senozhatsky
2016-06-03  4:14             ` Sergey Senozhatsky
2016-06-03 12:28     ` [PATCH] mm, thp: fix locking inconsistency in collapse_huge_page Ebru Akagunduz
2016-06-03 12:28       ` Ebru Akagunduz
2016-06-06 13:05       ` Vlastimil Babka
2016-06-06 13:05         ` Vlastimil Babka
2016-06-09  3:51         ` Sergey Senozhatsky
2016-06-09  3:51           ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c47a3a0-5530-b257-1c1f-28ed44ba97e6@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ebru.akagunduz@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-next@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=riel@redhat.com \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sfr@canb.auug.org.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.