* [PATCH] mm: vmscan: unlock_page page when forcing reclaim
@ 2014-07-18 15:48 Richard Yao
2014-07-18 16:38 ` Johannes Weiner
0 siblings, 1 reply; 3+ messages in thread
From: Richard Yao @ 2014-07-18 15:48 UTC (permalink / raw)
To: linux-kernel
Cc: mthode, kernel, Richard Yao, Andrew Morton, Michal Hocko,
Glauber Costa, Rik van Riel, Vladimir Davydov, Johannes Weiner,
Dave Chinner, open list:MEMORY MANAGEMENT
A small userland program I wrote to assist me in drive forensic
operations soft deadlocked on Linux 3.14.4. The stack trace from /proc
was:
[<ffffffff8112968e>] sleep_on_page_killable+0xe/0x40
[<ffffffff81129829>] wait_on_page_bit_killable+0x79/0x80
[<ffffffff811299a5>] __lock_page_or_retry+0x95/0xc0
[<ffffffff8112a95b>] filemap_fault+0x21b/0x420
[<ffffffff8115685e>] __do_fault+0x6e/0x520
[<ffffffff81156de3>] handle_pte_fault+0xd3/0x1f0
[<ffffffff81157073>] __handle_mm_fault+0x173/0x290
[<ffffffff811571d2>] handle_mm_fault+0x42/0xb0
[<ffffffff81587a11>] __do_page_fault+0x191/0x490
[<ffffffff81587dec>] do_page_fault+0xc/0x10
[<ffffffff81584622>] page_fault+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
The program used mmap() to do a linear scan of the device on 64-bit
hardware. The block device in question was 200GB in size and the system
had only 8GB of RAM. All IO operations stopped following pageout.
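For illustration only, here is a minimal sketch of the kind of linear mmap() scan described above. The original program was not posted, so the function name scan_linear() and everything about its structure are assumptions, not the reporter's code.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical reconstruction, not the reporter's program.
 * Map `path` read-only and read every byte in order. Returns the byte
 * sum (so the loads cannot be optimized away), or -1 on error or when
 * the target is empty.
 */
static int64_t scan_linear(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* On Linux, seeking to the end reports the size of a regular file
	 * or a block device. */
	off_t size = lseek(fd, 0, SEEK_END);
	if (size <= 0) {
		close(fd);
		return -1;
	}

	const unsigned char *p = mmap(NULL, (size_t)size, PROT_READ,
				      MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;

	int64_t sum = 0;
	for (off_t i = 0; i < size; i++)
		sum += p[i];	/* touching an unmapped page triggers a page fault */

	munmap((void *)p, (size_t)size);
	return sum;
}
```

When the mapped device is much larger than RAM, a scan like this keeps faulting new pages in while reclaim has to evict pages that were read earlier, which is the workload the report describes.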
shrink_page_list() seemed to have raced with filemap_fault() by evicting
a page when we had an active fault handler. This is possible only
because 02c6de8d757cb32c0829a45d81c3dfcbcafd998b altered the behavior of
shrink_page_list() to ignore references. Consequently, we must call
unlock_page() instead of __clear_page_locked() when doing this so that
waiters are notified. unlock_page() here will cause active page fault
handlers to retry (depending on the architecture), which avoids the soft
deadlock.
Signed-off-by: Richard Yao <ryao@gentoo.org>
---
mm/vmscan.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f56c8d..c07c635 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1083,13 +1083,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto keep_locked;
 
 		/*
-		 * At this point, we have no other references and there is
-		 * no way to pick any more up (removed from LRU, removed
-		 * from pagecache). Can use non-atomic bitops now (and
+		 * Unless we force reclaim, we have no other references and
+		 * there is no way to pick any more up (removed from LRU,
+		 * removed from pagecache). Can use non-atomic bitops now (and
 		 * we obviously don't have to worry about waking up a process
 		 * waiting on the page lock, because there are no references.
 		 */
-		__clear_page_locked(page);
+		if (force_reclaim)
+			unlock_page(page);
+		else
+			__clear_page_locked(page);
 free_it:
 		nr_reclaimed++;
 
--
1.8.3.2
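The difference between the two calls that the patch description relies on can be sketched with a toy, single-threaded model. This is not kernel code: struct toy_page and the toy_* helpers are hypothetical stand-ins, with toy_unlock_page() modelling unlock_page() (clear the bit, then wake waiters) and toy_clear_page_locked() modelling __clear_page_locked() (clear the bit only).

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 4

/* Toy stand-in for a page and its wait queue; all names are hypothetical. */
struct toy_page {
	bool locked;			/* models PG_locked */
	bool asleep[TOY_MAX_WAITERS];	/* asleep[i]: waiter i sleeps on the lock */
};

/* A waiter that finds the page locked goes to sleep until it is woken. */
static void toy_wait_on_page_locked(struct toy_page *p, size_t waiter)
{
	if (p->locked)
		p->asleep[waiter] = true;
}

/* Models unlock_page(): clear the bit, then wake every sleeper. */
static void toy_unlock_page(struct toy_page *p)
{
	p->locked = false;
	for (size_t i = 0; i < TOY_MAX_WAITERS; i++)
		p->asleep[i] = false;
}

/* Models __clear_page_locked(): clear the bit only, nobody is woken. */
static void toy_clear_page_locked(struct toy_page *p)
{
	p->locked = false;
}

/* Number of waiters still asleep on the page. */
static size_t toy_sleepers(const struct toy_page *p)
{
	size_t n = 0;
	for (size_t i = 0; i < TOY_MAX_WAITERS; i++)
		n += p->asleep[i] ? 1 : 0;
	return n;
}
```

In this model a waiter that went to sleep before toy_clear_page_locked() stays asleep even though the page is no longer locked, while toy_unlock_page() leaves no sleeper behind. The bare clear is therefore only safe when no waiter can exist, which is the condition the comment in the patched code states.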
* Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim
2014-07-18 15:48 [PATCH] mm: vmscan: unlock_page page when forcing reclaim Richard Yao
@ 2014-07-18 16:38 ` Johannes Weiner
[not found] ` <53C96CBF.4040705@gentoo.org>
0 siblings, 1 reply; 3+ messages in thread
From: Johannes Weiner @ 2014-07-18 16:38 UTC (permalink / raw)
To: Richard Yao
Cc: linux-kernel, mthode, kernel, Andrew Morton, Michal Hocko,
Glauber Costa, Rik van Riel, Vladimir Davydov, Dave Chinner,
open list:MEMORY MANAGEMENT
On Fri, Jul 18, 2014 at 11:48:02AM -0400, Richard Yao wrote:
> A small userland program I wrote to assist me in drive forensic
> operations soft deadlocked on Linux 3.14.4. The stack trace from /proc
> was:
>
> [<ffffffff8112968e>] sleep_on_page_killable+0xe/0x40
> [<ffffffff81129829>] wait_on_page_bit_killable+0x79/0x80
> [<ffffffff811299a5>] __lock_page_or_retry+0x95/0xc0
> [<ffffffff8112a95b>] filemap_fault+0x21b/0x420
> [<ffffffff8115685e>] __do_fault+0x6e/0x520
> [<ffffffff81156de3>] handle_pte_fault+0xd3/0x1f0
> [<ffffffff81157073>] __handle_mm_fault+0x173/0x290
> [<ffffffff811571d2>] handle_mm_fault+0x42/0xb0
> [<ffffffff81587a11>] __do_page_fault+0x191/0x490
> [<ffffffff81587dec>] do_page_fault+0xc/0x10
> [<ffffffff81584622>] page_fault+0x22/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> The program used mmap() to do a linear scan of the device on 64-bit
> hardware. The block device in question was 200GB in size and the system
> had only 8GB of RAM. All IO operations stopped following pageout.
>
> shrink_page_list() seemed to have raced with filemap_fault() by evicting
> a page when we had an active fault handler. This is possible only
> because 02c6de8d757cb32c0829a45d81c3dfcbcafd998b altered the behavior of
> shrink_page_list() to ignore references. Consequently, we must call
> unlock_page() instead of __clear_page_locked() when doing this so that
> waiters are notified. unlock_page() here will cause active page fault
> handlers to retry (depending on the architecture), which avoids the soft
> deadlock.
I don't really understand how the scenario you describe can happen.
Successfully reclaiming a page means that __remove_mapping() was able
to freeze a page count of 2 (page cache and LRU isolation), but
filemap_fault() increases the refcount on the page before trying to
lock the page. If __remove_mapping() wins, find_get_page() does not
work and the fault does not lock the page. If find_get_page() wins,
__remove_mapping() does not work and the reclaimer aborts and does a
regular unlock_page().
page_check_references() is purely about reclaim strategy, it should
not be essential for correctness.
* Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim
[not found] ` <53C96CBF.4040705@gentoo.org>
@ 2014-07-21 7:18 ` Vlastimil Babka
0 siblings, 0 replies; 3+ messages in thread
From: Vlastimil Babka @ 2014-07-21 7:18 UTC (permalink / raw)
To: Richard Yao, Johannes Weiner
Cc: linux-kernel, mthode, kernel, Andrew Morton, Michal Hocko,
Glauber Costa, Rik van Riel, Vladimir Davydov, Dave Chinner,
open list:MEMORY MANAGEMENT
On 07/18/2014 08:51 PM, Richard Yao wrote:
> On 07/18/2014 12:38 PM, Johannes Weiner wrote:
>> I don't really understand how the scenario you describe can happen.
>>
>> Successfully reclaiming a page means that __remove_mapping() was able
>> to freeze a page count of 2 (page cache and LRU isolation), but
>> filemap_fault() increases the refcount on the page before trying to
>> lock the page. If __remove_mapping() wins, find_get_page() does not
>> work and the fault does not lock the page. If find_get_page() wins,
>> __remove_mapping() does not work and the reclaimer aborts and does a
>> regular unlock_page().
>>
>> page_check_references() is purely about reclaim strategy, it should
>> not be essential for correctness.
>>
>
> You are right that something else happened here. I had not spotted
> the cmpxchg being done in __remove_mapping(). If I spot something that
> looks like it could be what went wrong doing this, I will propose a new
> fix to the list for review. Thanks for your time.
>
> P.S. The system had ECC RAM, so this was not a bit flip. My current
> method for debugging this involves using cscope to construct possible
> call paths under a couple of assumptions:
>
> 1. Something set PG_locked and never released it through unlock_page().
> 2. The only ways of doing #1 that I see in the code are calling
> __clear_page_locked() or failing to clear the bit. I do not believe that
> a patch was accepted that did the latter, so I assume the former.
Could it be that the process holding the lock was also stuck doing
something, and it was not a missed unlock?
> I have root access to the system, so each time I do a lookup using
> cscope, I go through the list to logically eliminate possibilities by
> inspecting the system where the problem occurred. When I cannot
> eliminate a possibility, I recurse. This is prone to false positives
> should I miss a subtle piece of code that prevents a problem and it is
> very tedious, but I do not see a better way of debugging based on what I
> have at my disposal. If anyone has any suggestions, I would appreciate them.
You could try enabling DEBUG_VM, possibly LOCKDEP, or try a git bisect if
there's a previous known working kernel version...
> P.P.S. I *really* wish that I had used kdump when this issue happened,
> but sadly, the system is not set up for kdump.
So it happened only once so far? How about enabling kdump and waiting to
see if it happens again?