linux-kernel.vger.kernel.org archive mirror
* "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
@ 2019-07-30 16:25 Qian Cai
  2019-07-31  5:34 ` Minchan Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2019-07-30 16:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

OOM workloads with swapping are unable to recover on linux-next since next-
20190729 due to the commit "mm: account nr_isolated_xxx in
[isolate|putback]_lru_page" [1].

[1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.org/T/#mdcd03bcb4746f2f23e6f508c205943726aee8355

For example, the LTP oom01 test case is stuck for hours, while it finishes in a
few minutes here after reverting the above commit. Sometimes, it prints the
messages below while hanging.
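For reference, a rough sketch of the reproduction steps. This is an assumption
about the setup, not part of the report: the linux-next checkout location, the
revision range used to find the commit, and the LTP install path (/opt/ltp) are
all hypothetical, and the exact commit hash will differ between trees.

```shell
# Hypothetical working tree; locate the offending commit in linux-next
# by its subject line (the hash varies between next-* snapshots).
cd ~/linux-next
commit=$(git log next-20190728..next-20190729 \
    --grep='account nr_isolated' --format='%H' | head -n1)

# Revert it before rebuilding and booting the test kernel.
git revert --no-edit "$commit"

# Run the LTP oom01 case on both kernels: with the commit applied the
# test hangs for hours; with it reverted it finishes in a few minutes.
cd /opt/ltp
./runltp -s oom01
```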

[  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122 seconds.
[  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
[  509.983513][  T711] Call Trace:
[  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8 (unreliable)
[  509.983583][  T711] [c00020037d00fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.983801][  T711] [c00020037d00fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.983832][  T711] INFO: task oom01:5333 blocked for more than 122 seconds.
[  509.983862][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.983887][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.983928][  T711] oom01           D26352  5333   5157 0x00040000
[  509.983964][  T711] Call Trace:
[  509.983990][  T711] [c00020089ae4f880] [c0000000015d1180]
rcu_lock_map+0x0/0x20 (unreliable)
[  509.984038][  T711] [c00020089ae4fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.984078][  T711] [c00020089ae4fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.984121][  T711] [c00020089ae4fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.984151][  T711] [c00020089ae4fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.984193][  T711] [c00020089ae4fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.984244][  T711] [c00020089ae4fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.984284][  T711] [c00020089ae4fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.984324][  T711] INFO: task oom01:5339 blocked for more than 122 seconds.
[  509.984362][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.984388][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.984429][  T711] oom01           D26352  5339   5157 0x00040000
[  509.984469][  T711] Call Trace:
[  509.984493][  T711] [c00020175cd4f880] [c00020175cd4f8d0] 0xc00020175cd4f8d0
(unreliable)
[  509.984545][  T711] [c00020175cd4fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.984586][  T711] [c00020175cd4fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.984628][  T711] [c00020175cd4fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.984678][  T711] [c00020175cd4fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.984732][  T711] [c00020175cd4fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.984751][  T711] [c00020175cd4fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.984791][  T711] [c00020175cd4fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.984840][  T711] INFO: task oom01:5341 blocked for more than 122 seconds.
[  509.984879][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.984916][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.984966][  T711] oom01           D26352  5341   5157 0x00040000
[  509.985008][  T711] Call Trace:
[  509.985032][  T711] [c000200d8aa6f880] [c000200d8aa6f8d0] 0xc000200d8aa6f8d0
(unreliable)
[  509.985074][  T711] [c000200d8aa6fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.985112][  T711] [c000200d8aa6fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.985152][  T711] [c000200d8aa6fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.985191][  T711] [c000200d8aa6fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.985234][  T711] [c000200d8aa6fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.985285][  T711] [c000200d8aa6fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.985335][  T711] [c000200d8aa6fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.985387][  T711] INFO: task oom01:5348 blocked for more than 122 seconds.
[  509.985424][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.985470][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.985522][  T711] oom01           D26352  5348   5157 0x00040000
[  509.985565][  T711] Call Trace:
[  509.985588][  T711] [c00020089a46f880] [c00020089a46f8d0] 0xc00020089a46f8d0
(unreliable)
[  509.985628][  T711] [c00020089a46fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.985669][  T711] [c00020089a46fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.985711][  T711] [c00020089a46fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.985751][  T711] [c00020089a46fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.985793][  T711] [c00020089a46fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.985836][  T711] [c00020089a46fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.985887][  T711] [c00020089a46fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.985937][  T711] INFO: task oom01:5355 blocked for more than 122 seconds.
[  509.985976][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.986012][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.986054][  T711] oom01           D26352  5355   5157 0x00040000
[  509.986092][  T711] Call Trace:
[  509.986125][  T711] [c0002011b220f880] [c0002011b220f8d0] 0xc0002011b220f8d0
(unreliable)
[  509.986157][  T711] [c0002011b220fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.986194][  T711] [c0002011b220fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.986235][  T711] [c0002011b220fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.986273][  T711] [c0002011b220fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.986315][  T711] [c0002011b220fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.986356][  T711] [c0002011b220fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.986397][  T711] [c0002011b220fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.986437][  T711] INFO: task oom01:5356 blocked for more than 122 seconds.
[  509.986474][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.986512][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.986621][  T711] oom01           D26352  5356   5157 0x00040000
[  509.986715][  T711] Call Trace:
[  509.986748][  T711] [c00020107806f880] [0000000000000008] 0x8 (unreliable)
[  509.986830][  T711] [c00020107806fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.986937][  T711] [c00020107806fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.987028][  T711] [c00020107806fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.987123][  T711] [c00020107806fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.987232][  T711] [c00020107806fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.987317][  T711] [c00020107806fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.987445][  T711] [c00020107806fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.987528][  T711] INFO: task oom01:5363 blocked for more than 122 seconds.
[  509.987626][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.987728][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.987829][  T711] oom01           D26352  5363   5157 0x00040000
[  509.987899][  T711] Call Trace:
[  509.987934][  T711] [c0002010f510f880] [c0000000015d1180]
rcu_lock_map+0x0/0x20 (unreliable)
[  509.988040][  T711] [c0002010f510fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.988162][  T711] [c0002010f510fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.988246][  T711] [c0002010f510fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.988322][  T711] [c0002010f510fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.988429][  T711] [c0002010f510fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.988528][  T711] [c0002010f510fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.988625][  T711] [c0002010f510fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.988737][  T711] INFO: task oom01:5364 blocked for more than 122 seconds.
[  509.988836][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.988894][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.988989][  T711] oom01           D26352  5364   5157 0x00040000
[  509.989097][  T711] Call Trace:
[  509.989137][  T711] [c0002017e444f880] [c0002017e444f8d0] 0xc0002017e444f8d0
(unreliable)
[  509.989248][  T711] [c0002017e444fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.989345][  T711] [c0002017e444fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.989459][  T711] [c0002017e444fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.989528][  T711] [c0002017e444fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.989652][  T711] [c0002017e444fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.989747][  T711] [c0002017e444fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.989859][  T711] [c0002017e444fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.989972][  T711] INFO: task oom01:5367 blocked for more than 122 seconds.
[  509.990074][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
[  509.990141][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  509.990221][  T711] oom01           D26352  5367   5157 0x00040000
[  509.990295][  T711] Call Trace:
[  509.990376][  T711] [c0002008d216f880] [c0002008d216f8d0] 0xc0002008d216f8d0
(unreliable)
[  509.990494][  T711] [c0002008d216fa60] [c000000000023724]
__switch_to+0x3a4/0x520
[  509.990601][  T711] [c0002008d216fad0] [c0000000008d17bc]
__schedule+0x2fc/0x950
[  509.990684][  T711] [c0002008d216fba0] [c0000000008d1e68] schedule+0x58/0x150
[  509.990755][  T711] [c0002008d216fbd0] [c0000000008d7614]
rwsem_down_read_slowpath+0x4b4/0x630
[  509.990886][  T711] [c0002008d216fc90] [c0000000008d7dfc]
down_read+0x12c/0x240
[  509.990965][  T711] [c0002008d216fd20] [c00000000005fb28]
__do_page_fault+0x6f8/0xee0
[  509.991077][  T711] [c0002008d216fe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[  509.991187][  T711] 
[  509.991187][  T711] Showing all locks held in the system:
[  509.991361][  T711] 1 lock held by khungtaskd/711:
[  509.991375][  T711]  #0: 000000006e6271c2 (rcu_read_lock){....}, at:
debug_show_all_locks+0x50/0x170
[  509.991520][  T711] 1 lock held by systemd-udevd/1612:
[  509.991577][  T711]  #0: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.991728][  T711] 1 lock held by oom01/5331:
[  509.991766][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.991876][  T711] 2 locks held by oom01/5332:
[  509.991930][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.992037][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.992174][  T711] 1 lock held by oom01/5333:
[  509.992228][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.992350][  T711] 2 locks held by oom01/5334:
[  509.992407][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.992524][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.992631][  T711] 1 lock held by oom01/5335:
[  509.992707][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.992828][  T711] 2 locks held by oom01/5336:
[  509.992890][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.992996][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.993135][  T711] 1 lock held by oom01/5337:
[  509.993202][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.993352][  T711] 2 locks held by oom01/5338:
[  509.993395][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.993519][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.993647][  T711] 1 lock held by oom01/5339:
[  509.993694][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.993800][  T711] 2 locks held by oom01/5340:
[  509.993874][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.994014][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.994157][  T711] 1 lock held by oom01/5341:
[  509.994229][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.994328][  T711] 2 locks held by oom01/5342:
[  509.994382][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.994511][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.994638][  T711] 2 locks held by oom01/5343:
[  509.994729][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.994825][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.994976][  T711] 2 locks held by oom01/5344:
[  509.995036][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.995134][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.995258][  T711] 2 locks held by oom01/5345:
[  509.995306][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.995432][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.995540][  T711] 2 locks held by oom01/5346:
[  509.995601][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.995704][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.995822][  T711] 2 locks held by oom01/5347:
[  509.995898][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.996025][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.996143][  T711] 1 lock held by oom01/5348:
[  509.996208][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.996313][  T711] 2 locks held by oom01/5349:
[  509.996369][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.996497][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.996621][  T711] 1 lock held by oom01/5350:
[  509.996675][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.996791][  T711] 2 locks held by oom01/5351:
[  509.996837][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.996946][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.997067][  T711] 1 lock held by oom01/5352:
[  509.997126][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.997258][  T711] 2 locks held by oom01/5353:
[  509.997295][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.997400][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.997533][  T711] 2 locks held by oom01/5354:
[  509.997595][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.997711][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.997837][  T711] 1 lock held by oom01/5355:
[  509.997886][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.998005][  T711] 1 lock held by oom01/5356:
[  509.998056][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.998169][  T711] 1 lock held by oom01/5357:
[  509.998221][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.998357][  T711] 2 locks held by oom01/5358:
[  509.998395][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.998507][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.998632][  T711] 2 locks held by oom01/5359:
[  509.998672][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.998805][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.998917][  T711] 2 locks held by oom01/5360:
[  509.998967][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.999069][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.999178][  T711] 2 locks held by oom01/5361:
[  509.999250][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.999373][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.999483][  T711] 2 locks held by oom01/5362:
[  509.999552][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  509.999638][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  509.999763][  T711] 1 lock held by oom01/5363:
[  509.999833][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  509.999932][  T711] 1 lock held by oom01/5364:
[  510.000003][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.000136][  T711] 2 locks held by oom01/5365:
[  510.000174][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.000289][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.000421][  T711] 1 lock held by oom01/5366:
[  510.000471][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.000593][  T711] 1 lock held by oom01/5367:
[  510.000650][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.000762][  T711] 2 locks held by oom01/5368:
[  510.000799][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.000920][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.001038][  T711] 2 locks held by oom01/5369:
[  510.001115][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.001249][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.001354][  T711] 1 lock held by oom01/5370:
[  510.001404][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.001546][  T711] 2 locks held by oom01/5371:
[  510.001579][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.001701][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.001809][  T711] 1 lock held by oom01/5372:
[  510.001876][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.001979][  T711] 2 locks held by oom01/5373:
[  510.002034][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.002148][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.002272][  T711] 1 lock held by oom01/5374:
[  510.002337][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.002473][  T711] 2 locks held by oom01/5375:
[  510.002521][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.002627][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.002755][  T711] 2 locks held by oom01/5376:
[  510.002809][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.002950][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.003064][  T711] 1 lock held by oom01/5377:
[  510.003118][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.003241][  T711] 2 locks held by oom01/5378:
[  510.003310][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.003403][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.003543][  T711] 2 locks held by oom01/5379:
[  510.003610][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.003708][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.003830][  T711] 1 lock held by oom01/5380:
[  510.003891][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.004021][  T711] 1 lock held by oom01/5381:
[  510.004071][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.004183][  T711] 2 locks held by oom01/5382:
[  510.004250][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.004369][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.004486][  T711] 2 locks held by oom01/5383:
[  510.004558][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.004692][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.004797][  T711] 1 lock held by oom01/5384:
[  510.004865][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.004962][  T711] 1 lock held by oom01/5385:
[  510.005002][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[  510.005128][  T711] 1 lock held by oom01/5386:
[  510.005203][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.005318][  T711] 2 locks held by oom01/5387:
[  510.005385][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.005490][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.005604][  T711] 2 locks held by oom01/5388:
[  510.005662][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.005798][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.005908][  T711] 2 locks held by oom01/5389:
[  510.005978][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.006093][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.006206][  T711] 2 locks held by oom01/5390:
[  510.006257][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.006380][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.006510][  T711] 1 lock held by oom01/5391:
[  510.006569][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.006686][  T711] 2 locks held by oom01/5392:
[  510.006743][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.006849][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.006985][  T711] 2 locks held by oom01/5393:
[  510.007044][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.007143][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.007256][  T711] 1 lock held by oom01/5394:
[  510.007315][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.007426][  T711] 2 locks held by oom01/5395:
[  510.007504][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.007615][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.007753][  T711] 2 locks held by oom01/5396:
[  510.007802][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.007901][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.008026][  T711] 2 locks held by oom01/5397:
[  510.008099][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.008209][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.008321][  T711] 2 locks held by oom01/5398:
[  510.008380][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.008465][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.008599][  T711] 2 locks held by oom01/5399:
[  510.008673][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.008791][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.008908][  T711] 2 locks held by oom01/5400:
[  510.008977][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.009076][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.009194][  T711] 2 locks held by oom01/5401:
[  510.009265][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.009376][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.009487][  T711] 2 locks held by oom01/5402:
[  510.009546][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.009655][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.009772][  T711] 2 locks held by oom01/5403:
[  510.009843][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.009973][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.010085][  T711] 2 locks held by oom01/5404:
[  510.010152][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.010253][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.010386][  T711] 2 locks held by oom01/5405:
[  510.010448][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.010578][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.010677][  T711] 2 locks held by oom01/5406:
[  510.010736][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.010846][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.010974][  T711] 2 locks held by oom01/5407:
[  510.011051][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.011141][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.011274][  T711] 2 locks held by oom01/5408:
[  510.011314][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.011424][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.011554][  T711] 2 locks held by oom01/5409:
[  510.011617][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.011745][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.011851][  T711] 2 locks held by oom01/5410:
[  510.011902][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.012030][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.012150][  T711] 2 locks held by oom01/5411:
[  510.012225][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.012345][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.012465][  T711] 2 locks held by oom01/5412:
[  510.012505][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.012617][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.012748][  T711] 1 lock held by oom01/5413:
[  510.012817][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.012928][  T711] 2 locks held by oom01/5414:
[  510.012974][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.013088][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.013233][  T711] 2 locks held by oom01/5415:
[  510.013301][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.013422][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.013535][  T711] 2 locks held by oom01/5416:
[  510.013584][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.013700][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.013809][  T711] 2 locks held by oom01/5417:
[  510.013875][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.013987][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.014110][  T711] 2 locks held by oom01/5418:
[  510.014172][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.014271][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.014409][  T711] 2 locks held by oom01/5419:
[  510.014462][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.014591][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.014696][  T711] 2 locks held by oom01/5420:
[  510.014757][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.014880][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.014997][  T711] 2 locks held by oom01/5421:
[  510.015055][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.015189][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.015303][  T711] 2 locks held by oom01/5422:
[  510.015371][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.015514][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.015626][  T711] 1 lock held by oom01/5423:
[  510.015682][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.015808][  T711] 2 locks held by oom01/5424:
[  510.015857][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.015979][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.016073][  T711] 2 locks held by oom01/5425:
[  510.016149][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.016253][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.016388][  T711] 2 locks held by oom01/5426:
[  510.016435][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.016556][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.016674][  T711] 2 locks held by oom01/5427:
[  510.016733][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.016870][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.016983][  T711] 2 locks held by oom01/5428:
[  510.017034][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.017158][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.017266][  T711] 2 locks held by oom01/5429:
[  510.017346][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.017461][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.017586][  T711] 1 lock held by oom01/5430:
[  510.017633][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.017743][  T711] 2 locks held by oom01/5431:
[  510.017779][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.017906][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.018042][  T711] 2 locks held by oom01/5432:
[  510.018102][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.018216][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.018353][  T711] 2 locks held by oom01/5433:
[  510.018393][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.018503][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.018633][  T711] 2 locks held by oom01/5434:
[  510.018695][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.018800][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.018918][  T711] 2 locks held by oom01/5435:
[  510.018956][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.019083][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.019203][  T711] 2 locks held by oom01/5436:
[  510.019282][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.019383][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.019507][  T711] 2 locks held by oom01/5437:
[  510.019552][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.019674][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.019811][  T711] 2 locks held by oom01/5438:
[  510.019874][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.019990][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.020109][  T711] 1 lock held by oom01/5439:
[  510.020158][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.020267][  T711] 2 locks held by oom01/5440:
[  510.020321][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.020442][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.020558][  T711] 2 locks held by oom01/5441:
[  510.020613][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.020728][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.020838][  T711] 1 lock held by oom01/5442:
[  510.020909][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.021043][  T711] 2 locks held by oom01/5443:
[  510.021107][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.021217][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.021321][  T711] 2 locks held by oom01/5444:
[  510.021383][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.021521][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.021632][  T711] 2 locks held by oom01/5445:
[  510.021697][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.021804][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.021905][  T711] 2 locks held by oom01/5446:
[  510.021977][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.022087][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.022225][  T711] 2 locks held by oom01/5447:
[  510.022279][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.022379][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.022504][  T711] 2 locks held by oom01/5448:
[  510.022574][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.022697][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.022825][  T711] 2 locks held by oom01/5449:
[  510.022866][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.022975][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.023085][  T711] 2 locks held by oom01/5450:
[  510.023164][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.023289][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.023408][  T711] 2 locks held by oom01/5451:
[  510.023471][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.023565][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.023696][  T711] 2 locks held by oom01/5452:
[  510.023766][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.023880][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.024007][  T711] 2 locks held by oom01/5453:
[  510.024074][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.024203][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.024284][  T711] 2 locks held by oom01/5454:
[  510.024368][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.024472][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.024593][  T711] 2 locks held by oom01/5455:
[  510.024647][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.024772][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.024877][  T711] 2 locks held by oom01/5456:
[  510.024925][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x188/0xee0
[  510.025034][  T711]  #1: 000000009cc1462e (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[  510.025150][  T711] 1 lock held by oom01/5457:
[  510.025197][  T711]  #0: 00000000cd010082 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6f8/0xee0
[  510.025332][  T711] 
[  510.025352][  T711] =============================================
[  510.025352][  T711] 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
  2019-07-30 16:25 "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap Qian Cai
@ 2019-07-31  5:34 ` Minchan Kim
  2019-07-31 16:09   ` Qian Cai
  0 siblings, 1 reply; 6+ messages in thread
From: Minchan Kim @ 2019-07-31  5:34 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> OOM workloads with swapping are unable to recover on linux-next since
> next-20190729 due to the commit "mm: account nr_isolated_xxx in
> [isolate|putback]_lru_page" [1]
> 
> [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.org/
> T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> 
> For example, the LTP oom01 test case is stuck for hours, while it finishes
> in a few minutes here after reverting the above commit. Sometimes, it prints
> those messages while hanging.
> 
> [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122 seconds.
> [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> [  509.983513][  T711] Call Trace:
> [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8 (unreliable)
> [  509.983583][  T711] [c00020037d00fa60] [c000000000023724]
> __switch_to+0x3a4/0x520
> [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc]
> __schedule+0x2fc/0x950
> [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68] schedule+0x58/0x150
> [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614]
> rwsem_down_read_slowpath+0x4b4/0x630
> [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc]
> down_read+0x12c/0x240
> [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28]
> __do_page_fault+0x6f8/0xee0
> [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364]
> handle_page_fault+0x18/0x38

Thanks for the testing! No surprise the patch introduces some bugs, since
it's rather tricky.

Could you test this patch?

From b31667210dd747f4d8aeb7bdc1f5c14f1f00bff5 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 31 Jul 2019 14:18:01 +0900
Subject: [PATCH] mm: decrease NR_ISOLATED count at successful migration

If migration fails, the page should go back to the LRU list so that
putback_lru_page() can handle the NR_ISOLATED count in pair with
isolate_lru_page(). However, if migration succeeds, the page will be
freed, so there is no need to add it back to the LRU list. Thus, the
NR_ISOLATED count must be decremented manually.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/migrate.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 84b89d2d69065..96ae0c3cada8d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1166,6 +1166,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 {
 	int rc = MIGRATEPAGE_SUCCESS;
 	struct page *newpage;
+	bool is_lru = __PageMovable(page);
 
 	if (!thp_migration_supported() && PageTransHuge(page))
 		return -ENOMEM;
@@ -1175,17 +1176,10 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		return -ENOMEM;
 
 	if (page_count(page) == 1) {
-		bool is_lru = !__PageMovable(page);
-
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
-		if (likely(is_lru))
-			mod_node_page_state(page_pgdat(page),
-						NR_ISOLATED_ANON +
-						page_is_file_cache(page),
-						-hpage_nr_pages(page));
-		else {
+		if (unlikely(!is_lru)) {
 			lock_page(page);
 			if (!PageMovable(page))
 				__ClearPageIsolated(page);
@@ -1229,6 +1223,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
 		}
+
+		if (likely(is_lru))
+			mod_node_page_state(page_pgdat(page),
+					NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+					-hpage_nr_pages(page));
 	} else {
 		if (rc != -EAGAIN) {
 			if (likely(!__PageMovable(page))) {
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
  2019-07-31  5:34 ` Minchan Kim
@ 2019-07-31 16:09   ` Qian Cai
  2019-07-31 18:18     ` Qian Cai
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2019-07-31 16:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
> On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> > OOM workloads with swapping are unable to recover on linux-next since
> > next-20190729 due to the commit "mm: account nr_isolated_xxx in
> > [isolate|putback]_lru_page" [1]
> > 
> > [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.
> > org/
> > T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> > 
> > For example, the LTP oom01 test case is stuck for hours, while it finishes
> > in a few minutes here after reverting the above commit. Sometimes, it
> > prints those messages while hanging.
> > 
> > [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122
> > seconds.
> > [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> > [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> > [  509.983513][  T711] Call Trace:
> > [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8
> > (unreliable)
> > [  509.983583][  T711] [c00020037d00fa60] [c000000000023724]
> > __switch_to+0x3a4/0x520
> > [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc]
> > __schedule+0x2fc/0x950
> > [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68]
> > schedule+0x58/0x150
> > [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614]
> > rwsem_down_read_slowpath+0x4b4/0x630
> > [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc]
> > down_read+0x12c/0x240
> > [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28]
> > __do_page_fault+0x6f8/0xee0
> > [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364]
> > handle_page_fault+0x18/0x38
> 
> Thanks for the testing! No surprise the patch introduces some bugs, since
> it's rather tricky.
> 
> Could you test this patch?

It does help the situation a bit, but the recovery speed is still way slower
than just reverting the commit "mm: account nr_isolated_xxx in
[isolate|putback]_lru_page". For example, on this powerpc system, oom01 used
to finish in 4 minutes but now still takes 13 minutes.

The oom02 case (testing NUMA mempolicy) takes even longer, and I gave up after
26 minutes with several hung tasks below.

[ 7881.086027][  T723]       Tainted: G        W         5.3.0-rc2-next-
20190731+ #4
[ 7881.086045][  T723] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 7881.086064][  T723] oom02           D26080 112911 112776 0x00040000
[ 7881.086100][  T723] Call Trace:
[ 7881.086113][  T723] [c00000185deef880] [0000000000000008] 0x8 (unreliable)
[ 7881.086142][  T723] [c00000185deefa60] [c0000000000236e4]
__switch_to+0x3a4/0x520
[ 7881.086182][  T723] [c00000185deefad0] [c0000000008d045c]
__schedule+0x2fc/0x950
[ 7881.086225][  T723] [c00000185deefba0] [c0000000008d0b08] schedule+0x58/0x150
[ 7881.086279][  T723] [c00000185deefbd0] [c0000000008d6284]
rwsem_down_read_slowpath+0x4b4/0x630
[ 7881.086311][  T723] [c00000185deefc90] [c0000000008d6a6c]
down_read+0x12c/0x240
[ 7881.086340][  T723] [c00000185deefd20] [c00000000005fa34]
__do_page_fault+0x6e4/0xeb0
[ 7881.086406][  T723] [c00000185deefe20] [c00000000000a364]
handle_page_fault+0x18/0x38
[ 7881.086435][  T723] INFO: task oom02:112913 blocked for more than 368
seconds.
[ 7881.086472][  T723]       Tainted: G        W         5.3.0-rc2-next-
20190731+ #4
[ 7881.086509][  T723] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 7881.086551][  T723] oom02           D26832 112913 112776 0x00040000
[ 7881.086583][  T723] Call Trace:
[ 7881.086596][  T723] [c000201c450af890] [0000000000000008] 0x8 (unreliable)
[ 7881.086636][  T723] [c000201c450afa70] [c0000000000236e4]
__switch_to+0x3a4/0x520
[ 7881.086679][  T723] [c000201c450afae0] [c0000000008d045c]
__schedule+0x2fc/0x950
[ 7881.086720][  T723] [c000201c450afbb0] [c0000000008d0b08] schedule+0x58/0x150
[ 7881.086762][  T723] [c000201c450afbe0] [c0000000008d6284]
rwsem_down_read_slowpath+0x4b4/0x630
[ 7881.086818][  T723] [c000201c450afca0] [c0000000008d6a6c]
down_read+0x12c/0x240
[ 7881.086860][  T723] [c000201c450afd30] [c00000000035534c]
__mm_populate+0x12c/0x200
[ 7881.086902][  T723] [c000201c450afda0] [c00000000036a65c] do_mlock+0xec/0x2f0
[ 7881.086955][  T723] [c000201c450afe00] [c00000000036aa24] sys_mlock+0x24/0x40
[ 7881.086987][  T723] [c000201c450afe20] [c00000000000ae08]
system_call+0x5c/0x70
[ 7881.087025][  T723] 
[ 7881.087025][  T723] Showing all locks held in the system:
[ 7881.087065][  T723] 3 locks held by systemd/1:
[ 7881.087111][  T723]  #0: 000000002f8cb0d9 (&ep->mtx){....}, at:
ep_scan_ready_list+0x2a8/0x2d0
[ 7881.087159][  T723]  #1: 000000004e0b13a9 (&mm->mmap_sem){....}, at:
__do_page_fault+0x184/0xeb0
[ 7881.087209][  T723]  #2: 000000006dafe1e3 (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[ 7881.087292][  T723] 1 lock held by khungtaskd/723:
[ 7881.087327][  T723]  #0: 00000000e4addba8 (rcu_read_lock){....}, at:
debug_show_all_locks+0x50/0x170
[ 7881.087388][  T723] 1 lock held by oom02/112907:
[ 7881.087411][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.087487][  T723] 1 lock held by oom02/112908:
[ 7881.087522][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.087566][  T723] 1 lock held by oom02/112909:
[ 7881.087591][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.087627][  T723] 1 lock held by oom02/112910:
[ 7881.087662][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.087707][  T723] 1 lock held by oom02/112911:
[ 7881.087743][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
__do_page_fault+0x6e4/0xeb0
[ 7881.087793][  T723] 1 lock held by oom02/112912:
[ 7881.087827][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.087872][  T723] 1 lock held by oom02/112913:
[ 7881.087897][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
__mm_populate+0x12c/0x200
[ 7881.087943][  T723] 1 lock held by oom02/112914:
[ 7881.087979][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.088037][  T723] 1 lock held by oom02/112915:
[ 7881.088060][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.088095][  T723] 2 locks held by oom02/112916:
[ 7881.088134][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
__mm_populate+0x12c/0x200
[ 7881.088180][  T723]  #1: 000000006dafe1e3 (fs_reclaim){....}, at:
fs_reclaim_acquire.part.17+0x10/0x60
[ 7881.088230][  T723] 1 lock held by oom02/112917:
[ 7881.088257][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
do_mlock+0x88/0x2f0
[ 7881.088291][  T723] 1 lock held by oom02/112918:
[ 7881.088325][  T723]  #0: 000000003463bed2 (&mm->mmap_sem){....}, at:
vm_mmap_pgoff+0x8c/0x160
[ 7881.088370][  T723] 
[ 7881.088391][  T723] =============================================

> 
> From b31667210dd747f4d8aeb7bdc1f5c14f1f00bff5 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Wed, 31 Jul 2019 14:18:01 +0900
> Subject: [PATCH] mm: decrease NR_ISOLATED count at successful migration
> 
> If migration fails, the page should go back to the LRU list so that
> putback_lru_page() can handle the NR_ISOLATED count in pair with
> isolate_lru_page(). However, if migration succeeds, the page will be
> freed, so there is no need to add it back to the LRU list. Thus, the
> NR_ISOLATED count must be decremented manually.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/migrate.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 84b89d2d69065..96ae0c3cada8d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1166,6 +1166,7 @@ static ICE_noinline int unmap_and_move(new_page_t
> get_new_page,
>  {
>  	int rc = MIGRATEPAGE_SUCCESS;
>  	struct page *newpage;
> +	bool is_lru = __PageMovable(page);
>  
>  	if (!thp_migration_supported() && PageTransHuge(page))
>  		return -ENOMEM;
> @@ -1175,17 +1176,10 @@ static ICE_noinline int unmap_and_move(new_page_t
> get_new_page,
>  		return -ENOMEM;
>  
>  	if (page_count(page) == 1) {
> -		bool is_lru = !__PageMovable(page);
> -
>  		/* page was freed from under us. So we are done. */
>  		ClearPageActive(page);
>  		ClearPageUnevictable(page);
> -		if (likely(is_lru))
> -			mod_node_page_state(page_pgdat(page),
> -						NR_ISOLATED_ANON +
> -						page_is_file_cache(page),
> -						-hpage_nr_pages(page));
> -		else {
> +		if (unlikely(!is_lru)) {
>  			lock_page(page);
>  			if (!PageMovable(page))
>  				__ClearPageIsolated(page);
> @@ -1229,6 +1223,12 @@ static ICE_noinline int unmap_and_move(new_page_t
> get_new_page,
>  			if (set_hwpoison_free_buddy_page(page))
>  				num_poisoned_pages_inc();
>  		}
> +
> +		if (likely(is_lru))
> +			mod_node_page_state(page_pgdat(page),
> +					NR_ISOLATED_ANON +
> +						page_is_file_cache(page),
> +					-hpage_nr_pages(page));
>  	} else {
>  		if (rc != -EAGAIN) {
>  			if (likely(!__PageMovable(page))) {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
  2019-07-31 16:09   ` Qian Cai
@ 2019-07-31 18:18     ` Qian Cai
  2019-08-01  6:51       ` Minchan Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2019-07-31 18:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Wed, 2019-07-31 at 12:09 -0400, Qian Cai wrote:
> On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
> > On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> > > OOM workloads with swapping are unable to recover on linux-next since
> > > next-20190729 due to the commit "mm: account nr_isolated_xxx in
> > > [isolate|putback]_lru_page" [1]
> > > 
> > > [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kerne
> > > l.
> > > org/
> > > T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> > > 
> > > For example, the LTP oom01 test case is stuck for hours, while it
> > > finishes in a few minutes here after reverting the above commit.
> > > Sometimes, it prints those messages while hanging.
> > > 
> > > [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122
> > > seconds.
> > > [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> > > [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this message.
> > > [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> > > [  509.983513][  T711] Call Trace:
> > > [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8
> > > (unreliable)
> > > [  509.983583][  T711] [c00020037d00fa60] [c000000000023724]
> > > __switch_to+0x3a4/0x520
> > > [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc]
> > > __schedule+0x2fc/0x950
> > > [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68]
> > > schedule+0x58/0x150
> > > [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614]
> > > rwsem_down_read_slowpath+0x4b4/0x630
> > > [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc]
> > > down_read+0x12c/0x240
> > > [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28]
> > > __do_page_fault+0x6f8/0xee0
> > > [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364]
> > > handle_page_fault+0x18/0x38
> > 
> > Thanks for the testing! No surprise the patch introduces some bugs, since
> > it's rather tricky.
> > 
> > Could you test this patch?
> 
> It does help the situation a bit, but the recovery speed is still way slower
> than just reverting the commit "mm: account nr_isolated_xxx in
> [isolate|putback]_lru_page". For example, on this powerpc system, oom01 used
> to finish in 4 minutes but now still takes 13 minutes.
> 
> The oom02 case (testing NUMA mempolicy) takes even longer, and I gave up
> after 26 minutes with several hung tasks below.

Also, oom02 is stuck on an x86 machine.

[10327.974285][  T197] INFO: task oom02:29546 can't die for more than 122
seconds.
[10327.981654][  T197] oom02           D22576 29546  29536 0x00004006
[10327.987928][  T197] Call Trace:
[10327.991237][  T197]  __schedule+0x495/0xb50
[10327.995481][  T197]  ? __sched_text_start+0x8/0x8
[10328.000230][  T197]  ? __debug_check_no_obj_freed+0x250/0x250
[10328.006036][  T197]  schedule+0x5d/0x140
[10328.009994][  T197]  schedule_timeout+0x23f/0x380
[10328.014752][  T197]  ? mem_cgroup_uncharge+0x110/0x110
[10328.020103][  T197]  ? usleep_range+0x100/0x100
[10328.024691][  T197]  ? del_timer_sync+0xa0/0xa0
[10328.029257][  T197]  ? shrink_active_list+0x825/0x9d0
[10328.034362][  T197]  ? msleep+0x23/0x70
[10328.038228][  T197]  msleep+0x58/0x70
[10328.042090][  T197]  shrink_inactive_list+0x5cf/0x730
[10328.047197][  T197]  ? move_pages_to_lru+0xc70/0xc70
[10328.052205][  T197]  ? cpumask_next+0x35/0x40
[10328.056611][  T197]  ? lruvec_lru_size+0x12d/0x3a0
[10328.061445][  T197]  ? __kasan_check_read+0x11/0x20
[10328.066530][  T197]  ? inactive_list_is_low+0x2b9/0x410
[10328.071796][  T197]  shrink_node_memcg+0x4ff/0x1560
[10328.076740][  T197]  ? shrink_active_list+0x9d0/0x9d0
[10328.081834][  T197]  ? f_getown+0x70/0x70
[10328.085900][  T197]  ? mem_cgroup_iter+0x135/0x840
[10328.090874][  T197]  ? mem_cgroup_iter+0x18e/0x840
[10328.095726][  T197]  ? __kasan_check_read+0x11/0x20
[10328.100641][  T197]  ? mem_cgroup_protected+0x215/0x260
[10328.105929][  T197]  shrink_node+0x1d3/0xa30
[10328.110233][  T197]  ? shrink_node_memcg+0x1560/0x1560
[10328.115671][  T197]  ? __kasan_check_read+0x11/0x20
[10328.120586][  T197]  do_try_to_free_pages+0x22f/0x820
[10328.125693][  T197]  ? shrink_node+0xa30/0xa30
[10328.130173][  T197]  ? __kasan_check_read+0x11/0x20
[10328.135113][  T197]  ? check_chain_key+0x1df/0x2e0
[10328.139942][  T197]  try_to_free_pages+0x242/0x4d0
[10328.144938][  T197]  ? do_try_to_free_pages+0x820/0x820
[10328.150209][  T197]  __alloc_pages_nodemask+0x9ce/0x1bc0
[10328.155589][  T197]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[10328.160853][  T197]  ? __kasan_check_read+0x11/0x20
[10328.166007][  T197]  ? check_chain_key+0x1df/0x2e0
[10328.170839][  T197]  ? do_anonymous_page+0x33c/0xde0
[10328.175869][  T197]  alloc_pages_vma+0x89/0x2c0
[10328.180439][  T197]  do_anonymous_page+0x3d8/0xde0
[10328.185288][  T197]  ? finish_fault+0x120/0x120
[10328.189857][  T197]  ? alloc_pages_vma+0x9a/0x2c0
[10328.194746][  T197]  handle_pte_fault+0x457/0x12c0
[10328.199577][  T197]  __handle_mm_fault+0x79a/0xa50
[10328.204431][  T197]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[10328.209876][  T197]  ? __kasan_check_read+0x11/0x20
[10328.214816][  T197]  ? __count_memcg_events+0x56/0x1d0
[10328.220201][  T197]  handle_mm_fault+0x17f/0x370
[10328.224881][  T197]  __do_page_fault+0x25b/0x5d0
[10328.229538][  T197]  do_page_fault+0x50/0x2d3
[10328.233957][  T197]  page_fault+0x2c/0x40
[10328.238004][  T197] RIP: 0033:0x410c50
[10328.241951][  T197] Code: Bad RIP value.
[10328.245927][  T197] RSP: 002b:00007f27f0afcec0 EFLAGS: 00010206
[10328.251892][  T197] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
00007f2d34bfd497
[10328.259792][  T197] RDX: 00000000224ed000 RSI: 00000000c0000000 RDI:
0000000000000000
[10328.267845][  T197] RBP: 00007f266fafc000 R08: 00000000ffffffff R09:
0000000000000000
[10328.275752][  T197] R10: 0000000000000022 R11: 0000000000000246 R12:
0000000000000001
[10328.283635][  T197] R13: 00007fff5d124f9f R14: 0000000000000000 R15:
00007f27f0afcfc0
[10328.291696][  T197] INFO: task oom02:29554 can't die for more than 123
seconds.
[10328.299088][  T197] oom02           D22576 29554  29536 0x00004006
[10328.305348][  T197] Call Trace:
[10328.308519][  T197]  __schedule+0x495/0xb50
[10328.312737][  T197]  ? __sched_text_start+0x8/0x8
[10328.317706][  T197]  ? __debug_check_no_obj_freed+0x250/0x250
[10328.323497][  T197]  schedule+0x5d/0x140
[10328.327475][  T197]  schedule_timeout+0x23f/0x380
[10328.332217][  T197]  ? mem_cgroup_uncharge+0x110/0x110
[10328.337421][  T197]  ? usleep_range+0x100/0x100
[10328.342184][  T197]  ? del_timer_sync+0xa0/0xa0
[10328.346778][  T197]  ? shrink_active_list+0x825/0x9d0
[10328.351874][  T197]  ? msleep+0x23/0x70
[10328.355766][  T197]  msleep+0x58/0x70
[10328.359460][  T197]  shrink_inactive_list+0x5cf/0x730
[10328.364576][  T197]  ? move_pages_to_lru+0xc70/0xc70
[10328.369748][  T197]  ? cpumask_next+0x35/0x40
[10328.374158][  T197]  ? lruvec_lru_size+0x12d/0x3a0
[10328.378986][  T197]  ? __kasan_check_read+0x11/0x20
[10328.383927][  T197]  ? inactive_list_is_low+0x2b9/0x410
[10328.389195][  T197]  shrink_node_memcg+0x4ff/0x1560
[10328.394309][  T197]  ? shrink_active_list+0x9d0/0x9d0
[10328.399400][  T197]  ? f_getown+0x70/0x70
[10328.403445][  T197]  ? mem_cgroup_iter+0x135/0x840
[10328.408298][  T197]  ? mem_cgroup_iter+0x18e/0x840
[10328.413127][  T197]  ? __kasan_check_read+0x11/0x20
[10328.418306][  T197]  ? mem_cgroup_protected+0x215/0x260
[10328.423572][  T197]  shrink_node+0x1d3/0xa30
[10328.427899][  T197]  ? shrink_node_memcg+0x1560/0x1560
[10328.433080][  T197]  ? __kasan_check_read+0x11/0x20
[10328.438019][  T197]  do_try_to_free_pages+0x22f/0x820
[10328.443233][  T197]  ? shrink_node+0xa30/0xa30
[10328.447739][  T197]  ? __kasan_check_read+0x11/0x20
[10328.452655][  T197]  ? check_chain_key+0x1df/0x2e0
[10328.457507][  T197]  try_to_free_pages+0x242/0x4d0
[10328.462334][  T197]  ? do_try_to_free_pages+0x820/0x820
[10328.467848][  T197]  __alloc_pages_nodemask+0x9ce/0x1bc0
[10328.473205][  T197]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[10328.478494][  T197]  ? __kasan_check_read+0x11/0x20
[10328.483410][  T197]  ? check_chain_key+0x1df/0x2e0
[10328.488266][  T197]  ? do_anonymous_page+0x33c/0xde0
[10328.493409][  T197]  alloc_pages_vma+0x89/0x2c0
[10328.498004][  T197]  do_anonymous_page+0x3d8/0xde0
[10328.502834][  T197]  ? finish_fault+0x120/0x120
[10328.507424][  T197]  ? alloc_pages_vma+0x9a/0x2c0
[10328.512167][  T197]  handle_pte_fault+0x457/0x12c0
[10328.517261][  T197]  __handle_mm_fault+0x79a/0xa50
[10328.522093][  T197]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[10328.527556][  T197]  ? __kasan_check_read+0x11/0x20
[10328.532473][  T197]  ? __count_memcg_events+0x56/0x1d0
[10328.537678][  T197]  handle_mm_fault+0x17f/0x370
[10328.542484][  T197]  __do_page_fault+0x25b/0x5d0
[10328.547164][  T197]  do_page_fault+0x50/0x2d3
[10328.551557][  T197]  page_fault+0x2c/0x40
[10328.555624][  T197] RIP: 0033:0x410c50
[10328.559405][  T197] Code: Bad RIP value.
[10328.563358][  T197] RSP: 002b:00007f21ecaf4ec0 EFLAGS: 00010206
[10328.569438][  T197] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f2d34bfd497
[10328.577349][  T197] RDX: 000000001aeb4000 RSI: 00000000c0000000 RDI: 0000000000000000
[10328.585253][  T197] RBP: 00007f206baf4000 R08: 00000000ffffffff R09: 0000000000000000
[10328.593292][  T197] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[10328.601201][  T197] R13: 00007fff5d124f9f R14: 0000000000000000 R15: 00007f21ecaf4fc0
[10328.609120][  T197] 
[10328.609120][  T197] Showing all locks held in the system:
[10328.617052][  T197] 1 lock held by khungtaskd/197:
[10328.621878][  T197]  #0: 000000002d9f974d (rcu_read_lock){....}, at: debug_show_all_locks+0x33/0x165
[10328.631211][  T197] 2 locks held by oom02/29546:
[10328.635888][  T197]  #0: 0000000031e5d1a8 (&mm->mmap_sem#2){....}, at: __do_page_fault+0x166/0x5d0
[10328.645093][  T197]  #1: 00000000e060a0f6 (fs_reclaim){....}, at: fs_reclaim_acquire.part.15+0x5/0x30
[10328.654418][  T197] 2 locks held by oom02/29554:
[10328.659070][  T197]  #0: 0000000031e5d1a8 (&mm->mmap_sem#2){....}, at: __do_page_fault+0x166/0x5d0
[10328.668286][  T197]  #1: 00000000e060a0f6 (fs_reclaim){....}, at: fs_reclaim_acquire.part.15+0x5/0x30
[10328.677608][  T197] 
[10328.679812][  T197] =============================================
[10328.679812][  T197] 
[10450.864064][  T197] INFO: task oom02:29546 can't die for more than 245 seconds.
[10450.871642][  T197] oom02           D22576 29546  29536 0x00004006
[10450.877912][  T197] Call Trace:
[10450.881087][  T197]  __schedule+0x495/0xb50
[10450.885330][  T197]  ? __sched_text_start+0x8/0x8
[10450.890072][  T197]  ? __debug_check_no_obj_freed+0x250/0x250
[10450.896031][  T197]  schedule+0x5d/0x140
[10450.899989][  T197]  schedule_timeout+0x23f/0x380
[10450.904753][  T197]  ? mem_cgroup_uncharge+0x110/0x110
[10450.909936][  T197]  ? usleep_range+0x100/0x100
[10450.914526][  T197]  ? del_timer_sync+0xa0/0xa0
[10450.919314][  T197]  ? shrink_active_list+0x825/0x9d0
[10450.924428][  T197]  ? msleep+0x23/0x70
[10450.928296][  T197]  msleep+0x58/0x70
[10450.931991][  T197]  shrink_inactive_list+0x5cf/0x730
[10450.937103][  T197]  ? move_pages_to_lru+0xc70/0xc70
[10450.942254][  T197]  ? cpumask_next+0x35/0x40
[10450.946678][  T197]  ? lruvec_lru_size+0x12d/0x3a0
[10450.951512][  T197]  ? __kasan_check_read+0x11/0x20
[10450.956444][  T197]  ? inactive_list_is_low+0x2b9/0x410
[10450.961711][  T197]  shrink_node_memcg+0x4ff/0x1560
[10450.966650][  T197]  ? shrink_active_list+0x9d0/0x9d0
[10450.971929][  T197]  ? f_getown+0x70/0x70
[10450.975988][  T197]  ? mem_cgroup_iter+0x135/0x840
[10450.980821][  T197]  ? mem_cgroup_iter+0x18e/0x840
[10450.985672][  T197]  ? __kasan_check_read+0x11/0x20
[10450.990591][  T197]  ? mem_cgroup_protected+0x215/0x260
[10450.996050][  T197]  shrink_node+0x1d3/0xa30
[10451.000361][  T197]  ? shrink_node_memcg+0x1560/0x1560
[10451.005561][  T197]  ? __kasan_check_read+0x11/0x20
[10451.010477][  T197]  do_try_to_free_pages+0x22f/0x820
[10451.015589][  T197]  ? shrink_node+0xa30/0xa30
[10451.020293][  T197]  ? __kasan_check_read+0x11/0x20
[10451.025232][  T197]  ? check_chain_key+0x1df/0x2e0
[10451.030059][  T197]  try_to_free_pages+0x242/0x4d0
[10451.034910][  T197]  ? do_try_to_free_pages+0x820/0x820
[10451.040180][  T197]  __alloc_pages_nodemask+0x9ce/0x1bc0
[10451.045732][  T197]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[10451.050999][  T197]  ? __kasan_check_read+0x11/0x20
[10451.055936][  T197]  ? check_chain_key+0x1df/0x2e0
[10451.060767][  T197]  ? do_anonymous_page+0x33c/0xde0
[10451.065796][  T197]  alloc_pages_vma+0x89/0x2c0
[10451.070521][  T197]  do_anonymous_page+0x3d8/0xde0
[10451.075372][  T197]  ? finish_fault+0x120/0x120
[10451.079941][  T197]  ? alloc_pages_vma+0x9a/0x2c0
[10451.084703][  T197]  handle_pte_fault+0x457/0x12c0
[10451.089536][  T197]  __handle_mm_fault+0x79a/0xa50
[10451.094557][  T197]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[10451.100001][  T197]  ? __kasan_check_read+0x11/0x20
[10451.104938][  T197]  ? __count_memcg_events+0x56/0x1d0
[10451.110118][  T197]  handle_mm_fault+0x17f/0x370
[10451.114789][  T197]  __do_page_fault+0x25b/0x5d0
[10451.119661][  T197]  do_page_fault+0x50/0x2d3
[10451.124077][  T197]  page_fault+0x2c/0x40
[10451.128118][  T197] RIP: 0033:0x410c50
[10451.131901][  T197] Code: Bad RIP value.
[10451.135871][  T197] RSP: 002b:00007f27f0afcec0 EFLAGS: 00010206
[10451.141979][  T197] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f2d34bfd497
[10451.149881][  T197] RDX: 00000000224ed000 RSI: 00000000c0000000 RDI: 0000000000000000
[10451.157786][  T197] RBP: 00007f266fafc000 R08: 00000000ffffffff R09: 0000000000000000
[10451.165694][  T197] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[10451.173741][  T197] R13: 00007fff5d124f9f R14: 0000000000000000 R15: 00007f27f0afcfc0
[10451.181656][  T197] 
[10451.181656][  T197] Showing all locks held in the system:
[10451.189350][  T197] 1 lock held by khungtaskd/197:
[10451.194369][  T197]  #0: 000000002d9f974d (rcu_read_lock){....}, at: debug_show_all_locks+0x33/0x165
[10451.203670][  T197] 2 locks held by oom02/29546:
[10451.208344][  T197]  #0: 0000000031e5d1a8 (&mm->mmap_sem#2){....}, at: __do_page_fault+0x166/0x5d0
[10451.217583][  T197]  #1: 00000000e060a0f6 (fs_reclaim){....}, at: fs_reclaim_acquire.part.15+0x5/0x30
[10451.226908][  T197] 
[10451.229112][  T197] =============================================
[10451.229112][  T197] 
[10758.054022][T29393] kworker/dying (29393) used greatest stack depth: 16928 bytes left

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
  2019-07-31 18:18     ` Qian Cai
@ 2019-08-01  6:51       ` Minchan Kim
  2019-08-01 11:46         ` Qian Cai
  0 siblings, 1 reply; 6+ messages in thread
From: Minchan Kim @ 2019-08-01  6:51 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Wed, Jul 31, 2019 at 02:18:00PM -0400, Qian Cai wrote:
> On Wed, 2019-07-31 at 12:09 -0400, Qian Cai wrote:
> > On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
> > > On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> > > > OOM workloads with swapping is unable to recover with linux-next since
> > > > next-
> > > > 20190729 due to the commit "mm: account nr_isolated_xxx in
> > > > [isolate|putback]_lru_page" breaks OOM with swap" [1]
> > > > 
> > > > [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kerne
> > > > l.
> > > > org/
> > > > T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> > > > 
> > > > For example, LTP oom01 test case is stuck for hours, while it finishes in
> > > > a
> > > > few
> > > > minutes here after reverted the above commit. Sometimes, it prints those
> > > > message
> > > > while hanging.
> > > > 
> > > > [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122
> > > > seconds.
> > > > [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> > > > [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > > disables this message.
> > > > [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> > > > [  509.983513][  T711] Call Trace:
> > > > [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8
> > > > (unreliable)
> > > > [  509.983583][  T711] [c00020037d00fa60] [c000000000023724]
> > > > __switch_to+0x3a4/0x520
> > > > [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc]
> > > > __schedule+0x2fc/0x950
> > > > [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68]
> > > > schedule+0x58/0x150
> > > > [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614]
> > > > rwsem_down_read_slowpath+0x4b4/0x630
> > > > [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc]
> > > > down_read+0x12c/0x240
> > > > [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28]
> > > > __do_page_fault+0x6f8/0xee0
> > > > [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364]
> > > > handle_page_fault+0x18/0x38
> > > 
> > > Thanks for testing! No surprise the patch introduced some bugs, because
> > > it's rather tricky.
> > > 
> > > Could you test this patch?
> > 
> > It does help the situation a bit, but the recovery speed is still way
> > slower than just reverting the commit "mm: account nr_isolated_xxx in
> > [isolate|putback]_lru_page". For example, on this powerpc system, it used
> > to take 4 minutes to finish oom01, while it now still takes 13 minutes.
> > 
> > oom02 (which tests NUMA mempolicy) takes even longer; I gave up after
> > 26 minutes with several hung tasks below.
> 
> Also, oom02 is stuck on an x86 machine.

Yep, the patch above had a bug: it tested the page type after the page was
freed. However, after further review I found other bugs, though I don't think
they are related to your problem either. Okay then, let's revert the patch.
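The hang signature in the traces above, with reclaim tasks parked in msleep()
under shrink_inactive_list(), is consistent with the kernel's
too_many_isolated() throttling: direct reclaim sleeps and retries while
isolated pages outnumber the remaining inactive ones, so an NR_ISOLATED count
that is incremented on isolation but never decremented on putback can stall
reclaim indefinitely. A minimal user-space sketch of that accounting failure
(a hypothetical simulation; names mirror the kernel's, but this is not kernel
code):

```python
class Lruvec:
    """Tiny model of one node's LRU accounting (hypothetical, not kernel code)."""
    def __init__(self, nr_inactive):
        self.nr_inactive = nr_inactive
        self.nr_isolated = 0

def too_many_isolated(lv):
    # Kernel-style heuristic: throttle direct reclaim (msleep + retry)
    # once isolated pages outnumber the remaining inactive pages.
    return lv.nr_isolated > lv.nr_inactive

def reclaim_cycle(lv, batch, putback_accounts=True):
    """Isolate a batch off the inactive list, then put it back.
    With putback_accounts=False the NR_ISOLATED decrement is skipped,
    mimicking an isolate/putback accounting imbalance."""
    batch = min(batch, lv.nr_inactive)
    lv.nr_inactive -= batch
    lv.nr_isolated += batch
    lv.nr_inactive += batch          # pages returned to the LRU
    if putback_accounts:
        lv.nr_isolated -= batch      # the decrement that must not be lost

good, bad = Lruvec(1000), Lruvec(1000)
for _ in range(50):
    reclaim_cycle(good, 32)
    reclaim_cycle(bad, 32, putback_accounts=False)

assert not too_many_isolated(good)  # balanced accounting: no throttling
assert too_many_isolated(bad)       # leaked count: reclaim would sleep forever
```

With balanced accounting the isolated count returns to zero after every cycle;
with the leak it only grows, so the throttling condition eventually holds
permanently and every reclaiming task sleeps, matching the hung-task reports.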

Andrew, could you revert the below patch?
"mm: account nr_isolated_xxx in [isolate|putback]_lru_page"

It's just a cleanup patch and is no longer related to the new madvise hint
system call, so it shouldn't be a blocker.

Anyway, I want to fix the problem when I have time available.
Qian, what are your kernel config and system configuration on x86?
Is it possible to reproduce this in QEMU?
It would be really helpful if you could tell me the reproduction steps on x86.

Thanks.



* Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap
  2019-08-01  6:51       ` Minchan Kim
@ 2019-08-01 11:46         ` Qian Cai
  0 siblings, 0 replies; 6+ messages in thread
From: Qian Cai @ 2019-08-01 11:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel



> On Aug 1, 2019, at 2:51 AM, Minchan Kim <minchan@kernel.org> wrote:
> 
> On Wed, Jul 31, 2019 at 02:18:00PM -0400, Qian Cai wrote:
>> On Wed, 2019-07-31 at 12:09 -0400, Qian Cai wrote:
>>> On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
>>>> On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
>>>>> OOM workloads with swapping is unable to recover with linux-next since
>>>>> next-
>>>>> 20190729 due to the commit "mm: account nr_isolated_xxx in
>>>>> [isolate|putback]_lru_page" breaks OOM with swap" [1]
>>>>> 
>>>>> [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kerne
>>>>> l.
>>>>> org/
>>>>> T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
>>>>> 
>>>>> For example, LTP oom01 test case is stuck for hours, while it finishes in
>>>>> a
>>>>> few
>>>>> minutes here after reverted the above commit. Sometimes, it prints those
>>>>> message
>>>>> while hanging.
>>>>> 
>>>>> [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122
>>>>> seconds.
>>>>> [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
>>>>> [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this message.
>>>>> [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
>>>>> [  509.983513][  T711] Call Trace:
>>>>> [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8
>>>>> (unreliable)
>>>>> [  509.983583][  T711] [c00020037d00fa60] [c000000000023724]
>>>>> __switch_to+0x3a4/0x520
>>>>> [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc]
>>>>> __schedule+0x2fc/0x950
>>>>> [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68]
>>>>> schedule+0x58/0x150
>>>>> [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614]
>>>>> rwsem_down_read_slowpath+0x4b4/0x630
>>>>> [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc]
>>>>> down_read+0x12c/0x240
>>>>> [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28]
>>>>> __do_page_fault+0x6f8/0xee0
>>>>> [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364]
>>>>> handle_page_fault+0x18/0x38
>>>> 
>>>> Thanks for testing! No surprise the patch introduced some bugs, because
>>>> it's rather tricky.
>>>> 
>>>> Could you test this patch?
>>> 
>>> It does help the situation a bit, but the recovery speed is still way
>>> slower than just reverting the commit "mm: account nr_isolated_xxx in
>>> [isolate|putback]_lru_page". For example, on this powerpc system, it used
>>> to take 4 minutes to finish oom01, while it now still takes 13 minutes.
>>> 
>>> oom02 (which tests NUMA mempolicy) takes even longer; I gave up after
>>> 26 minutes with several hung tasks below.
>> 
>> Also, oom02 is stuck on an x86 machine.
> 
> Yep, the patch above had a bug: it tested the page type after the page was
> freed. However, after further review I found other bugs, though I don't think
> they are related to your problem either. Okay then, let's revert the patch.
> 
> Andrew, could you revert the below patch?
> "mm: account nr_isolated_xxx in [isolate|putback]_lru_page"
> 
> It's just a cleanup patch and is no longer related to the new madvise hint
> system call, so it shouldn't be a blocker.
> 
> Anyway, I want to fix the problem when I have time available.
> Qian, what are your kernel config and system configuration on x86?
> Is it possible to reproduce this in QEMU?
> It would be really helpful if you could tell me the reproduction steps on x86.

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

The config should work in OpenStack; I have never tried it in QEMU. It might
need a few modifications here or there. The x86 server where it reproduced is:

HPE ProLiant DL385 Gen10
AMD EPYC 7251 8-Core Processor
Smart Storage PQI 12G SAS/PCIe 3
Memory: 32768 MB
NUMA Nodes: 8


end of thread, other threads:[~2019-08-01 11:46 UTC | newest]

Thread overview: 6+ messages
-- links below jump to the message on this page --
2019-07-30 16:25 "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap Qian Cai
2019-07-31  5:34 ` Minchan Kim
2019-07-31 16:09   ` Qian Cai
2019-07-31 18:18     ` Qian Cai
2019-08-01  6:51       ` Minchan Kim
2019-08-01 11:46         ` Qian Cai
