From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Eric Wheeler <linux-mm@lists.ewheeler.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, "Kirill A. Shutemov" <kirill@shutemov.name>,
	Minchan Kim <minchan@kernel.org>, Tejun Heo <tj@kernel.org>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	Kent Overstreet <kent.overstreet@gmail.com>
Subject: Re: Possible deadlock in v4.14.15 contention on shrinker_rwsem in shrink_slab()
Date: Mon, 29 Jan 2018 14:27:54 +0900
Message-ID: <201801290527.w0T5RsPg024008@www262.sakura.ne.jp>
In-Reply-To: <alpine.LRH.2.11.1801272305200.20457@mail.ewheeler.net>

Eric Wheeler wrote:
> I just tried v4.9.78 and we still get the deadlock. I've backported your 
> MemAlloc timing patch and your timing is included in the output.  Both 
> full sysrq traces (30 seconds apart) are available here and I made sure it 
> includes both "Showing busy workqueues and worker pools" sections:
> 
>   https://www.linuxglobal.com/static/2018-01-27-hv1-deadlock-v4.9.78
> 
> # ps -eo pid,lstart,cmd,stat |grep D
>   PID                  STARTED CMD                         STAT
> 16127 Sat Jan 27 05:24:29 2018 crm_node -N 2               D    << Both in D state
> 22444 Sat Jan 27 05:39:50 2018 rsync --server --sender -vl DNs  << Both in D state
> 

Thank you. Although the capture is still incomplete, I noticed that
there was a surprising entry.

crm_node was stalling at a page fault for 33950 seconds! Given the timestamp of
79325 seconds in the trace below, crm_node already started stalling around
uptime = 79325 - 33950 = 45375.

----------
[79325.124062] MemAlloc: crm_node(16127) flags=0xc00900 switches=10 seq=441 gfp=0x24200ca(GFP_HIGHUSER_MOVABLE) order=0 delay=33950065 uninterruptible
[79325.125113]  ffff8cf6d750dd00 0000000000000000 ffff8cf74b9d5800 ffff8cf76fd19940
[79325.125684]  ffff8cf6c3d84200 ffffada2a17af6c8 ffffffff8e7178f5 0000000000000000
[79325.126245]  0000000000000000 0000000000000000 ffff8cf6c3d84200 7fffffffffffffff
[79325.126800] Call Trace:
[79325.127339]  [<ffffffff8e7178f5>] ? __schedule+0x195/0x630
[79325.127985]  [<ffffffff8e718630>] ? bit_wait+0x50/0x50
[79325.128519]  [<ffffffff8e717dc6>] schedule+0x36/0x80
[79325.129040]  [<ffffffff8e71afa6>] schedule_timeout+0x1e6/0x320
[79325.129560]  [<ffffffff8e718630>] ? bit_wait+0x50/0x50
[79325.130100]  [<ffffffff8e7176f6>] io_schedule_timeout+0xa6/0x110
[79325.130683]  [<ffffffff8e71864b>] bit_wait_io+0x1b/0x60
[79325.131199]  [<ffffffff8e718286>] __wait_on_bit_lock+0x86/0xd0
[79325.131723]  [<ffffffff8e1acb22>] __lock_page+0x82/0xb0
[79325.132238]  [<ffffffff8e0e9e00>] ? autoremove_wake_function+0x40/0x40
[79325.132765]  [<ffffffff8e1ae63b>] pagecache_get_page+0x16b/0x230
[79325.133284]  [<ffffffff8e1ca3da>] shmem_unused_huge_shrink+0x28a/0x330
[79325.133814]  [<ffffffff8e1ca4a7>] shmem_unused_huge_scan+0x27/0x30
[79325.134338]  [<ffffffff8e23f941>] super_cache_scan+0x181/0x190
[79325.134875]  [<ffffffff8e1c1ab1>] shrink_slab+0x261/0x470
[79325.135386]  [<ffffffff8e1c6588>] shrink_node+0x108/0x310
[79325.135904]  [<ffffffff8e1c6927>] node_reclaim+0x197/0x210
[79325.136458]  [<ffffffff8e1b5dd8>] get_page_from_freelist+0x168/0x9f0
[79325.137023]  [<ffffffff8e1adc8e>] ? find_get_entry+0x1e/0x100
[79325.137560]  [<ffffffff8e1ca9c5>] ? shmem_getpage_gfp+0xf5/0xbb0
[79325.138061]  [<ffffffff8e1b77ae>] __alloc_pages_nodemask+0x10e/0x2d0
[79325.138619]  [<ffffffff8e207d08>] alloc_pages_current+0x88/0x120
[79325.139117]  [<ffffffff8e070287>] pte_alloc_one+0x17/0x40
[79325.139626]  [<ffffffff8e1e117e>] __pte_alloc+0x1e/0x100
[79325.140138]  [<ffffffff8e1e3622>] alloc_set_pte+0x4f2/0x560
[79325.140698]  [<ffffffff8e1e3770>] do_fault+0xe0/0x620
[79325.141168]  [<ffffffff8e1e5504>] handle_mm_fault+0x644/0xdd0
[79325.141667]  [<ffffffff8e06a96e>] __do_page_fault+0x25e/0x4f0
[79325.142163]  [<ffffffff8e06ac30>] do_page_fault+0x30/0x80
[79325.142660]  [<ffffffff8e003b55>] ? do_syscall_64+0x175/0x180
[79325.143132]  [<ffffffff8e71dae8>] page_fault+0x28/0x30
----------

You took SysRq-t after multiple processes (in this case, crm_node and rsync)
got stuck, didn't you? I am also concerned that LIST_POISON1 (dead000000000100)
and LIST_POISON2 (dead000000000200) show up on the rsync side.

----------
[79325.156361] rsync           D    0 22444  22441 0x00000080
[79325.156871]  ffff8cf3b8437440 0000000000000000 ffff8cf74b9d1600 ffff8cf76fc59940
[79325.157307]  ffff8cf3eef74200 ffffada2a0e17b18 ffffffff8e7178f5 dead000000000100
[79325.157764]  dead000000000200 ffff8cf6db9101a8 ffff8cf3eef74200 7fffffffffffffff
[79325.158225] Call Trace:
[79325.158690]  [<ffffffff8e7178f5>] ? __schedule+0x195/0x630
[79325.159142]  [<ffffffff8e718630>] ? bit_wait+0x50/0x50
[79325.159604]  [<ffffffff8e717dc6>] schedule+0x36/0x80
[79325.160053]  [<ffffffff8e71afa6>] schedule_timeout+0x1e6/0x320
[79325.160536]  [<ffffffff8e0de21c>] ? enqueue_entity+0x3bc/0x570
[79325.160984]  [<ffffffff8e11bf5b>] ? ktime_get+0x3b/0xb0
[79325.161450]  [<ffffffff8e718630>] ? bit_wait+0x50/0x50
[79325.161891]  [<ffffffff8e7176f6>] io_schedule_timeout+0xa6/0x110
[79325.162337]  [<ffffffff8e71864b>] bit_wait_io+0x1b/0x60
[79325.162787]  [<ffffffff8e718286>] __wait_on_bit_lock+0x86/0xd0
[79325.163231]  [<ffffffff8e1acb22>] __lock_page+0x82/0xb0
[79325.163681]  [<ffffffff8e0e9e00>] ? autoremove_wake_function+0x40/0x40
[79325.164133]  [<ffffffff8e1addd1>] find_lock_entry+0x61/0x80
[79325.164597]  [<ffffffff8e1ca9c5>] shmem_getpage_gfp+0xf5/0xbb0
[79325.165053]  [<ffffffff8e1cb9b9>] shmem_file_read_iter+0x159/0x310
[79325.165516]  [<ffffffff8e23b2ff>] __vfs_read+0xdf/0x130
[79325.165966]  [<ffffffff8e23ba2c>] vfs_read+0x8c/0x130
[79325.166430]  [<ffffffff8e23cf95>] SyS_read+0x55/0xc0
[79325.166865]  [<ffffffff8e003a47>] do_syscall_64+0x67/0x180
[79325.167292]  [<ffffffff8e71c530>] entry_SYSCALL64_slow_path+0x25/0x25
----------

> 
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> crm_node        D    0 16127  16126 0x00000080
> MemAlloc: crm_node(16127) flags=0xc00900 switches=10 seq=441 gfp=0x24200ca(GFP_HIGHUSER_MOVABLE) order=0 delay=209469 uninterruptible

At this point, crm_node had already been stalling for 209 seconds. And since
switches= and seq= did not change between this report and the SysRq-t capture,
crm_node kept sleeping the whole time.

> ffff8cf6d750dd00 0000000000000000 ffff8cf74b9d5800 ffff8cf76fd19940
> ffff8cf6c3d84200 ffffada2a17af6c8 ffffffff8e7178f5 0000000000000000
> 0000000000000000 0000000000000000 ffff8cf6c3d84200 7fffffffffffffff
> Call Trace:
> [<ffffffff8e7178f5>] ? __schedule+0x195/0x630
> [<ffffffff8e718630>] ? bit_wait+0x50/0x50
> [<ffffffff8e717dc6>] schedule+0x36/0x80
> [<ffffffff8e71afa6>] schedule_timeout+0x1e6/0x320
> [<ffffffff8e718630>] ? bit_wait+0x50/0x50
> [<ffffffff8e7176f6>] io_schedule_timeout+0xa6/0x110
> [<ffffffff8e71864b>] bit_wait_io+0x1b/0x60
> [<ffffffff8e718286>] __wait_on_bit_lock+0x86/0xd0
> [<ffffffff8e1acb22>] __lock_page+0x82/0xb0
> [<ffffffff8e0e9e00>] ? autoremove_wake_function+0x40/0x40
> [<ffffffff8e1ae63b>] pagecache_get_page+0x16b/0x230
> [<ffffffff8e1ca3da>] shmem_unused_huge_shrink+0x28a/0x330
> [<ffffffff8e1ca4a7>] shmem_unused_huge_scan+0x27/0x30
> [<ffffffff8e23f941>] super_cache_scan+0x181/0x190
> [<ffffffff8e1c1ab1>] shrink_slab+0x261/0x470
> [<ffffffff8e1c6588>] shrink_node+0x108/0x310
> [<ffffffff8e1c6927>] node_reclaim+0x197/0x210
> [<ffffffff8e1b5dd8>] get_page_from_freelist+0x168/0x9f0
> [<ffffffff8e1adc8e>] ? find_get_entry+0x1e/0x100
> [<ffffffff8e1ca9c5>] ? shmem_getpage_gfp+0xf5/0xbb0
> [<ffffffff8e1b77ae>] __alloc_pages_nodemask+0x10e/0x2d0
> [<ffffffff8e207d08>] alloc_pages_current+0x88/0x120
> [<ffffffff8e070287>] pte_alloc_one+0x17/0x40
> [<ffffffff8e1e117e>] __pte_alloc+0x1e/0x100
> [<ffffffff8e1e3622>] alloc_set_pte+0x4f2/0x560
> [<ffffffff8e1e3770>] do_fault+0xe0/0x620
> [<ffffffff8e1e5504>] handle_mm_fault+0x644/0xdd0
> [<ffffffff8e06a96e>] __do_page_fault+0x25e/0x4f0
> [<ffffffff8e06ac30>] do_page_fault+0x30/0x80
> [<ffffffff8e003b55>] ? do_syscall_64+0x175/0x180
> [<ffffffff8e71dae8>] page_fault+0x28/0x30
> 
> Pid 22444 didn't show up in the hung_task warning like crm_node did,
> but its /proc/pid/stack looks like so:
> 
> ~]# cat /proc/22444/stack
> [<ffffffff8e1acb22>] __lock_page+0x82/0xb0
> [<ffffffff8e1addd1>] find_lock_entry+0x61/0x80
> [<ffffffff8e1ca9c5>] shmem_getpage_gfp+0xf5/0xbb0
> [<ffffffff8e1cb9b9>] shmem_file_read_iter+0x159/0x310
> [<ffffffff8e23b2ff>] __vfs_read+0xdf/0x130
> [<ffffffff8e23ba2c>] vfs_read+0x8c/0x130
> [<ffffffff8e23cf95>] SyS_read+0x55/0xc0
> [<ffffffff8e003a47>] do_syscall_64+0x67/0x180
> [<ffffffff8e71c530>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff

I guess that /proc/sys/kernel/hung_task_warnings already reached 0.
Since the default value of /proc/sys/kernel/hung_task_warnings is 10,
only the first 10 warnings are printed. You can write -1 to
/proc/sys/kernel/hung_task_warnings so that all hung_task warnings
are printed.
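
For example (writing the sysctl directly; -1 means "never suppress", and the
exact defaults depend on your kernel configuration):

----------
# cat /proc/sys/kernel/hung_task_warnings        # 0 here means the warning budget is already exhausted
# echo -1 > /proc/sys/kernel/hung_task_warnings  # -1 = report every hung task from now on
----------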

> 
> 
> ==================== 4.14.15 ====================
> 
> On the other server running 4.14.15 the stacks look the same as you've 
> seen before.  Both full sysrq traces (30 seconds apart) are available here 
> and I made sure it includes both "Showing busy workqueues and worker 
> pools" sections:
> 
> 	https://www.linuxglobal.com/static/2018-01-28-hv2-deadlock-v4.14.15
> 
> ~]# ps -eo pid,lstart,cmd,stat |grep D
>   PID                  STARTED CMD                         STAT
> 27163 Sat Jan 27 05:15:48 2018 crm_node -N 2               D
>  1125 Sat Jan 27 14:34:40 2018 /usr/sbin/libvirtd          D
> 

crm_node was stalling at a page fault for 119634 seconds (more than one day)!

----------
crm_node        D    0 27163      1 0x00000084
MemAlloc: crm_node(27163) flags=0xc00900 switches=3 seq=438 gfp=0x14200ca(GFP_HIGHUSER_MOVABLE) order=0 delay=119634513 uninterruptible
Call Trace:
 ? __schedule+0x1dc/0x770
 schedule+0x32/0x80
 io_schedule+0x12/0x40
 __lock_page+0x105/0x150
 ? page_cache_tree_insert+0xb0/0xb0
 pagecache_get_page+0x161/0x210
 shmem_unused_huge_shrink+0x334/0x3f0
 super_cache_scan+0x176/0x180
 shrink_slab+0x275/0x460
 shrink_node+0x10e/0x320
 node_reclaim+0x19d/0x250
 get_page_from_freelist+0x16a/0xac0
 ? radix_tree_lookup_slot+0x1e/0x50
 ? find_lock_entry+0x45/0x80
 ? shmem_getpage_gfp.isra.34+0xe5/0xc80
 __alloc_pages_nodemask+0x111/0x2c0
 pte_alloc_one+0x13/0x40
 __pte_alloc+0x19/0x100
 alloc_set_pte+0x468/0x4c0
 finish_fault+0x3a/0x70
 __handle_mm_fault+0x94a/0x1190
 handle_mm_fault+0xf2/0x210
 __do_page_fault+0x253/0x4d0
 do_page_fault+0x33/0x120
 ? page_fault+0x36/0x60
 page_fault+0x4c/0x60

crm_node        D    0 27163      1 0x00000084
MemAlloc: crm_node(27163) flags=0xc00900 switches=3 seq=438 gfp=0x14200ca(GFP_HIGHUSER_MOVABLE) order=0 delay=119682540 uninterruptible
Call Trace:
 ? __schedule+0x1dc/0x770
 schedule+0x32/0x80
 io_schedule+0x12/0x40
 __lock_page+0x105/0x150
 ? page_cache_tree_insert+0xb0/0xb0
 pagecache_get_page+0x161/0x210
 shmem_unused_huge_shrink+0x334/0x3f0
 super_cache_scan+0x176/0x180
 shrink_slab+0x275/0x460
 shrink_node+0x10e/0x320
 node_reclaim+0x19d/0x250
 get_page_from_freelist+0x16a/0xac0
 ? radix_tree_lookup_slot+0x1e/0x50
 ? find_lock_entry+0x45/0x80
 ? shmem_getpage_gfp.isra.34+0xe5/0xc80
 __alloc_pages_nodemask+0x111/0x2c0
 pte_alloc_one+0x13/0x40
 __pte_alloc+0x19/0x100
 alloc_set_pte+0x468/0x4c0
 finish_fault+0x3a/0x70
 __handle_mm_fault+0x94a/0x1190
 handle_mm_fault+0xf2/0x210
 __do_page_fault+0x253/0x4d0
 do_page_fault+0x33/0x120
 ? page_fault+0x36/0x60
 page_fault+0x4c/0x60
----------

And since switches= and seq= did not change, crm_node kept sleeping.

libvirtd remained stuck waiting for crm_node to release shrinker_rwsem.
But something had already gone wrong more than one day earlier.

----------
libvirtd        D    0  1125      1 0x00000080
Call Trace:
 ? __schedule+0x1dc/0x770
 schedule+0x32/0x80
 rwsem_down_write_failed+0x20d/0x380
 ? ida_get_new_above+0x110/0x3b0
 call_rwsem_down_write_failed+0x13/0x20
 down_write+0x29/0x40
 register_shrinker+0x45/0xa0
 sget_userns+0x468/0x4a0
 ? get_anon_bdev+0x100/0x100
 ? shmem_create+0x20/0x20
 mount_nodev+0x2a/0xa0
 mount_fs+0x34/0x150
 vfs_kern_mount+0x62/0x120
 do_mount+0x1ee/0xc50
 SyS_mount+0x7e/0xd0
 do_syscall_64+0x61/0x1a0
 entry_SYSCALL64_slow_path+0x25/0x25
----------

> I'm not sure if this is relevant, but the load average is wrong on the
> 4.14.15 machine:
>   load average: 1308.46, 1246.69, 1078.29
> There is no way those numbers are correct, top shows nothing spinning
> and vmstat only shows 1-4 processes in a running or blocked state.

If I recall correctly, an insanely growing load average is a possible sign of
processes getting stuck one by one as they are created (e.g. by crond), since
tasks in uninterruptible sleep count toward the Linux load average.
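
If you want to check that theory, comparing the number of tasks in
uninterruptible sleep against the load average should be enough (a rough
sketch, assuming a procps-style ps):

----------
# ps -eo stat= | grep -c '^D'     # tasks in uninterruptible sleep; these count toward the load average
# cat /proc/loadavg               # first three fields are the 1/5/15 minute load averages
----------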

> 
> Here are the pid stacks in D states from ps above:
> 
> ~]# cat /proc/27163/stack 
> [<ffffffff900cd0d2>] io_schedule+0x12/0x40
> [<ffffffff901b4735>] __lock_page+0x105/0x150
> [<ffffffff901b4e61>] pagecache_get_page+0x161/0x210
> [<ffffffff901d4c74>] shmem_unused_huge_shrink+0x334/0x3f0
> [<ffffffff90251746>] super_cache_scan+0x176/0x180
> [<ffffffff901cb885>] shrink_slab+0x275/0x460
> [<ffffffff901d0d4e>] shrink_node+0x10e/0x320
> [<ffffffff901d10fd>] node_reclaim+0x19d/0x250
> [<ffffffff901be1ca>] get_page_from_freelist+0x16a/0xac0
> [<ffffffff901bef81>] __alloc_pages_nodemask+0x111/0x2c0
> [<ffffffff9006dbc3>] pte_alloc_one+0x13/0x40
> [<ffffffff901ef4e9>] __pte_alloc+0x19/0x100
> [<ffffffff901f1978>] alloc_set_pte+0x468/0x4c0
> [<ffffffff901f1a0a>] finish_fault+0x3a/0x70
> [<ffffffff901f385a>] __handle_mm_fault+0x94a/0x1190
> [<ffffffff901f4192>] handle_mm_fault+0xf2/0x210
> [<ffffffff900682a3>] __do_page_fault+0x253/0x4d0
> [<ffffffff90068553>] do_page_fault+0x33/0x120
> [<ffffffff908019dc>] page_fault+0x4c/0x60
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> 
> ~]# cat /proc/1125/stack 
> [<ffffffff907538d3>] call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff901cbb45>] register_shrinker+0x45/0xa0
> [<ffffffff90251168>] sget_userns+0x468/0x4a0
> [<ffffffff9025126a>] mount_nodev+0x2a/0xa0
> [<ffffffff90251de4>] mount_fs+0x34/0x150
> [<ffffffff902703f2>] vfs_kern_mount+0x62/0x120
> [<ffffffff90272c0e>] do_mount+0x1ee/0xc50
> [<ffffffff9027397e>] SyS_mount+0x7e/0xd0
> [<ffffffff90003831>] do_syscall_64+0x61/0x1a0
> [<ffffffff9080012c>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> > > If you have any ideas on creating an easy way to reproduce the problem, 
> > > then I can bisect---but bisecting one day at a time will take a long time, 
> > > and could be prone to bugs which I would like to avoid on this production 
> > > system.
> > > 
> > Thinking from SysRq-t output, I feel that some disk read is stuck.
> 
> Possibly.  I would not expect a hardware problem since we see this on two 
> different systems with different kernel versions.
> 
> > Since rsyslogd failed to catch portion of SysRq-t output, I can't confirm
> > whether register_shrinker() was in progress (nor all kernel worker threads
> > were reported).
> 
> As linked above, I was able to get the full trace with netconsole.
> 
> > But what I was surprised is number of kernel worker threads.
> > "grep ^kworker/ | sort" matched 314 threads and "grep ^kworker/0:"
> > matched 244.
> 
> We have many DRBD volumes and LVM volumes, most of which are dm-thin, so 
> that might be why. Also these servers have both scsi-mq and dm-mq enabled.

Ideally you could reproduce this without DRBD, LVM, bcache etc., because Marc MERLIN's
report ( http://lkml.kernel.org/r/20170502041235.zqmywvj5tiiom3jk@merlins.org )
was using (at least) bcache. It might be a bug where an I/O request fails to be
submitted.

> > 
> > One of workqueue threads was waiting at
> > 
> > ----------
> > static void *new_read(struct dm_bufio_client *c, sector_t block,
> > 		      enum new_flag nf, struct dm_buffer **bp)
> > {
> > 	int need_submit;
> > 	struct dm_buffer *b;
> > 
> > 	LIST_HEAD(write_list);
> > 
> > 	dm_bufio_lock(c);
> > 	b = __bufio_new(c, block, nf, &need_submit, &write_list);
> > #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
> > 	if (b && b->hold_count == 1)
> > 		buffer_record_stack(b);
> > #endif
> > 	dm_bufio_unlock(c);
> > 
> > 	__flush_write_list(&write_list);
> > 
> > 	if (!b)
> > 		return NULL;
> > 
> > 	if (need_submit)
> > 		submit_io(b, READ, read_endio);
> > 
> > 	wait_on_bit_io(&b->state, B_READING, TASK_UNINTERRUPTIBLE); // <= here
> > 
> > 	if (b->read_error) {
> > 		int error = blk_status_to_errno(b->read_error);
> > 
> > 		dm_bufio_release(b);
> > 
> > 		return ERR_PTR(error);
> > 	}
> > 
> > 	*bp = b;
> > 
> > 	return b->data;
> > }
> > ----------
> > 
> > but what are possible reasons? Does this request depend on workqueue availability?
> 
> We are using dm-thin which uses dm-bufio.  The dm-thin pools are working
> properly, so I don't think this is the problem---or at least if it is
> the problem, it isn't affecting the thin pool.

Some suggestions:


You can enlarge the kernel printk() buffer using the log_buf_len= kernel command line
parameter. Since your system is large, you could allocate e.g. log_buf_len=67108864 .
Then you can read the kernel buffer with the dmesg command (e.g. /usr/bin/dmesg ).
You can add timestamps to printk() messages with the command below.

----------
# echo Y > /sys/module/printk/parameters/time
----------
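
For reference, one way to make both settings persistent across reboots might look
like this (assuming a grub2-based distribution such as CentOS; use update-grub and
the corresponding paths on Debian-style systems):

----------
# grep GRUB_CMDLINE_LINUX /etc/default/grub    # append log_buf_len=67108864 printk.time=1 to this line
# grub2-mkconfig -o /boot/grub2/grub.cfg       # regenerate the boot configuration, then reboot
# dmesg -T > /tmp/dmesg-$(date +%s).txt        # capture the (now larger) buffer with readable timestamps
----------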


Since crm_node starts stalling rather early, watch for kernel and userspace messages
printed around the time crm_node started stalling, because some event (e.g. error
recovery) might have been in progress.
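
To translate the stall start into wall-clock time for correlating with syslog,
something like this should do (45375 is the uptime computed above; btime comes
from /proc/stat, and this assumes GNU date):

----------
# btime=$(awk '/^btime/ {print $2}' /proc/stat)   # boot time in seconds since the epoch
# date -d @$((btime + 45375))                     # approximate wall-clock time when crm_node started stalling
----------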


I'm not sure, but maybe page_owner gives us some clue. (Michal, what do you think?)
You can compile your kernel with CONFIG_PAGE_OWNER=y and boot it with the
page_owner=on kernel command line option added. But that alone does not tell us
which "struct page *" the processes got stuck on; we need to call dump_page() to
report the specific "struct page *". One approach is to run the SystemTap script
shown below (in the background, using the -F option) and check how the stalling
pages were allocated. There might be some common characteristic among the stalling pages.

----------
# stap -DSTP_NO_OVERLOAD=1 -F -g - << "EOF"
# Record when each page started waiting in __lock_page(), and forget it on return.
global waiting_pages;
probe kernel.function("__lock_page").call { waiting_pages[$__page] = gettimeofday_s(); }
probe kernel.function("__lock_page").return { delete waiting_pages[@entry($__page)]; }
# Embedded C helper (needs -g) that calls the kernel's dump_page() on a stalling page.
function my_dump_page(page:long) %{
  dump_page((struct page *) STAP_ARG_page, "lock_page() stalling");
%}
# Every 60 seconds, dump any page that has been waited on for 30 seconds or more.
probe timer.s(60) {
  now = gettimeofday_s();
  foreach (page in waiting_pages)
    if (now - waiting_pages[page] >= 30)
      my_dump_page(page);
}
EOF
----------

An example of a page allocated by a plain write() is shown below (though without
a stall period, because I can't reproduce your problem).

----------
[  164.573566] page:fffff3a9024b17c0 count:4 mapcount:0 mapping:ffff8811e2876d68 index:0x5827
[  164.584758] flags: 0x1fffff800010a9(locked|waiters|uptodate|lru|private)
[  164.593678] raw: 001fffff800010a9 ffff8811e2876d68 0000000000005827 00000004ffffffff
[  164.603410] raw: fffff3a9024b17a0 fffff3a9024cc420 ffff8811e2be27b8 ffff881269fdf800
[  164.608676] page dumped because: lock_page() stalling
[  164.612558] page->mem_cgroup:ffff881269fdf800
[  164.615770] page allocated via order 0, migratetype Movable, gfp_mask 0x1c2004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE)
[  164.623143]  __alloc_pages_nodemask+0x184/0x470
[  164.626939]  pagecache_get_page+0xbe/0x310
[  164.630028]  grab_cache_page_write_begin+0x1f/0x40
[  164.633436]  iomap_write_begin.constprop.15+0x5a/0x150
[  164.636013]  iomap_write_actor+0x95/0x180
[  164.637955]  iomap_apply+0xa4/0x110
[  164.639710]  iomap_file_buffered_write+0x61/0x80
[  164.641796]  xfs_file_buffered_aio_write+0xfe/0x3b0 [xfs]
[  164.643945]  xfs_file_write_iter+0xfc/0x150 [xfs]
[  164.646290]  __vfs_write+0xfc/0x170
[  164.648048]  vfs_write+0xc5/0x1b0
[  164.649760]  SyS_write+0x55/0xc0
[  164.651317]  do_syscall_64+0x66/0x210
[  164.652994]  return_from_SYSCALL_64+0x0/0x75
----------
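
As a complement to the SystemTap approach, once CONFIG_PAGE_OWNER=y and
page_owner=on are active, the accumulated allocation traces can also be dumped
from debugfs and sorted with the helper built from the kernel tree (see
Documentation/vm/page_owner.txt; paths assume debugfs is mounted at /sys/kernel/debug):

----------
# cat /sys/kernel/debug/page_owner > page_owner_full.txt       # one record per tracked page, with its allocation stack
# ./page_owner_sort page_owner_full.txt sorted_page_owner.txt  # built from tools/vm/page_owner_sort.c in the kernel source
----------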

