* bio_iov_iter_get_pages() + page_alloc.shuffle=1 migrating failures
From: Qian Cai @ 2019-04-25 5:06 UTC
To: Jens Axboe, hch; +Cc: linux-block, linux-kernel, Linux-MM, dan.j.williams
Memory offlining [1] has started to fail on linux-next on ppc64le with
page_alloc.shuffle=1: the "echo offline" command hangs with many migration
failures (logs below). The failure appears to come from this check in
migrate_page_move_mapping():
	if (!mapping) {
		/* Anonymous page without mapping */
		if (page_count(page) != expected_count)
			return -EAGAIN;
The expected count is 1, but the actual count is 2, i.e. something still holds
an extra reference to the page.
There are two ways to make the problem go away. One is to remove this call in
__shuffle_free_memory():

	shuffle_zone(z);

The other is to revert some bio commits. Bisecting so far points at one of the
following (the third commit looks the most suspicious):
block: only allow contiguous page structs in a bio_vec
block: don't allow multiple bio_iov_iter_get_pages calls per bio
block: change how we get page references in bio_iov_iter_get_pages
[ 446.578064] migrating pfn 2003d5eaa failed ret:22
[ 446.578066] page:c00a00800f57aa80 count:2 mapcount:0 mapping:c000001db4c827e9 index:0x13c08a
[ 446.578220] anon
[ 446.578222] flags: 0x83fffc00008002e(referenced|uptodate|dirty|active|swapbacked)
[ 446.578347] raw: 083fffc00008002e c00a00800f57f808 c00a00800f579f88 c000001db4c827e9
[ 446.944807] raw: 000000000013c08a 0000000000000000 00000002ffffffff c00020141a738008
[ 446.944883] page dumped because: migration failure
[ 446.944948] page->mem_cgroup:c00020141a738008
[ 446.945024] page allocated via order 0, migratetype Movable, gfp_mask 0x100cca(GFP_HIGHUSER_MOVABLE)
[ 446.945148] prep_new_page+0x390/0x3a0
[ 446.945228] get_page_from_freelist+0xd9c/0x1bf0
[ 446.945292] __alloc_pages_nodemask+0x1cc/0x1780
[ 446.945335] alloc_pages_vma+0xc0/0x360
[ 446.945401] do_anonymous_page+0x244/0xb20
[ 446.945472] __handle_mm_fault+0xcf8/0xfb0
[ 446.945532] handle_mm_fault+0x1c0/0x2b0
[ 446.945615] __get_user_pages+0x3ec/0x690
[ 446.945652] get_user_pages_unlocked+0x104/0x2f0
[ 446.945693] get_user_pages_fast+0xb0/0x200
[ 446.945762] iov_iter_get_pages+0xf4/0x6a0
[ 446.945802] bio_iov_iter_get_pages+0xc0/0x450
[ 446.945876] blkdev_direct_IO+0x2e0/0x630
[ 446.945941] generic_file_read_iter+0xbc/0x230
[ 446.945990] blkdev_read_iter+0x50/0x80
[ 446.946031] aio_read+0x128/0x1d0
[ 446.946082] migrating pfn 2003d5fe0 failed ret:22
[ 446.946084] page:c00a00800f57f800 count:2 mapcount:0 mapping:c000001db4c827e9 index:0x13c19e
[ 446.946239] anon
[ 446.946241] flags: 0x83fffc00008002e(referenced|uptodate|dirty|active|swapbacked)
[ 446.946384] raw: 083fffc00008002e c000200deb3dfa28 c00a00800f57aa88 c000001db4c827e9
[ 446.946497] raw: 000000000013c19e 0000000000000000 00000002ffffffff c00020141a738008
[ 446.946605] page dumped because: migration failure
[ 446.946662] page->mem_cgroup:c00020141a738008
[ 446.946724] page allocated via order 0, migratetype Movable, gfp_mask 0x100cca(GFP_HIGHUSER_MOVABLE)
[ 446.946846] prep_new_page+0x390/0x3a0
[ 446.946899] get_page_from_freelist+0xd9c/0x1bf0
[ 446.946959] __alloc_pages_nodemask+0x1cc/0x1780
[ 446.947047] alloc_pages_vma+0xc0/0x360
[ 446.947101] do_anonymous_page+0x244/0xb20
[ 446.947143] __handle_mm_fault+0xcf8/0xfb0
[ 446.947200] handle_mm_fault+0x1c0/0x2b0
[ 446.947256] __get_user_pages+0x3ec/0x690
[ 446.947306] get_user_pages_unlocked+0x104/0x2f0
[ 446.947366] get_user_pages_fast+0xb0/0x200
[ 446.947458] iov_iter_get_pages+0xf4/0x6a0
[ 446.947515] bio_iov_iter_get_pages+0xc0/0x450
[ 446.947588] blkdev_direct_IO+0x2e0/0x630
[ 446.947636] generic_file_read_iter+0xbc/0x230
[ 446.947703] blkdev_read_iter+0x50/0x80
[ 446.947758] aio_read+0x128/0x1d0
[1]
i=0
found=0
for mem in /sys/devices/system/memory/memory*; do
	i=$((i + 1))
	echo "iteration: $i"
	if echo offline > "$mem/state" && [ "$found" -eq 0 ]; then
		found=1
		continue
	fi
	echo online > "$mem/state"
done
* Re: bio_iov_iter_get_pages() + page_alloc.shuffle=1 migrating failures
From: Ming Lei @ 2019-04-25 8:15 UTC
To: Qian Cai
Cc: Jens Axboe, Christoph Hellwig, linux-block,
Linux Kernel Mailing List, Linux-MM, Dan Williams
On Thu, Apr 25, 2019 at 4:13 PM Qian Cai <cai@lca.pw> wrote:
>
> Memory offline [1] starts to fail on linux-next on ppc64le with
> page_alloc.shuffle=1 where the "echo offline" command hangs with lots of
> migrating failures below. It seems in migrate_page_move_mapping()
>
> if (!mapping) {
> /* Anonymous page without mapping */
> if (page_count(page) != expected_count)
> return -EAGAIN;
>
> It expected count=1 but actual count=2.
>
> There are two ways to make the problem go away. One is to remove this line in
> __shuffle_free_memory(),
>
> shuffle_zone(z);
>
> The other is reverting some bio commits. Bisecting so far indicates the culprit
> is in one of those (the 3rd commit looks more suspicious than the others).
>
> block: only allow contiguous page structs in a bio_vec
> block: don't allow multiple bio_iov_iter_get_pages calls per bio
> block: change how we get page references in bio_iov_iter_get_pages
>
> [...]
Please try the following patch:
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.2/block&id=0257c0ed5ea3de3e32cb322852c4c40bc09d1b97
Thanks,
Ming Lei
* Re: bio_iov_iter_get_pages() + page_alloc.shuffle=1 migrating failures
From: Qian Cai @ 2019-04-25 13:02 UTC
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block,
Linux Kernel Mailing List, Linux-MM, Dan Williams
On Thu, 2019-04-25 at 16:15 +0800, Ming Lei wrote:
> On Thu, Apr 25, 2019 at 4:13 PM Qian Cai <cai@lca.pw> wrote:
> > [...]
>
> Please try the following patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.2/block&id=0257c0ed5ea3de3e32cb322852c4c40bc09d1b97
It works great so far!