From: Torsten Kaiser <just.for.lkml@googlemail.com>
To: Neil Brown <neilb@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Li Shaohua <shaohua.li@intel.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.
Date: Tue, 19 Oct 2010 12:06:21 +0200
Message-ID: <AANLkTimVu+5gTDs8przJVP2EbWC=FX-zWW7aH08BtrHC@mail.gmail.com>
In-Reply-To: <AANLkTin3wXWwA-HXhjx6wvzznp3p57Pg6fee8YNkZB79@mail.gmail.com>

On Tue, Oct 19, 2010 at 10:43 AM, Torsten Kaiser
<just.for.lkml@googlemail.com> wrote:
> On Tue, Oct 19, 2010 at 1:11 AM, Neil Brown <neilb@suse.de> wrote:
>> Yes, thanks for the report.
>> This is a real bug exactly as you describe.
>>
>> This is how I think I will fix it, though it needs a bit of review and
>> testing before I can be certain.
>> Also I need to check raid10 etc to see if they can suffer too.
>>
>> If you can test it I would really appreciate it.
>
> I did test it, but while it seemed to fix the deadlock, the system
> still got unusable.
> The still running "vmstat 1" showed that the swapout was still
> progressing, but at a rate of ~20k sized bursts every 5 to 20 seconds.
>
> I also tried to additionally add Wu's patch:
> --- linux-next.orig/mm/vmscan.c 2010-10-13 12:35:14.000000000 +0800
> +++ linux-next/mm/vmscan.c      2010-10-19 00:13:04.000000000 +0800
> @@ -1163,6 +1163,13 @@ static int too_many_isolated(struct zone
>               isolated = zone_page_state(zone, NR_ISOLATED_ANON);
>       }
>
> +       /*
> +        * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages, so that
> +        * they won't get blocked by normal ones and form circular deadlock.
> +        */
> +       if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
> +               inactive >>= 3;
> +
>       return isolated > inactive;
>
> Either it did help somewhat, or I was luckier on my second try, but
> this time I needed ~5 tries instead of only 2 to get the system mostly
> stuck again. On the test run with Wu's patch the writeout pattern was
> more stable, a burst of ~80kb every 20 seconds. But I would suspect
> that the size of the burst is rather random.
>
> I do have a complete SysRq+T dump from the first run, I can send that
> to anyone who wants it.
> (It's 190k so I don't want to spam it to the list)
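
For reference, here is a minimal userspace sketch of how I read the check
after Wu's patch. It is not the kernel code: the SKETCH_* flag values and
the function name are made up for illustration, and only the shift and the
final comparison are taken from the hunk quoted above. Callers whose mask
allows both IO and FS get their limit cut to inactive/8, so NOIO/NOFS
callers keep the full limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp bits; the values are illustrative only. */
#define SKETCH_GFP_IO   0x1u
#define SKETCH_GFP_FS   0x2u
#define SKETCH_GFP_IOFS (SKETCH_GFP_IO | SKETCH_GFP_FS)

/*
 * Userspace model of the patched comparison: a caller that may do both
 * IO and FS is throttled at inactive/8, any other caller at inactive.
 */
static bool sketch_too_many_isolated(unsigned long isolated,
                                     unsigned long inactive,
                                     unsigned int gfp_mask)
{
    if ((gfp_mask & SKETCH_GFP_IOFS) == SKETCH_GFP_IOFS)
        inactive >>= 3;

    return isolated > inactive;
}
```

With 200 isolated and 800 inactive pages, a caller with both bits set is
throttled (200 > 100), while a caller without the FS bit is not (200 <= 800).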

Is this call trace from the SysRq+T dump violating the rule to only
allocate one bio from bio_alloc() until it's submitted?

[  549.700038] Call Trace:
[  549.700038]  [<ffffffff81566b54>] schedule_timeout+0x144/0x200
[  549.700038]  [<ffffffff81045cd0>] ? process_timeout+0x0/0x10
[  549.700038]  [<ffffffff81565e22>] io_schedule_timeout+0x42/0x60
[  549.700038]  [<ffffffff81083123>] mempool_alloc+0x163/0x1b0
[  549.700038]  [<ffffffff81053560>] ? autoremove_wake_function+0x0/0x40
[  549.700038]  [<ffffffff810ea2b9>] bio_alloc_bioset+0x39/0xf0
[  549.700038]  [<ffffffff810ea38d>] bio_clone+0x1d/0x50
[  549.700038]  [<ffffffff814318ed>] make_request+0x23d/0x850
[  549.700038]  [<ffffffff81082e20>] ? mempool_alloc_slab+0x10/0x20
[  549.700038]  [<ffffffff81045cd0>] ? process_timeout+0x0/0x10
[  549.700038]  [<ffffffff81436e63>] md_make_request+0xc3/0x220
[  549.700038]  [<ffffffff81083099>] ? mempool_alloc+0xd9/0x1b0
[  549.700038]  [<ffffffff811ec153>] generic_make_request+0x1b3/0x370
[  549.700038]  [<ffffffff810ea2d6>] ? bio_alloc_bioset+0x56/0xf0
[  549.700038]  [<ffffffff811ec36a>] submit_bio+0x5a/0xd0
[  549.700038]  [<ffffffff81080cf5>] ? unlock_page+0x25/0x30
[  549.700038]  [<ffffffff810a871e>] swap_writepage+0x7e/0xc0
[  549.700038]  [<ffffffff81090d99>] shmem_writepage+0x1c9/0x240
[  549.700038]  [<ffffffff8108c9cb>] pageout+0x11b/0x270
[  549.700038]  [<ffffffff8108cd78>] shrink_page_list+0x258/0x4d0
[  549.700038]  [<ffffffff8108d9e7>] shrink_inactive_list+0x187/0x310
[  549.700038]  [<ffffffff8102dcb1>] ? __wake_up_common+0x51/0x80
[  549.700038]  [<ffffffff811fc8b2>] ? cpumask_next_and+0x22/0x40
[  549.700038]  [<ffffffff8108e1c0>] shrink_zone+0x3e0/0x470
[  549.700038]  [<ffffffff8108e797>] try_to_free_pages+0x157/0x410
[  549.700038]  [<ffffffff81087c92>] __alloc_pages_nodemask+0x412/0x760
[  549.700038]  [<ffffffff810b27d6>] alloc_pages_current+0x76/0xe0
[  549.700038]  [<ffffffff810b6dad>] new_slab+0x1fd/0x2a0
[  549.700038]  [<ffffffff81045cd0>] ? process_timeout+0x0/0x10
[  549.700038]  [<ffffffff810b8721>] __slab_alloc+0x111/0x540
[  549.700038]  [<ffffffff81059961>] ? prepare_creds+0x21/0xb0
[  549.700038]  [<ffffffff810b92bb>] kmem_cache_alloc+0x9b/0xa0
[  549.700038]  [<ffffffff81059961>] prepare_creds+0x21/0xb0
[  549.700038]  [<ffffffff8104a919>] sys_setresgid+0x29/0x120
[  549.700038]  [<ffffffff8100242b>] system_call_fastpath+0x16/0x1b
[  549.700038]  ffff88011e125ea8 0000000000000046 ffff88011e125e08 ffffffff81073c59
[  549.700038]  0000000000012780 ffff88011ea905b0 ffff88011ea90808 ffff88011e125fd8
[  549.700038]  ffff88011ea90810 ffff88011e124010 0000000000012780 ffff88011e125fd8

swap_writepage() uses get_swap_bio() which uses bio_alloc() to get one
bio. That bio is then submitted, but the submit path seems to get into
make_request from raid1.c and that allocates a second bio from
bio_alloc() via bio_clone().
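
To make the concern concrete, here is a toy userspace model of the pattern,
assuming (as the trace suggests) that both allocations are served from the
same fixed reserve. The struct and all sketch_* names are hypothetical and
are not the mempool API; a false return marks the point where the real
mempool_alloc() would sleep instead of failing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy pool: it only counts the free reserved elements. */
struct sketch_pool {
    int free_elems;
};

/* Returns false where a real mempool_alloc() would go to sleep. */
static bool sketch_pool_alloc(struct sketch_pool *p)
{
    if (p->free_elems == 0)
        return false;
    p->free_elems--;
    return true;
}

static void sketch_pool_free(struct sketch_pool *p)
{
    p->free_elems++;
}

/*
 * Models one writer: take a bio for the page, then take a clone from the
 * same pool while the first bio is still held. Returns true only if both
 * allocations succeed, in which case both elements are given back.
 */
static bool sketch_submit_with_clone(struct sketch_pool *p)
{
    if (!sketch_pool_alloc(p))
        return false;

    if (!sketch_pool_alloc(p)) {
        /* Stuck: the first bio is still held, so the pool stays empty. */
        return false;
    }

    sketch_pool_free(p);
    sketch_pool_free(p);
    return true;
}
```

With a reserve of one element the writer ends up holding the only element
while waiting for a second one, so nothing can ever be returned to the pool.
With a reserve of two the same sequence completes.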

I am seeing this pattern (swap_writepage calling
md_make_request/make_request and then getting stuck in mempool_alloc)
more than 5 times in the SysRq+T output...


Torsten


