From: Andrea Arcangeli <aarcange@redhat.com>
To: "Jindřich Makovička" <makovick@gmail.com>
Cc: linux-kernel@vger.kernel.org, Mel Gorman <mel@csn.ul.ie>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2
Date: Wed, 2 Feb 2011 01:26:05 +0100
Message-ID: <20110202002605.GD16981@random.random>
In-Reply-To: <AANLkTi=bqnaif=7xdLFDny86-WYJONRZB45Q=ekKMMst@mail.gmail.com>

On Tue, Feb 01, 2011 at 10:24:00PM +0100, Jindřich Makovička wrote:
> With -rc2, there is
> 
> $ ps aux | grep -E "kswap|khugep"
> root       474  0.0  0.0      0     0 ?        S    20:44   0:00 [kswapd0]
> root       540  0.0  0.0      0     0 ?        DN   20:44   0:00 [khugepaged]
> 
> Sysrq-t output is attached.

khugepaged is missing at the top because the dmesg buffer is too small
to hold the whole sysrq+t output.

Anyway, I see lots of tasks allocating transparent hugepages (you have
some heavy java load allocating plenty of hugepages), and they're all
stuck in migrate_pages->wait_on_page_writeback and
migrate_pages->writepage.
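
If the dmesg buffer truncates sysrq+t again, a quick way to at least
list the tasks in uninterruptible sleep is to filter ps output on the
state column. This is only a sketch: d_state_tasks is a made-up name,
and the wait channel ps prints is much coarser than a full stack trace.

```shell
# Hypothetical helper: keep only the lines whose first field (the
# process state) starts with "D", i.e. uninterruptible sleep.
# A state like "DN" (D plus low priority) matches too.
d_state_tasks() {
    awk '$1 ~ /^D/'
}

# Usage (assumes procps ps): print state, pid, wait channel and command
# name for every task and keep the D-state ones:
#   ps -eo stat,pid,wchan:32,comm | d_state_tasks
```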

> Good news is, I don't see these issues with -rc3.

Ah, please try again. I didn't check the diff between -rc2 and -rc3, so
I can't tell what helped, but it sounds too easy that it got magically
fixed by -rc3.

Anyway it's not THP itself, it has to be something in compaction, and
if it happens again you can be sure that doing "echo never >defrag"
will fix it (if compaction really is the culprit). Ironically you can
leave khugepaged/defrag set to "always". It's ok if khugepaged stays in
D state. With CONFIG_NUMA=n khugepaged in D state is not noticeable at
all, because it allocates all hugepages without having to hold the
mmap_sem. With CONFIG_NUMA=y it tries to allocate the hugepage from the
right node, so it needs to pass a vma down to the allocator to track
the right allocation node, and that requires holding the mmap_sem in
read mode during the allocation to prevent the vma from going away. But
it's no big deal.
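
To spell out the "defrag" knob: the path below is not given in this
thread, I'm assuming the usual THP sysfs directory, so check it against
your kernel. The sketch only reads the active setting (the kernel puts
it in square brackets); thp_active is a made-up helper name.

```shell
# Assumed location of the THP sysfs directory; not stated in this thread.
THP_DIR=${THP_DIR:-/sys/kernel/mm/transparent_hugepage}

# Hypothetical helper: print the active value of a THP knob. The file
# lists all choices with the active one in square brackets,
# e.g. "always madvise [never]".
thp_active() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Show the current defrag policy if the file exists on this kernel.
if [ -r "$THP_DIR/defrag" ]; then
    thp_active "$THP_DIR/defrag"
fi

# The workaround from the text (needs root):
#   echo never > "$THP_DIR/defrag"
```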

Maybe we need to change compaction to never block unless some
__GFP_COMPACTION_WAIT bitflag is set. It's perfectly ok to fail some
hugepage allocations when there's congestion like that, without trying
so hard to allocate hugepages. The only thing that would then need to
pass down a __GFP_COMPACTION_WAIT would be the kernel stack allocation
in fork()... everything else should have a 4k fallback. Even
khugepaged doesn't need to try so hard to compact if the system is
under huge stress.

Usually to reproduce this you need "cp /dev/zero /mnt/usbdrive", and
that tends to hang any system, with THP or without... so it's hard to
quantify what is normal and what is not.
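
If you don't want to fill the whole drive, a bounded variant of that
write load could look like the sketch below. It is not the same test
as the cp above (fixed size, fsync at the end), and write_zeroes and
the zero.tmp file name are made up for illustration.

```shell
# Hypothetical helper: write N MiB of zeroes (default 1024) to a
# temporary file under the given directory, fsync it, then remove it.
write_zeroes() {
    dir=$1
    mb=${2:-1024}
    dd if=/dev/zero of="$dir/zero.tmp" bs=1M count="$mb" conv=fsync 2>/dev/null &&
        rm -f "$dir/zero.tmp"
}

# Usage, with the mount point from the cp command above:
#   write_zeroes /mnt/usbdrive 4096
```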

I have another latency issue being reported, for some heavy write
fs-network load, that is much easier to quantify and is most certainly
related to the use of compaction even for the jumbo frames and large
network skbs. It's still compaction related (not THP related: with THP
on but compaction used only by THP, it doesn't happen). I'll let you
know when there is a patch to try for that, as it may benefit your
workload too. In the meantime, if you have more data let me know.

Thanks,
Andrea
