linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Matthias Dahl <ml_linux-kernel@binary-island.eu>
Cc: linux-raid@vger.kernel.org, linux-mm@kvack.org,
	dm-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Date: Tue, 12 Jul 2016 13:59:15 +0200
Message-ID: <20160712115915.GG14586@dhcp22.suse.cz>
In-Reply-To: <20160712114920.GF14586@dhcp22.suse.cz>

On Tue 12-07-16 13:49:20, Michal Hocko wrote:
> On Tue 12-07-16 13:28:12, Matthias Dahl wrote:
> > Hello Michal...
> > 
> > On 2016-07-12 11:50, Michal Hocko wrote:
> > 
> > > This smells like file pages are stuck in the writeback somewhere and the
> > > anon memory is not reclaimable because you do not have any swap device.
> > 
> > Not having a swap device shouldn't be a problem -- and in this case, it
> > would cause even more trouble as in disk i/o.
> > 
> > What could cause the file pages to get stuck or stopped from being written
> > to the disk? And more importantly, what is so unique/special about the
> > Intel Rapid Storage that it happens (seemingly) exclusively with that
> > and not the normal Linux s/w raid support?
> 
> I am not a storage expert (not to mention dm-crypt). But what those
> counters say is that the IO completion doesn't trigger so the
> PageWriteback flag is still set. Such a page is not reclaimable
> obviously. So I would check the IO delivery path and focus on the
> potential dm-crypt involvement if you suspect this is a contributing
> factor.
>  
> > Also, if the pages are not written to disk, shouldn't something error
> > out or slow dd down?
> 
> Writers are normally throttled when we hit the dirty limit. You seem to have
> dirty_ratio set to 20% which is quite a lot considering how much memory
> you have.

And just to clarify: dirty_ratio refers to dirtyable memory, which is
free_pages + file_lru pages. In your case you have only 9% of the total
memory size dirty/writeback, but that is 90% of dirtyable memory. This is
quite possible if somebody consumes free_pages while racing with the writer.
The writer will get throttled, but the concurrent memory consumer normally
will not. So you can end up in this situation.
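
To make the arithmetic concrete, here is a rough sketch that plugs the
counters from the OOM report quoted below into the simplified definition
above (dirtyable = free + file LRU pages). The function names are made up
for illustration, and the kernel's real accounting applies further
adjustments (reserves etc.) that this deliberately ignores, so treat the
result as an approximation only.

```python
# Counters copied from the OOM killer report quoted in this thread.
# All values are in pages.
counters = {
    "free": 49928,
    "active_file": 27534,
    "inactive_file": 819673,
    "dirty": 167859,
    "writeback": 651864,
}


def dirtyable_pages(c):
    """Simplified dirtyable memory: free pages plus file LRU pages.

    Assumption: this ignores the reserves the kernel subtracts.
    """
    return c["free"] + c["active_file"] + c["inactive_file"]


def dirty_fraction(c):
    """Fraction of dirtyable memory that is dirty or under writeback."""
    return (c["dirty"] + c["writeback"]) / dirtyable_pages(c)


if __name__ == "__main__":
    print(f"dirtyable pages:           {dirtyable_pages(counters)}")
    print(f"dirty + writeback pages:   {counters['dirty'] + counters['writeback']}")
    print(f"share of dirtyable memory: {dirty_fraction(counters):.1%}")
```

With these numbers, dirty + writeback comes out at roughly nine tenths of
dirtyable memory, far above a dirty_ratio of 20%, which is consistent with
free pages having been consumed behind the writer's back.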

> If you get back to the memory info from the OOM killer report:
> [18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0
>                 active_file:27534 inactive_file:819673 isolated_file:160
>                 unevictable:13001 dirty:167859 writeback:651864 unstable:0
>                 slab_reclaimable:177477 slab_unreclaimable:1817501
>                 mapped:934 shmem:588 pagetables:7109 bounce:0
>                 free:49928 free_pcp:45 free_cma:0
> 
> The dirty+writeback is ~9%. What is more interesting, though, is that LRU
> pages are small relative to the memory size (~11%). Note the number of
> unreclaimable slab pages (~20%). Who is consuming those objects?
> Where is the remaining 70% of memory hiding?


-- 
Michal Hocko
SUSE Labs
