From: Matthew Wilcox <willy@infradead.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@Oracle.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	n-horiguchi@ah.jp.nec.com, mike.kravetz@Oracle.com,
	aneesh.kumar@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com,
	punit.agrawal@arm.com, arnd@arndb.de, gerald.schaefer@de.ibm.com,
	aarcange@redhat.com, oleg@redhat.com,
	penguin-kernel@I-love.SAKURA.ne.jp, mingo@kernel.org,
	kirill.shutemov@linux.intel.com, vdavydov.dev@gmail.com
Subject: Re: [RFC PATCH 1/1] mm/hugetlb mm/oom_kill:  Add support for reclaiming hugepages on OOM events.
Date: Mon, 31 Jul 2017 07:37:35 -0700
Message-ID: <20170731143735.GI15980@bombadil.infradead.org>
In-Reply-To: <20170731140810.GD4829@dhcp22.suse.cz>

On Mon, Jul 31, 2017 at 04:08:10PM +0200, Michal Hocko wrote:
> On Mon 31-07-17 09:56:48, Liam R. Howlett wrote:
> > * Michal Hocko <mhocko@kernel.org> [170731 05:10]:
> > > On Fri 28-07-17 21:56:38, Liam R. Howlett wrote:
> > > > The case I raise is a correctly configured system which has a memory
> > > > module failure.
> > > 
> > > So you are concerned about MCEs due to failing memory modules? If yes
> > > why do you care about hugetlb in particular?
> > 
> > No, I am concerned about a failed memory module.  The system will
> > detect certain failures, mark the memory as bad and automatically
> > reboot.  Upon rebooting, that module will not be used.
> 
> How do you detect/configure this? We do have HWPoison infrastructure.
> 
> > My focus on hugetlb is that it can stop the automatic recovery of the
> > system.
> 
> How?

Let me try to explain the situation as I understand it.

The customer has purchased a 128TB machine in order to run a database.
They reserve 124TB of memory for use by the database cache.  Everything
works great.  Then a 4TB memory module goes bad.  The machine reboots
itself in order to return to operation, now with only 124TB of memory,
all 124TB of which is still reserved.  It OOMs during boot.  The current
output from our OOM machinery doesn't point the sysadmin at the kernel
command line parameter as now being the problem.  So they file a
priority 1 problem ticket ...
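
To make the arithmetic concrete, here is a rough sketch of the numbers
above.  The function names and the min_free threshold are made up for
illustration, and "TB" is treated as a binary unit; none of this is
kernel code or part of the patch under discussion.

```python
TB = 1 << 40  # bytes per TB, treating the message's "TB" as a binary unit


def memory_left_after_reservation(installed, failed, reserved):
    """Bytes left for the kernel and userspace once the reservation is made.

    installed: memory physically fitted to the machine
    failed:    memory in modules that were marked bad and are no longer used
    reserved:  memory set aside at boot for the database cache
    """
    usable = installed - failed
    return usable - reserved


def reservation_starves_boot(installed, failed, reserved, min_free):
    """True if the reservation leaves less than min_free bytes for boot.

    min_free is a hypothetical threshold picked by the caller; the
    message does not say how much memory the boot actually needs.
    """
    return memory_left_after_reservation(installed, failed, reserved) < min_free


# Healthy machine: 128TB installed, 124TB reserved, 4TB left over.
healthy = memory_left_after_reservation(128 * TB, 0, 124 * TB)

# After losing a 4TB module: 124TB usable, 124TB reserved, nothing left.
degraded = memory_left_after_reservation(128 * TB, 4 * TB, 124 * TB)

print(healthy // TB, degraded // TB)  # prints "4 0"
```

With the same reservation that worked on the healthy machine, the
degraded machine has nothing left over, which is why the boot OOMs
even though nothing in the configuration changed.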


