From: Laurent Vivier <lvivier@redhat.com>
To: Nathan Fontenot <nfont@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Cc: danielhb@linux.vnet.ibm.com
Subject: Re: [PATCH] powerpc/pseries: Check memory device state before onlining/offlining
Date: Thu, 10 Aug 2017 18:27:49 +0200
Message-ID: <e14fe546-9977-e299-cc11-11c5d9974ee1@redhat.com>
In-Reply-To: <20170802180322.8245.55722.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com>

On 02/08/2017 20:03, Nathan Fontenot wrote:
> When DLPAR adding or removing memory we need to check the device
> offline status before trying to online/offline the memory. This is
> needed because calls to device_online() and device_offline() will return
> non-zero for memory that is already online and offline respectively.
> 
> This update resolves two scenarios. First, for kernels built with
> auto-online memory enabled, memory will be onlined as part of calls
> to add_memory(). After adding the memory the pseries dlpar code tries
> to online it and fails since the memory is already online. The dlpar
> code then tries to remove the memory which produces the oops message
> below because the memory is not offline.
> 
> The second scenario occurs when removing memory that is already offline,
> i.e. marking memory offline (via sysfs) and then trying to remove that
> memory. This doesn't work because offlining the already offline memory
> does not succeed and the dlpar code then fails the dlpar remove operation.
> 
> The fix for both scenarios is to check the device.offline status before
> making the calls to device_online() or device_offline().
> 
> kernel BUG at mm/memory_hotplug.c:2189!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=2048
> NUMA
> pSeries
> CPU: 0 PID: 5 Comm: kworker/u129:0 Not tainted 4.12.0-rc3 #272
> Workqueue: pseries hotplug workque .pseries_hp_work_fn
> task: c0000003f9c89200 task.stack: c0000003f9d10000
> NIP: c0000000002ca428 LR: c0000000002ca3cc CTR: c000000000ba16a0
> REGS: c0000003f9d13630 TRAP: 0700   Not tainted  (4.12.0-rc3)
> MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
>   CR: 22002024  XER: 0000000a
> CFAR: c0000000002ca3d0 SOFTE: 1
> GPR00: c0000000002ca3cc c0000003f9d138b0 c000000001bb0200 0000000000000001
> GPR04: c0000003fb143c80 c0000003fef21630 0000000000000003 0000000000000002
> GPR08: 0000000000000003 0000000000000003 0000000000000003 00000000000031b1
> GPR12: 0000000028002042 c00000000fd80000 c000000000118ae0 c0000003fb170180
> GPR16: 0000000000000000 0000000000000004 0000000000000010 c0000003ffff79c8
> GPR20: c0000003ffff7b68 c0000003f728ff84 0000000000000002 0000000000000010
> GPR24: 0000000000000002 c0000003f728ff80 0000000000000002 0000000000000001
> GPR28: c0000003fb143c38 0000000000000002 0000000010000000 0000000020000000
> NIP [c0000000002ca428] .remove_memory+0xb8/0xc0
> LR [c0000000002ca3cc] .remove_memory+0x5c/0xc0
> Call Trace:
> [c0000003f9d138b0] [c0000000002ca3cc] .remove_memory+0x5c/0xc0 (unreliable)
> [c0000003f9d13940] [c0000000000938a4] .dlpar_add_lmb+0x384/0x400
> [c0000003f9d13a30] [c00000000009456c] .dlpar_memory+0x5dc/0xca0
> [c0000003f9d13af0] [c00000000008ce84] .handle_dlpar_errorlog+0x74/0xe0
> [c0000003f9d13b70] [c00000000008cf1c] .pseries_hp_work_fn+0x2c/0x90
> [c0000003f9d13bf0] [c000000000110a5c] .process_one_work+0x17c/0x460
> [c0000003f9d13c90] [c000000000110dc8] .worker_thread+0x88/0x500
> [c0000003f9d13d70] [c000000000118c3c] .kthread+0x15c/0x1a0
> [c0000003f9d13e30] [c00000000000ba18] .ret_from_kernel_thread+0x58/0xc0
> Instruction dump:
> 7fe3fb78 4bd7c845 60000000 7fa3eb78 4bfdd3c9 38210090 e8010010 eba1ffe8
> ebc1fff0 ebe1fff8 7c0803a6 4bfdc2ac <0fe00000> 00000000 7c0802a6 fb01ffc0
> 
> Fixes: 943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

I tested the first scenario with 4.13.0-rc4 and QEMU 2.10.0-rc2.
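
For readers following along, the check described in the changelog can
be sketched in user space roughly as below. This is only an
illustration, not the patch itself: struct mock_device,
mock_device_online(), mock_device_offline() and change_state_checked()
are hypothetical stand-ins, and the mocks model only the behaviour the
changelog states (a non-zero return when the device is already in the
requested state).

```c
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for the kernel's struct device.
 * Only the 'offline' flag mentioned in the changelog is modelled.
 */
struct mock_device {
	bool offline;
};

/*
 * Mock of device_online(): per the changelog, it returns non-zero
 * when the device is already online.
 */
static int mock_device_online(struct mock_device *dev)
{
	if (!dev->offline)
		return 1;	/* already online */
	dev->offline = false;
	return 0;
}

/*
 * Mock of device_offline(): per the changelog, it returns non-zero
 * when the device is already offline.
 */
static int mock_device_offline(struct mock_device *dev)
{
	if (dev->offline)
		return 1;	/* already offline */
	dev->offline = true;
	return 0;
}

/*
 * Hypothetical helper showing the idea of the fix: look at the
 * offline flag first and only call online/offline when the state
 * actually has to change.
 */
static int change_state_checked(struct mock_device *dev, bool online)
{
	if (online && dev->offline)
		return mock_device_online(dev);
	if (!online && !dev->offline)
		return mock_device_offline(dev);
	return 0;	/* already in the requested state */
}
```

With the check in place, asking to online a device that is already
online (the auto-online case) or to offline one that is already
offline (the sysfs case) returns 0, so the caller does not take its
error path.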

Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>

Thread overview (3+ messages):
2017-08-02 18:03 [PATCH] powerpc/pseries: Check memory device state before onlining/offlining Nathan Fontenot
2017-08-10 16:27 ` Laurent Vivier [this message]
2017-08-11 12:19 ` Michael Ellerman
