linuxppc-dev.lists.ozlabs.org archive mirror
From: Juliet Kim <julietk@linux.vnet.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, mmc@linux.vnet.ibm.com, mwb@linux.ibm.com
Subject: Re: [PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline
Date: Wed, 26 Jun 2019 16:49:04 -0500
Message-ID: <c99fb125-ba6d-66aa-d963-83e854bc0eb7@linux.vnet.ibm.com>
In-Reply-To: <87a7e5tvyb.fsf@linux.ibm.com>

On 6/25/19 12:29 PM, Nathan Lynch wrote:
> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>> The commit
>> ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
>> attempted to fix a hang in Live Partition Mobility (LPM) by abandoning
>> the LPM attempt if a race between LPM and a concurrent CPU offline was
>> detected.
>>
>> However, that fix failed to notify the hypervisor that the LPM attempt
>> had been abandoned, which results in a system hang.
> It is surprising to me that leaving a migration unterminated would cause
> Linux to hang. Can you explain more about how that happens?
>
PHYP will block further requests (the next partition migration, DLPAR, etc.)
while it is in the suspending state. That would have a follow-on effect on the
HMC, and potentially on this and other partitions.
>> Fix this by signaling PHYP to cancel the migration, so that PHYP
>> can stop waiting and clean up the migration.
> This is well-spotted and rtas_ibm_suspend_me() needs to signal
> cancellation in several error paths. But I don't agree that this is one
> of them: this race is going to be a temporary condition in any
> production setting, and retrying would allow the migration to succeed.
If LPM and CPU offline requests conflict with one another, it might be better
to let them fail and let the customer decide which one they prefer. IBM i
cancels the migration if other OS components or operations veto it, so this
approach is consistent with other OS behavior for LPM. I think all IBM
products should provide a consistent customer experience. Even if the race is
temporary, it can still happen, and retrying could cause livelock.


Thread overview: 5+ messages
2019-06-24 23:48 [PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline Juliet Kim
2019-06-25 17:29 ` Nathan Lynch
2019-06-26 21:49   ` Juliet Kim [this message]
2019-06-26 23:51     ` Nathan Lynch
2019-06-28 20:03       ` Juliet Kim
