All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Axtens <dja@axtens.net>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: tyreld@linux.ibm.com, cheloha@linux.ibm.com,
	ldufour@linux.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] powerpc/pseries/mobility: ignore ibm, platform-facilities updates
Date: Fri, 22 Oct 2021 23:21:49 +1100	[thread overview]
Message-ID: <87wnm5fdwi.fsf@dja-thinkpad.axtens.net> (raw)
In-Reply-To: <87o87iy3ji.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:

> Daniel Axtens <dja@axtens.net> writes:
>>> On VMs with NX encryption, compression, and/or RNG offload, these
>>> capabilities are described by nodes in the ibm,platform-facilities device
>>> tree hierarchy:
>>>
>>>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>   ├── ibm,compression-v1
>>>   ├── ibm,random-v1
>>>   └── ibm,sym-encryption-v1
>>>
>>>   3 directories
>>>
>>> The acceleration functions that these nodes describe are not disrupted by
>>> live migration, not even temporarily.
>>>
>>> But the post-migration ibm,update-nodes sequence firmware always sends
>>> "delete" messages for this hierarchy, followed by an "add" directive to
>>> reconstruct it via ibm,configure-connector (log with debugging statements
>>> enabled in mobility.c):
>>>
>>>   mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>>>   mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
>>>   mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>>>   mobility: removing node /ibm,platform-facilities:4294967286
>>>   ...
>>>   mobility: added node /ibm,platform-facilities:4294967286
>>>
>>> Note we receive a single "add" message for the entire hierarchy, and what
>>> we receive from the ibm,configure-connector sequence is the top-level
>>> platform-facilities node along with its three children. The debug message
>>> simply reports the parent node and not the whole subtree.
>>
>> If I understand correctly, (and again, this is not my area at all!) we
>> still have to go out to the firmware and call the
>> ibm,configure-connector sequence in order to figure out that the node
>> we're supposed to add is the ibm,platform-facilites node, right? We
>> can't save enough information at delete time to avoid the trip out to
>> firmware?
>
> That is right... but maybe I don't understand your angle here. Unsure
> what avoiding the configure-connector sequence for the nodes would buy
> us.

It's not meant to be a tricky question, so the simple answer is probably
the right one. Just wondering if there was a marginal efficiency gain -
although I believe it's not really a hot path anyway.

>
>
>>> Until that can be realized we have a confirmed use-after-free and the
>>> possibility of memory corruption. So add a limited workaround that
>>> discriminates on the node type, ignoring adds and removes. This should be
>>> amenable to backporting in the meantime.
>>
>> Yeah it's an unpleasant situation to find ourselves in. It's a bit icky
>> but as I think you said in a previous email, at least this isn't worse:
>> in the common case it should now succeed and and if properties change
>> significantly it will still fail.
>>
>> My one question (from more of a security point of view) is:
>>  1) Say you start using the facilities with a particular set of
>>     parameters.
>>
>>  2) Say you then get migrated and the parameters change.
>>
>>  3) If you keep using the platform facilities as if the original
>>     properties are still valid, can this cause any Interesting,
>>     unexpected or otherwise Bad consequences? Are we going to end up
>>     (for example) scribbling over random memory somehow?
>
> If drivers are safely handling errors from H_COP_OP etc, then no. (I
> know, this looks like a Well That Would Be a Driver Bug dismissal, but
> that's not my attitude.) And again this is a case where the change
> cannot make things worse.
>
> In the current design of the pseries LPM implementation, user space and
> other normal system activity resume as soon as we return from the
> stop_machine() call which suspends the partition, executing concurrently
> with any device tree updates. So even if we had code in place to
> correctly resolve the DT changes and the drivers were able to respond to
> the changes, there would still be a window of exposure to the kind of
> problem you describe: the changed characteristics, if any, of the
> destination obtain as soon as execution resumes, regardless of when the
> OS initiates the update-nodes sequence.
>
> The way out of that mess is to use the Linux suspend framework, or
> otherwise prevent user space from executing until the destination
> system's characteristics have been appropriately propagated out to the
> necessary drivers etc. I'm trying to get there.

Fair enough. I do appreciate the perfect not being the enemy of the good
especially in areas of the codebase like this where there is scope to
improve things but there is also a lot of complexity that we cannot
really get away from because the underlying problem domain is itself
just plain complex. (I think EEH is the other obvious example in
arch/powerpc.)

>> Apart from that, the code seems to do what it says, it seems to solve a
>> real problem, the error and memory handling makes sense, you _put the DT
>> nodes that you _get, the comments are helpful and descriptive, and it
>> passes the automated tests on patchwork/snowpatch.
>
> I appreciate your review!

With those questions answered, and with the caveats above and noting my
complete inability to test the code:

Reviewed-by: Daniel Axtens <dja@axtens.net>

Kind regards,
Daniel

  reply	other threads:[~2021-10-22 12:22 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-20 19:47 [PATCH v2] powerpc/pseries/mobility: ignore ibm, platform-facilities updates Nathan Lynch
2021-10-21  5:47 ` Daniel Axtens
2021-10-21 12:18   ` Nathan Lynch
2021-10-22 12:21     ` Daniel Axtens [this message]
2021-11-02 10:11 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wnm5fdwi.fsf@dja-thinkpad.axtens.net \
    --to=dja@axtens.net \
    --cc=cheloha@linux.ibm.com \
    --cc=ldufour@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=nathanl@linux.ibm.com \
    --cc=tyreld@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.