Re: [PATCH v4] pseries/drmem: update LMBs after LPM

From: Laurent Dufour <ldufour@linux.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>,
	linux-kernel@vger.kernel.org, paulus@samba.org,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v4] pseries/drmem: update LMBs after LPM
Date: Wed, 5 May 2021 16:39:23 +0200
Message-ID: <8cebddd7-5b06-ba93-3cc7-9cdab57db491@linux.ibm.com>
In-Reply-To: <87bl9qf7xk.fsf@linux.ibm.com>

On 05/05/2021 at 00:30, Nathan Lynch wrote:
> Hi Laurent,

Hi Nathan,

Thanks for your review.

> Bear with me while I work through the commit message:
> 
> Laurent Dufour <ldufour@linux.ibm.com> writes:
>> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
>> updated by the hypervisor in the case the NUMA topology of the LPAR's
>> memory is updated.
> 
> Yes, the RTAS functions ibm,update-nodes and ibm,update-properties,
> which the OS invokes after resuming, may bring in updated properties
> under the ibm,dynamic-reconfiguration-memory node, including the
> ibm,associativity-lookup-arrays property.
> 
>> This is caught by the kernel,
> 
> "Caught" makes me think this is an error condition, as in catching an
> exception. I guess "handled" better conveys your meaning?

ok

> 
>> but the memory's node is updated because
>> there is no way to move a memory block between nodes.
> 
> "The memory's node" refers to the ibm,dynamic-reconfiguration-memory DT
> node, yes? Or is it referring to Linux's NUMA nodes? ("move a memory
> block between nodes" in your statement here refers to Linux's NUMA
> nodes, that much is clear to me.)
> 
> I am failing to follow the cause->effect relationship stated. True,
> changing a block's node assignment while it's in use isn't safe. I don't
> see why that implies that "the memory's node is updated"? In fact this
> seems contradictory.
> 
> This statement makes more sense to me if I change it to "the memory's
> node is _not_ updated" -- is this what you intended?

Correct, I dropped the word 'not' here ;)

> 
>> If later a memory block is added or removed, drmem_update_dt() is called
>> and it is overwriting the DT node to match the added or removed LMB.
> 
> I understand this, but I will expand on it.
> 
> dlpar_memory()
>    -> dlpar_memory_add_by_count()
>      -> dlpar_add_lmb()
>        -> update_lmb_associativity_index()
>          ... lmb->aa_index = <value>
>    -> drmem_update_dt()
> 
> update_lmb_associativity_index() retrieves the firmware description of
> the new block, and sets the aa_index of the matching entry in the
> drmem_info array to the value matching the firmware description.
> 
> Then, drmem_update_dt() walks the drmem_info array and synthesizes a new
> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory-v2 property based
> on the recently updated information in that array.

Yes
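
To make the data flow concrete, here is a simplified userspace sketch of the regeneration step. It is not the kernel's drmem_update_dt(): struct lmb_set and synthesize_sets() are hypothetical, and the grouping rule (start a new set whenever aa_index or flags change) is an assumed simplification of the ibm,dynamic-memory-v2 layout. The point it shows is that the output is derived only from the in-memory array, so a stale aa_index is written back verbatim.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as the struct drmem_lmb quoted further down this thread. */
struct drmem_lmb {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

/* Hypothetical, simplified stand-in for one ibm,dynamic-memory-v2 set:
 * a run of consecutive LMBs sharing the same aa_index and flags. */
struct lmb_set {
	uint32_t seq_lmbs;
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

/* Rebuild the sets purely from the LMB array; whatever aa_index values
 * the array holds, stale or not, end up in the output.
 * 'out' must have room for n entries. Returns the number of sets. */
size_t synthesize_sets(const struct drmem_lmb *lmbs, size_t n,
		       struct lmb_set *out)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		if (nsets > 0 &&
		    out[nsets - 1].aa_index == lmbs[i].aa_index &&
		    out[nsets - 1].flags == lmbs[i].flags) {
			/* Extend the current run. */
			out[nsets - 1].seq_lmbs++;
			continue;
		}
		/* Start a new set described by this LMB. */
		out[nsets].seq_lmbs = 1;
		out[nsets].base_addr = lmbs[i].base_addr;
		out[nsets].drc_index = lmbs[i].drc_index;
		out[nsets].aa_index = lmbs[i].aa_index;
		out[nsets].flags = lmbs[i].flags;
		nsets++;
	}
	return nsets;
}
```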

> 
>> But the LMB's associativity node has not been updated after the DT
>> node update and thus the node is overwritten by the Linux's topology
>> instead of the hypervisor one.
> 
> So, an example of the problem is:
> 
> 1. VM migrates. On resume, ibm,associativity-lookup-arrays is changed
>     via ibm,update-properties. Entries in the drmem_info array remain
>     unchanged, with aa_index values that correspond to the source
>     system's ibm,associativity-lookup-arrays property, now inaccessible.
> 
> 2. A memory block is added. We look up the new block's entry in the
>     drmem_info array, and set the aa_index to the value matching the
>     current ibm,associativity-lookup-arrays.
> 
> 3. Then, the ibm,associativity-lookup-arrays property is completely
>     regenerated from the drmem_info array, which reflects a mixture of
>     information from the source and destination systems.
> 
> Do I understand correctly?

Yes
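
The three steps can be reproduced with a toy model. This is an illustrative sketch only: each ibm,associativity-lookup-arrays entry is reduced to a single NUMA node id, and struct lookup_table, find_aa_index() and resolve_node() are hypothetical helpers, not kernel code. With a source table {0, 1} and a destination table {1, 0}, an LMB whose aa_index (0) was computed for node 0 on the source is described as node 1 once read against the destination table, while a block added after migration on node 0 gets aa_index 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: node_of[aa_index] is the NUMA node an
 * associativity-lookup-arrays entry resolves to. */
struct lookup_table {
	const uint32_t *node_of;
	size_t len;
};

/* Step 2 (simplified): pick the aa_index whose entry matches the node
 * firmware reports for a newly added block. Returns -1 if none. */
int find_aa_index(const struct lookup_table *t, uint32_t node)
{
	for (size_t i = 0; i < t->len; i++)
		if (t->node_of[i] == node)
			return (int)i;
	return -1;
}

/* Step 3 (simplified): after regeneration, every aa_index in the
 * property is interpreted against the current (destination) table,
 * including stale values computed against the source table.
 * Returns UINT32_MAX for an out-of-range index. */
uint32_t resolve_node(const struct lookup_table *t, uint32_t aa_index)
{
	return aa_index < t->len ? t->node_of[aa_index] : UINT32_MAX;
}
```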

> 
> 
>> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
>> updated to force an update of the LMB's associativity. However, ignore the
>> call to that hook when the update has been triggered by drmem_update_dt().
>> Because, in that case, the LMB tree has been used to set the DT property
>> and thus it doesn't need to be updated back. Since drmem_update_dt() is
>> called under the protection of the device_hotplug_lock and the hook is
>> called in the same context, use a simple boolean variable to detect that
>> call.
> 
> This strikes me as almost a revert of e978a3ccaa71 ("powerpc/pseries:
> remove obsolete memory hotplug DT notifier code").

This is not really identical to reverting e978a3ccaa71: here only the LMB's
aa_index is updated, and everything else is kept in place. I don't try to apply
the memory layout changes; I only update the aa_index field of the LMBs in use.
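
A minimal sketch of that narrower update, under stated assumptions: SKETCH_LMB_ASSIGNED is a hypothetical "LMB in use" flag, and the aa_lookup_fn callback is a hypothetical stand-in for reading the new association from the updated DT node. Only aa_index is touched; base_addr, drc_index and flags are left alone, and unassigned LMBs are skipped.

```c
#include <stddef.h>
#include <stdint.h>

struct drmem_lmb {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

/* Hypothetical flag marking an LMB as in use. */
#define SKETCH_LMB_ASSIGNED 0x1u

/* Hypothetical callback: the aa_index the updated DT node now gives
 * to the LMB identified by drc_index. */
typedef uint32_t (*aa_lookup_fn)(uint32_t drc_index);

/* Refresh only the aa_index of in-use LMBs; every other field is kept
 * in place. Returns the number of entries whose aa_index changed. */
size_t refresh_aa_indexes(struct drmem_lmb *lmbs, size_t n,
			  aa_lookup_fn lookup)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(lmbs[i].flags & SKETCH_LMB_ASSIGNED))
			continue;
		uint32_t aa = lookup(lmbs[i].drc_index);
		if (aa != lmbs[i].aa_index) {
			lmbs[i].aa_index = aa;
			changed++;
		}
	}
	return changed;
}
```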

The only point in common with the code removed by the commit you mentioned would
be the use of a global variable, in_drmem_update, instead of the previous
rtas_hp_event, to prevent the LMB tree from being updated again during a memory
hotplug event.
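
The guard pattern itself is small. This sketch keeps the in_drmem_update name from the patch, but drconf_node_updated(), update_dt_from_lmbs() and the lmb_refreshes counter are hypothetical stand-ins. It assumes, as the patch description states, that callers are already serialized (by device_hotplug_lock in the kernel), which is why a plain bool with no locking is used here.

```c
#include <stdbool.h>

/* Set while the DT property is being rewritten from the LMB array. */
static bool in_drmem_update;
/* Hypothetical counter standing in for an actual aa_index refresh. */
static unsigned int lmb_refreshes;

/* Hook run when the ibm,dynamic-reconfiguration-memory node changes. */
void drconf_node_updated(void)
{
	/* The DT was just generated from the LMB array: nothing to sync back. */
	if (in_drmem_update)
		return;
	lmb_refreshes++;
}

/* Stand-in for drmem_update_dt(): rewriting the property fires the hook
 * synchronously, in the same serialized context. */
void update_dt_from_lmbs(void)
{
	in_drmem_update = true;
	drconf_node_updated();	/* simulates the notifier firing on the write */
	in_drmem_update = false;
}
```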

> I'd rather avoid smuggling through global state information that ought
> to be passed in function parameters, if it should be passed around at
> all. Despite having (IMO) relatively simple responsibilities, this code
> is difficult to change and review; adding this property makes it
> worse. If the structure of the code is pushing us toward this kind of
> compromise, then the code probably needs more fundamental changes.
> 
> I'm probably forgetting something -- can anyone remind me why we need an
> array of these:
> 
> struct drmem_lmb {
> 	u64     base_addr;
> 	u32     drc_index;
> 	u32     aa_index;
> 	u32     flags;
> };
> 
> which is just a less efficient representation of what's already in the
> device tree? If we got rid of it, would this problem disappear?

I don't think this is the right time for that; first, we should make the DLPAR
and LPM operations more robust.