On 9/17/21 5:55 PM, Huang, Ying wrote:
>> @@ -3147,6 +3177,16 @@ static void __set_migration_target_nodes
>>  	int node;
>>  
>>  	/*
>> +	 * The "migration path" array is heavily optimized
>> +	 * for reads.  This is the write side which incurs a
>> +	 * very heavy synchronize_rcu().  Avoid this overhead
>> +	 * when nothing of consequence has changed since the
>> +	 * last write.
>> +	 */
>> +	if (!node_demotion_topo_changed())
>> +		return;
>> +
>> +	/*
>>  	 * Avoid any oddities like cycles that could occur
>>  	 * from changes in the topology.  This will leave
>>  	 * a momentary gap when migration is disabled.
> Now synchronize_rcu() is called in disable_all_migrate_targets(), which
> is called for MEM_GOING_OFFLINE.  Can we remove the synchronize_rcu()
> from disable_all_migrate_targets() and call it in
> __set_migration_target_nodes() before we update the node_demotion[]?

I see what you are saying.  This patch only targeted
__set_migration_target_nodes(), which is called for
MEM_ONLINE/OFFLINE.  But it missed MEM_GOING_OFFLINE's call to
disable_all_migrate_targets().

I think I found something better than either what I had in this patch
or the tweak you suggested: the 'memory_notify->status_change_nid' field
is passed to all memory hotplug notifiers and tells us whether a node is
going online/offline.  Instead of trying to track topology changes
ourselves, I think we can simply rely on that field.

This removes the need for the demotion code to track *any* state.  I've
attached a totally untested patch to do this.
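
To make the idea concrete, here is a rough userspace model of the check.
It is only a sketch and not the attached patch: every name ending in
"_model" is a made-up stand-in, the enum values are arbitrary, and I'm
assuming a negative 'status_change_nid' means "no node is changing
state".  How MEM_CANCEL_OFFLINE gets handled here is also just a guess.

```c
#include <stdbool.h>

/* Userspace stand-ins for the hotplug actions; the values are arbitrary. */
enum mem_action_model {
	MEM_ONLINE,
	MEM_GOING_OFFLINE,
	MEM_OFFLINE,
	MEM_CANCEL_OFFLINE,
};

/* Simplified stand-in for struct memory_notify: only the field we need. */
struct memory_notify_model {
	int status_change_nid;	/* node changing state, negative if none (assumed) */
};

/* Counters standing in for the expensive write-side work. */
int nr_disable_calls;
int nr_rebuild_calls;

void disable_all_migrate_targets_model(void)
{
	nr_disable_calls++;	/* the real code would pay for synchronize_rcu() */
}

void set_migration_target_nodes_model(void)
{
	nr_rebuild_calls++;	/* the real code would rebuild node_demotion[] */
}

/* Returns true if the event caused the demotion targets to be touched. */
bool migrate_on_reclaim_callback_model(enum mem_action_model action,
				       const struct memory_notify_model *arg)
{
	/* No node is changing state: skip the write side entirely. */
	if (arg->status_change_nid < 0)
		return false;

	switch (action) {
	case MEM_GOING_OFFLINE:
		disable_all_migrate_targets_model();
		break;
	case MEM_ONLINE:
	case MEM_OFFLINE:
	case MEM_CANCEL_OFFLINE:	/* assumed: rebuild after an aborted offline */
		set_migration_target_nodes_model();
		break;
	}
	return true;
}
```

The point is that the early return needs no state kept by the demotion
code: the notifier argument alone says whether the heavy path is needed.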