On 22 Jun 2021, at 8:06, Dave Hansen wrote:

> Yan, your reply came through in HTML. It doesn't bother me too much,
> but you'll find your replies dropped by LKML and other mailing lists
> if you do this.

Apologies. I used the wrong text mode. Thanks for letting me know.

>
> On 6/21/21 7:50 AM, Zi Yan wrote:
>> Is there a plan of allowing the user to change where the migration path
>> starts? Or maybe, one step further, providing an interface to allow
>> the user to specify the demotion path. Something like
>> /sys/devices/system/node/node*/node_demotion.
>
> We actually had this in an earlier series. I pulled it out because we
> don't really *need* this ABI at the moment. But, I totally agree that
> it would be handy for many things, including any non-obvious topology
> where the built-in ordering isn't optimal.
>
>> I don't think that's necessary at least for now. Do you know any
>> real world use case for this?
>>
>> In our P9+Volta system, GPU memory is exposed as a NUMA node. For
>> GPU workloads with data sizes greater than the GPU memory size, it
>> would be very helpful to allow pages in GPU memory to be
>> migrated/demoted to CPU memory. With your current assumption, GPU
>> memory -> CPU memory demotion seems not possible, right? This
>> should also apply to any system with device memory exposed as a
>> NUMA node and workloads running on the device that use CPU memory
>> as a lower-tier memory than the device memory.
>
> Yes, with the current ordering, CPU memory would be demoted to the
> GPU, not the other way around. The right way to fix this (on ACPI
> platforms at least) is probably to use the HMAT table and build the
> demotion order based on any memory targets rather than just CPUs.
>
> That would be a great future enhancement to all of this. But, because
> not all systems have HMATs, we also need something more basic, which
> is what is in this series.

This information is very helpful. I agree that reading the HMAT table is
the right way. I will look into it. Thanks!
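For illustration, here is a toy sketch (not the actual kernel code) of the kind of CPU-topology-only heuristic discussed above: nodes with CPUs demote to memory-only (CPU-less) nodes, which is exactly why a memory-only GPU node would end up as a demotion *target* rather than a source. The node IDs and the round-robin target choice below are assumptions for the example only:

```python
# Toy model of a default demotion order built from CPU topology alone.
# Assumption: every CPU-less memory node is treated as a slower tier,
# which mis-handles a fast GPU node exposed as a CPU-less NUMA node.

def build_demotion_order(nodes_with_cpu, memory_only_nodes):
    """Map each CPU node to a memory-only demotion target (round-robin).

    Memory-only nodes get no entry: they terminate the demotion path.
    """
    demotion = {}
    if not memory_only_nodes:
        return demotion  # no slower tier available, nothing to demote to
    targets = sorted(memory_only_nodes)
    for i, node in enumerate(sorted(nodes_with_cpu)):
        demotion[node] = targets[i % len(targets)]
    return demotion

# Example: nodes 0-1 have CPUs + DRAM; nodes 2-3 are CPU-less memory.
order = build_demotion_order({0, 1}, {2, 3})
print(order)  # {0: 2, 1: 3}
```

An HMAT-driven ordering, by contrast, could rank targets by measured bandwidth/latency instead of assuming every CPU-less node is slower, which is the future enhancement Dave describes.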
—
Best Regards,
Yan, Zi