On Thu, Apr 7, 2022 at 2:26 PM Tim Chen <tim.c.chen@linux.intel.com> wrote:
On Wed, 2022-04-06 at 10:49 +0800, Huang, Ying wrote:
>
> > > If so,
> > >
> > > # echo A > memory.reclaim
> > >
> > > means
> > >
> > > a) "A" bytes of memory are freed from the memcg, regardless of whether
> > >    demotion is used.
> > >
> > > or
> > >
> > > b) "A" bytes memory are reclaimed from the memcg, some of them may be
> > >    freed, some of them may be just demoted from DRAM to PMEM.  The total
> > >    number is "A".
> > >
> > > For me, a) looks more reasonable.
> > >
> >
> > We can use a DEMOTE flag to control the demotion behavior for
> > memory.reclaim.  If the flag is not set (the default), then
> > no_demotion of scan_control can be set to 1, similar to
> > reclaim_pages().
>
> If we have to use a flag to control the behavior, I think it's better to
> have a separate interface (e.g. memory.demote).  But do we really need b)?
>
> > The question is then whether we want to rename memory.reclaim to
> > something more general.  I think this name is fine if reclaim-based
> > demotion is an accepted concept.
>

memory.demote will work for 2 levels of memory tiers.  But when we have 3 levels
of memory (e.g. high bandwidth memory, DRAM and PMEM),
it gets ambiguous again whether we should demote from high bandwidth memory
or DRAM.

Will something like this be more general?

echo X > memory_[dram,pmem,hbm].reclaim

So echo X > memory_dram.reclaim
means that we want to free up X bytes from DRAM for the mem cgroup.

echo demote > memory_dram.reclaim_policy

This means that we prefer demotion for reclaim instead
of swapping to disk.


memory.demote can work with any number of memory tiers if a nodemask argument (or a tier argument, if there is a more explicitly defined, userspace-visible tiering representation) is provided.  The semantics can be to demote X bytes from these nodes to their next tier.
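
To make the "demote from these nodes to their next tier" semantics concrete, here is a rough userspace model in Python.  It is only a sketch: the NEXT_TIER table, the node numbering and the helper names are made up for illustration, and nothing here reflects an existing kernel interface.

```python
# Hypothetical topology: node id -> node of the next (slower) tier.
# None means the node is already in the lowest tier.
NEXT_TIER = {0: 2, 1: 2, 2: 3, 3: None}


def parse_nodelist(text):
    """Parse a kernel-style node list such as '0-1,3' into sorted node ids."""
    nodes = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        # A single id like '3' has no upper bound, so reuse the lower one.
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return sorted(nodes)


def demotion_targets(nodelist):
    """Map each requested source node to its demotion target.

    Nodes in the lowest tier have no target and are skipped.
    """
    targets = {}
    for node in parse_nodelist(nodelist):
        target = NEXT_TIER.get(node)
        if target is not None:
            targets[node] = target
    return targets
```

With this table, a request naming nodes "0-1" demotes into node 2, while a request naming only node 3 has nowhere to demote to.  The same request format works no matter how many tiers exist, because the caller names the source nodes and never a hardware type.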

memory_dram/memory_pmem bakes in an assumption about the hardware behind each memory tier, which is undesirable.  For example, it is entirely possible that a slow memory tier is implemented by a lower-cost/lower-performance DDR device connected via CXL.mem, not by PMEM.  It is better for this interface to speak in terms of either the NUMA node abstraction or a new tier abstraction.

It is also desirable to make this interface stateless, i.e. not to require setting memory_dram.reclaim_policy first.  Any policy can be specified as an argument to the request itself and should affect only that particular request.
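
As a sketch of what a stateless request could look like, the Python below parses a single line that carries the size and all options together.  The "nodes=" and "policy=" keys, the "default" policy name and the ReclaimRequest type are hypothetical, chosen only to illustrate the idea; the actual syntax would need to be agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReclaimRequest:
    nbytes: int
    nodes: str = ""           # empty string: no node restriction
    policy: str = "default"   # hypothetical policy name


def parse_request(line):
    """Parse '<bytes> [nodes=<list>] [policy=<name>]' into a request.

    Every option travels with the request, so nothing persists
    between two writes to the interface file.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty request")
    opts = {}
    for token in fields[1:]:
        key, sep, value = token.partition("=")
        if not sep or key not in ("nodes", "policy"):
            raise ValueError(f"bad option: {token!r}")
        opts[key] = value
    return ReclaimRequest(int(fields[0]), **opts)
```

Parsing "1048576 nodes=0-1 policy=demote" and then "4096" yields a second request with the default policy, since the first request leaves no state behind.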

Wei