From: Michal Hocko <mhocko@suse.com>
To: Abel Wu <wuyun.abel@bytedance.com>
Cc: Gang Li <ligang.bdlg@bytedance.com>,
	akpm@linux-foundation.org, surenb@google.com, hca@linux.ibm.com,
	gor@linux.ibm.com, agordeev@linux.ibm.com,
	borntraeger@linux.ibm.com, svens@linux.ibm.com,
	viro@zeniv.linux.org.uk, ebiederm@xmission.com,
	keescook@chromium.org, rostedt@goodmis.org, mingo@redhat.com,
	peterz@infradead.org, acme@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	namhyung@kernel.org, david@redhat.com, imbrenda@linux.ibm.com,
	adobriyan@gmail.com, yang.yang29@zte.com.cn, brauner@kernel.org,
	stephen.s.brennan@oracle.com, zhengqi.arch@bytedance.com,
	haolee.swjtu@gmail.com, xu.xin16@zte.com.cn,
	Liam.Howlett@oracle.com, ohoono.kwon@samsung.com,
	peterx@redhat.com, arnd@arndb.de, shy828301@gmail.com,
	alex.sierra@amd.com, xianting.tian@linux.alibaba.com,
	willy@infradead.org, ccross@google.com, vbabka@suse.cz,
	sujiaxun@uniontech.com, sfr@canb.auug.org.au,
	vasily.averin@linux.dev, mgorman@suse.de, vvghjk1234@gmail.com,
	tglx@linutronix.de, luto@kernel.org, bigeasy@linutronix.de,
	fenghua.yu@intel.com, linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-perf-users@vger.kernel.org,
	hezhongkun.hzk@bytedance.com
Subject: Re: [PATCH v2 0/5] mm, oom: Introduce per numa node oom for CONSTRAINT_{MEMORY_POLICY,CPUSET}
Date: Mon, 18 Jul 2022 14:11:44 +0200
Message-ID: <YtVOAGga+B3CmFKC@dhcp22.suse.cz>
In-Reply-To: <6f6a2257-3b60-e312-3ee3-fb08b972dbf2@bytedance.com>

On Tue 12-07-22 23:00:55, Abel Wu wrote:
> 
> On 7/12/22 9:35 PM, Michal Hocko Wrote:
> > On Tue 12-07-22 19:12:18, Abel Wu wrote:
> > [...]
> > > I was just going through the mailing list and happened to see this.
> > > There is another use case for us regarding per-NUMA memory usage.
> > > 
> > > Say we have several important latency-critical services sitting inside
> > > different NUMA nodes without intersection. The need for memory of these
> > > LC services varies, so the free memory of each node is also different.
> > > Then we launch several background containers without cpuset constraints
> > > to consume the leftover resources. Now the problem is that there doesn't
> > > seem to be a proper memory policy available to balance the usage between
> > > the nodes, which could lead to memory-heavy LC services suffering from
> > > high memory pressure and failing to meet their SLOs.
> > 
> > I do agree that cpusets would be rather clumsy, if usable at all, in a
> > scenario where you are trying to mix NUMA bound workloads with those
> > that do not have any NUMA preferences. Could you be more specific about
> > requirements here though?
> 
> Yes, these LC services are highly sensitive to memory access latency
> and bandwidth, so they are provisioned at NUMA node granularity to meet
> their performance requirements. On the other hand, they usually do not
> make full use of cpu/mem resources, which increases the TCO of our
> IDCs, so we have to co-locate them with background tasks.
> 
> Some of these LC services are memory-bound but leave much of the cpu
> capacity unused. In this case we hope the co-located background tasks
> can consume some of the leftover capacity without introducing obvious
> mm overhead to the LC services.

These are some tough requirements and, I am afraid, far from any typical
usage. So I believe that you need careful tuning much more than a
policy whose semantics I really have a hard time imagining, TBH.
 
> > Let's say you run those latency critical services with "simple" memory
> > policies and mix them with the other workload without any policies in
> > place so they compete over memory. It is not really clear to me how you
> > can achieve any reasonable QoS in such an environment. Your latency
> > critical services will be more constrained than the non-critical ones
> > yet they are more demanding AFAIU.
> 
> Yes, the QoS over memory is the biggest block in the way (the other
> resources are relatively easier). For now, we hacked a new mpol to
> achieve weighted-interleave behavior to balance the memory usage across
> NUMA nodes, and only set memcg protections to the LC services. If the
> memory pressure is still high, the background tasks will be killed.
> Ideas? Thanks!
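
For readers following along: the mail above does not say how the
weighted-interleave mpol actually places pages, so the following is only
a minimal sketch of one plausible reading, in which each node is visited
`weight` times per round. The names weighted_interleave() and
place_pages() and the integer weights are assumptions made for
illustration; they are not taken from the patch.

```python
from itertools import islice


def weighted_interleave(weights):
    """Yield NUMA node ids forever, visiting each node `weight` times per round.

    `weights` maps a node id to a positive integer weight (hypothetical input).
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty map of positive integers")
    while True:
        # Walk the nodes in id order so the placement is deterministic.
        for node, weight in sorted(weights.items()):
            for _ in range(weight):
                yield node


def place_pages(weights, nr_pages):
    """Return how many of `nr_pages` pages land on each node."""
    counts = {node: 0 for node in weights}
    for node in islice(weighted_interleave(weights), nr_pages):
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Node 1 has three times the weight of node 0 (made-up numbers).
    print(place_pages({0: 1, 1: 3}, 8))
```

With weights derived from the free memory left on each node, such a
scheme would steer background allocations away from the nodes where the
LC services are memory-heavy. How the weights are chosen and updated is
exactly the part the description leaves open.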

It is not really clear from your description what the new memory policy
does and what its semantics are. Memory protection (via memcg) of your
sensitive workload makes sense, but it would require proper settings for
the background jobs as well. As soon as you hit global direct reclaim,
the memory protection won't save your sensitive workload.
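
To illustrate what "proper settings for the background jobs" could look
like, here is a minimal sketch assuming cgroup v2, where memory.low is a
best-effort protection and memory.high is a throttling limit. The cgroup
paths, the headroom parameter and the sizing rule are hypothetical and
only meant to show the idea of capping the background jobs so that they
get reclaimed before the machine falls into global reclaim.

```python
GIB = 1 << 30


def plan_memcg_settings(total_bytes, lc_protect_bytes, headroom_bytes,
                        lc_cgroup="/sys/fs/cgroup/lc",
                        bg_cgroup="/sys/fs/cgroup/background"):
    """Return a {file: value} plan; the cgroup paths are made-up examples.

    The LC group gets memory.low protection, the background group gets a
    memory.high limit sized to what is left after protection and headroom.
    """
    bg_high = total_bytes - lc_protect_bytes - headroom_bytes
    if bg_high <= 0:
        raise ValueError("no memory left for background jobs")
    return {
        f"{lc_cgroup}/memory.low": str(lc_protect_bytes),
        f"{bg_cgroup}/memory.high": str(bg_high),
    }


def apply_settings(plan):
    """Write the plan to the cgroup files (needs root and existing cgroups)."""
    for path, value in plan.items():
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Made-up sizes: 64G machine, 40G protected for LC, 4G kept free.
    for path, value in plan_memcg_settings(64 * GIB, 40 * GIB, 4 * GIB).items():
        print(path, value)
```

Note that these memcg knobs are not per NUMA node, so this alone does
not balance usage between the nodes.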

-- 
Michal Hocko
SUSE Labs
