All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hillf Danton <hdanton@sina.com>
To: Tim Murray <timmurray@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
Date: Thu,  3 Mar 2022 10:59:33 +0800	[thread overview]
Message-ID: <20220303025933.2265-1-hdanton@sina.com> (raw)
In-Reply-To: <CAEe=Sxmow-jx60cDjFMY7qi7+KVc+BT++BTdwC5+G9E=1soMmQ@mail.gmail.com>

On Tue, 22 Feb 2022 11:47:01 -0800 Tim Murray  wrote:
> On Mon, Feb 21, 2022 at 12:55 AM Michal Hocko <mhocko@suse.com> wrote:
> > It would be cool to have some numbers here.
> 
> Are there any numbers beyond what Suren mentioned that would be
> useful? As one example, in a trace of a camera workload that I opened
> at random to check for drain_local_pages stalls, I saw the kworker
> that ran drain_local_pages stay at runnable for 68ms before getting
> any CPU time. I could try to query our trace corpus to find more
> examples, but they're not hard to find in individual traces already.
> 
> > If the draining is too slow and dependent on the current CPU/WQ
> > contention then we should address that. The original intention was that
> > having a dedicated WQ with WQ_MEM_RECLAIM would help to isolate the
> > operation from the rest of WQ activity. Maybe we need to fine tune
> > mm_percpu_wq. If that doesn't help then we should revise the WQ model
> > and use something else. Memory reclaim shouldn't really get stuck behind
> > other unrelated work.
> 
> In my experience, workqueues are easy to misuse and should be
> approached with a lot of care. For many workloads, they work fine 99%+
> of the time, but once you run into problems with scheduling delays for
> that workqueue, the only option is to stop using workqueues. If you
> have work that is system-initiated with minimal latency requirements
> (eg, some driver heartbeat every so often, devfreq governors, things
> like that), workqueues are great. If you have userspace-initiated work
> that should respect priority (eg, GPU command buffer submission in the
> critical path of UI) or latency-critical system-initiated work (eg,
> display synchronization around panel refresh), workqueues are the
> wrong choice because there is no RT capability. WQ_HIGHPRI has a minor
> impact, but it won't solve the fundamental problem if the system is
> under heavy enough load or if RT threads are involved. As Petr
> mentioned, the best solution for those cases seems to be "convert the
> workqueue to an RT kthread_worker." I've done that many times on many
> different Android devices over the years for latency-critical work,
> especially around GPU, display, and camera.

Feel free to list the URLs to the latency-critical works as I want to
learn the reasons why workqueue failed to fit in the scenarios.
> 
> In the drain_local_pages case, I think it is triggered by userspace
> work and should respect priority; I don't think a prio 50 RT task
> should be blocked waiting on a prio 120 (or prio 100 if WQ_HIGHPRI)
> kworker to be scheduled so it can run drain_local_pages. If that's a
> reasonable claim, then I think moving drain_local_pages away from
> workqueues is the best choice.

A prio-50 direct reclaimer implies design failure in 99.1% products.


      parent reply	other threads:[~2022-03-03  2:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-19 17:49 [PATCH 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure Suren Baghdasaryan
2022-02-20  0:40 ` Minchan Kim
2022-02-20 16:52   ` Suren Baghdasaryan
2022-02-23 18:54     ` Johannes Weiner
2022-02-23 19:06       ` Suren Baghdasaryan
2022-02-23 19:42         ` Suren Baghdasaryan
2022-02-21  8:55 ` Michal Hocko
2022-02-21 10:41   ` Petr Mladek
2022-02-21 19:13     ` Suren Baghdasaryan
2022-02-21 19:09   ` Suren Baghdasaryan
2022-02-22 19:47   ` Tim Murray
2022-02-23  0:15     ` Suren Baghdasaryan
2022-03-03  2:59     ` Hillf Danton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220303025933.2265-1-hdanton@sina.com \
    --to=hdanton@sina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.