From: Michal Hocko <mhocko@suse.com>
To: Hao Lee <haolee.swjtu@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>, Linux MM <linux-mm@kvack.org>, Johannes Weiner <hannes@cmpxchg.org>, vdavydov.dev@gmail.com, Shakeel Butt <shakeelb@google.com>, cgroups@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: reduce spinlock contention in release_pages()
Date: Fri, 26 Nov 2021 11:46:29 +0100
Message-ID: <YaC7BcTSijFj+bxR@dhcp22.suse.cz>
In-Reply-To: <CA+PpKP=hsuBmvv09OcD2Nct8B8Cqa03UfKFHAHzKxwE0SXGP4g@mail.gmail.com>

On Fri 26-11-21 14:50:44, Hao Lee wrote:
> On Thu, Nov 25, 2021 at 10:18 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > Could you share more about the requirements for those? Why is unmapping
> > in any of their hot paths which really require low latencies? Because as
> > long as unmapping requires a shared resource - like the lru lock - you
> > have a bottleneck.
>
> We deploy best-effort (BE) jobs (e.g. big data, machine learning) and
> latency-critical (LC) jobs (e.g. map navigation, payment services) on the
> same servers to improve resource utilization. The running time of BE jobs
> is very short, but their memory consumption is large, and these jobs run
> periodically. The LC jobs are long-running services and are sensitive to
> delays because jitter may cause customer churn.

Have you tried to isolate those workloads with memory cgroups? That could
help for the lru lock at least. You are likely going to hit other locks on
the way though, e.g. the zone lock in the page allocator, but that might be
less problematic in the end. If you isolate your long running services to a
different NUMA node then you can get even less interaction.

> If a batch of BE jobs finishes simultaneously, a lot of memory is freed
> and spinlock contention occurs. BE jobs don't care about this contention,
> but it causes them to spend more time in kernel mode, and thus LC jobs
> running on the same CPU cores are delayed and jitter occurs. (Kernel
> preemption is disabled on our servers, and we try not to separate LC/BE
> using cpusets in order to achieve "complete mixture deployment".) Then
> the LC service people will complain about poor service stability. This
> scenario has occurred several times, so we want to find a way to avoid it.

It will be hard and a constant fight to get reasonably low latencies on a
non-preemptible kernel. It would likely be better to partition CPUs between
latency sensitive and BE jobs. I can see how that might not be really
practical, but especially with non-preemptible kernels you have a large
space for priority inversions that is hard to foresee or contain.
-- 
Michal Hocko
SUSE Labs
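The memcg/cpuset isolation suggested above could be sketched roughly as
follows. This is only an illustration under assumptions not stated in the
thread: the cgroup names (lc, be), CPU ranges, NUMA node numbers, and the
64G memory cap are all made-up examples, and it assumes a cgroup v2 mount
and a kernel recent enough to have per-memcg LRU lists (and, on recent
kernels, a per-memcg lru lock), so BE page freeing mostly contends on its
own locks rather than the LC group's.

```shell
#!/bin/sh
# Sketch: split LC and BE workloads into separate memory+cpuset cgroups
# (cgroup v2). All names and numbers are hypothetical examples.
set -e

CG=/sys/fs/cgroup

# Enable the memory and cpuset controllers for child groups first.
echo "+memory +cpuset" > "$CG/cgroup.subtree_control"

mkdir -p "$CG/lc" "$CG/be"

# Pin latency-critical services to CPUs 0-15 on NUMA node 0...
echo 0-15 > "$CG/lc/cpuset.cpus"
echo 0    > "$CG/lc/cpuset.mems"

# ...and best-effort batch jobs to CPUs 16-31 on NUMA node 1, so their
# allocations and frees stay on a different node's zone/lru structures.
echo 16-31 > "$CG/be/cpuset.cpus"
echo 1     > "$CG/be/cpuset.mems"

# Cap BE memory so a burst of exiting BE jobs frees pages accounted to
# its own memcg, limiting how much it can touch at once.
echo 64G > "$CG/be/memory.max"

# Processes are then placed by writing their PIDs to cgroup.procs, e.g.:
#   echo "$LC_PID" > "$CG/lc/cgroup.procs"
```

Note that this trades the "complete mixture deployment" goal described in
the thread for isolation; as the reply says, sharing the same CPUs on a
non-preemptible kernel leaves a large space for priority inversion that
cgroups alone cannot close.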
Thread overview:
2021-11-24 15:19 [PATCH] mm: reduce spinlock contention in release_pages() Hao Lee
2021-11-24 15:57 ` Matthew Wilcox
2021-11-25  3:13 ` Hao Lee
2021-11-24 16:31 ` Michal Hocko
2021-11-25  3:24 ` Hao Lee
2021-11-25  3:30 ` Matthew Wilcox
2021-11-25  8:02 ` Hao Lee
2021-11-25 10:01 ` Michal Hocko
2021-11-25 12:31 ` Hao Lee
2021-11-25 14:18 ` Michal Hocko
2021-11-26  6:50 ` Hao Lee
2021-11-26 10:46 ` Michal Hocko [this message]
2021-11-26 16:26 ` Hao Lee
2021-11-29  8:39 ` Michal Hocko
2021-11-29 13:23 ` Matthew Wilcox
2021-11-29 13:39 ` Michal Hocko
2021-11-25 18:04 ` Matthew Wilcox
2021-11-26  6:54 ` Hao Lee