Date: Mon, 7 Dec 2020 08:08:56 +0000 (UTC)
From: Christoph Lameter
To: Thomas Gleixner
cc: Marcelo Tosatti, Matthew Wilcox, linux-mm@kvack.org, Andrew Morton, Alex Belits, Phil Auld, Frederic Weisbecker, Peter Zijlstra
Subject: Re: [PATCH] mm: introduce sysctl file to flush per-cpu vmstat statistics
In-Reply-To: <87v9djd1db.fsf@nanos.tec.linutronix.de>

On Thu, 3 Dec 2020, Thomas Gleixner wrote:

> >> The current CPU isolation is a best effort approach and I agree that for
> >> more strict isolation modes we need to be able to enforce that and hunt
> >> down offenders and think about them one by one.
> >
> > There are two approaches actually to make the OS quiet. One is the best
> > effort approach which is more like the current NOHZ one with additional
> > actions to flush things. The other is the strict approach where one wants a
> > guarantee that the OS does not do anything at all.
>
> And here the consensus stops again :)
>
> The point is that between the relaxed best effort / heuristics based
> scenario and the 'user space task asks for absolute silence' scenario is
> a huge difference:

The two approaches are:

1. Enforce silence and abort if the application tries anything that
   jeopardizes that silence (e.g. a syscall that cannot complete
   directly, a major page fault, etc.). This mode needs to work like an
   on/off switch so that the application can exit the mode and make
   regular system calls. Only specially designed software will be able
   to use this mode, since it is so restrictive, and care needs to be
   taken when enabling and disabling it.

2. Silence now (dump all caches, correlate counters, let all pending
   work finish). This is a one-shot approach. Anything later that
   causes counter increments, cache population, etc. may still occur,
   but it will then cause the additional latency of re-enabling the
   caches, statistics threads and so on.

   The "silence now" function will be used when the app waits for an
   event that may occur shortly and the reaction to that event needs
   the lowest possible latency. For example, the app may be in a
   complex polling loop that should not be interrupted. Once an event
   is detected, syscalls etc. could potentially occur (depending on the
   event). When the app goes back to the polling loop it will do
   another "silence now" call.

> Is this really a black and white decision?

These are two different modes of usage.

> And as we know that there are quite some shades of grey, there is lots
> of choice and we need to come up with solutions for delegating the
> policy decision to the user/admin and not just provide a off/on knob.

One of these choices (the second, one-shot mode) is not an on/off knob.

> Again: I fundamentally disagree with the proposed task isolation patches
> approach as they leave no choice at all.

There are no degrees of gray here. I don't understand why you have these
concerns.

> pattern of the application, e.g.
>
>   1 read_data_set()   <- involving syscalls/OS obviously

??? You cannot use syscalls for high-speed or low-latency I/O!!!

>   2 compute_set()     <- let me alone
>   3 save_data_set()   <- involving syscalls/OS obviously

Again, saving data may not be possible through the kernel, since syscalls
may have too much overhead and latency.

> repeat the above...

There is a fundamental misunderstanding here. This is not primarily about
compute but about I/O, in particular I/O that does not involve the kernel:
RDMA, or things like DPDK, SPDK or other low-level hardware interfaces.
Typically a user space poll loop checks numerous memory locations related
to this I/O, or shared memory areas where other CPUs interact with the
thread that wants to be free of OS noise.

> Summary: The problem to be solved cannot be restricted to
>
>   self_defined_important_task(OWN_WORLD);
>
> Policy is not a binary on/off problem. It's manifold across all levels
> of the stack and only a kernel problem when it comes down to the last
> line of defence.

This is a clearly defined set of functions and I am not sure how policy
fits into that.
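The "silence now" mode (approach 2 above), combined with the kernel-bypass poll loop over shared memory, can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the patch or from any task isolation series: silence_now(), poll_once(), run_loop() and the slots[] array are hypothetical names. The quiesce request is modeled as writing "1" to a sysctl-style file whose path is passed in, because the file proposed in this thread is not named here. The slots stand in for RDMA/DPDK/SPDK completion areas or memory shared with other CPUs, which in reality would be written by a device or another core.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define NSLOTS 4

/* Hypothetical shared-memory slots; in practice a NIC, an RDMA/SPDK
 * completion area or another CPU would write nonzero values here. */
static atomic_uint slots[NSLOTS];

/* Hypothetical "silence now" request, modeled as writing "1" to a
 * sysctl-style file. Returns 0 on success, -1 on failure. */
static int silence_now(const char *path)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs("1\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* One pass over all slots using plain memory loads, no syscalls.
 * Returns the index of the first slot with pending data, or -1. */
static int poll_once(void)
{
	for (int i = 0; i < NSLOTS; i++)
		if (atomic_load_explicit(&slots[i], memory_order_acquire) != 0)
			return i;
	return -1;
}

/* "Silence now" usage pattern: quiesce once, busy-poll until an event
 * arrives, consume it, then quiesce again before re-entering the poll
 * loop. Handles max_events events, storing their values in events[].
 * Returns the number handled, or -1 if a quiesce request fails. */
static int run_loop(const char *quiesce_path, unsigned *events, int max_events)
{
	assert(events != NULL && max_events >= 0);
	for (int n = 0; n < max_events; n++) {
		if (silence_now(quiesce_path) != 0)
			return -1;
		int slot;
		while ((slot = poll_once()) < 0)
			; /* spin until an event arrives; the kernel is not entered */
		events[n] = atomic_exchange_explicit(&slots[slot], 0u,
						     memory_order_acq_rel);
		/* Application-specific handling would go here; depending on
		 * the event it may make syscalls, which is why the next
		 * iteration issues another quiesce request. */
	}
	return max_events;
}
```

The only kernel interaction in this sketch is the quiesce request made before re-entering the poll loop; the scan itself uses acquire loads on ordinary memory. On Linux, /proc/sys/vm/stat_refresh is an existing file that folds the per-cpu vmstat counters into the global totals when accessed by root, but whether any such single write provides the full "silence now" semantics described above is an assumption this sketch does not demonstrate.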