From: Chris Murphy
Date: Tue, 7 Jan 2020 01:23:38 -0700
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Congestion
To: Dave Chinner
Cc: Michal Hocko, Matthew Wilcox, lsf-pc@lists.linux-foundation.org, Linux FS Devel, linux-mm@kvack.org, Mel Gorman

On Mon, Jan 6, 2020 at 4:21 PM Dave Chinner wrote:
>
> On Mon, Jan 06, 2020 at 12:55:14PM +0100, Michal Hocko wrote:
> > On Tue 31-12-19 04:59:08, Matthew Wilcox wrote:
> > >
> > > I don't want to present this topic; I merely noticed the problem.
> > > I nominate Jens Axboe and Michael Hocko as session leaders. See the
> > > thread here:
> >
> > Thanks for bringing this up Matthew! The change in the behavior came as
> > a surprise to me. I can lead the session for the MM side.
> >
> > > https://lore.kernel.org/linux-mm/20190923111900.GH15392@bombadil.infradead.org/
> > >
> > > Summary: Congestion is broken and has been for years, and everybody's
> > > system is sleeping waiting for congestion that will never clear.
> > >
> > > A good outcome for this meeting would be:
> > >
> > > - MM defines what information they want from the block stack.
> >
> > The history of the congestion waiting is kinda hairy but I will try to
> > summarize expectations we used to have and we can discuss how much of
> > that has been real and what followed up as a cargo cult. Maybe we just
> > find out that we do not need functionality like that anymore. I believe
> > Mel would be a great contributor to the discussion.
>
> We most definitely do need some form of reclaim throttling based on
> IO congestion, because it is trivial to drive the system into swap
> storms and OOM killer invocation when there are large dirty slab
> caches that require IO to make reclaim progress and there's little
> in the way of page cache to reclaim.
>
> This is one of the biggest issues I've come across trying to make
> XFS inode reclaim non-blocking - the existing code blocks on inode
> writeback IO congestion to throttle the overall reclaim rate and
> so prevents swap storms and OOM killer rampages from occurring.
>
> The moment I remove the inode writeback blocking from the reclaim
> path and move the backoffs to the core reclaim congestion backoff
> algorithms, I see a sustantial increase in the typical reclaim scan
> priority. This is because the reclaim code does not have an
> integrated back-off mechanism that can balance reclaim throttling
> between slab cache and page cache reclaim. This results in
> insufficient page reclaim backoff under slab cache backoff
> conditions, leading to excessive page cache reclaim and swapping out
> all the anonymous pages in memory. Then performance goes to hell as
> userspace then starts to block on page faults swap thrashing like
> this:

This really caught my attention, however unrelated it may actually be.
The gist of my question is: what are distributions doing wrong, such
that an unprivileged process can take down a system to the point where
an ordinary user reaches for the power button? [1] More helpful would
be: what should distributions be doing better to avoid the problem in
the first place?

User space oom daemons are now popular, and there's talk about
avoiding swap thrashing and oom by strict use of cgroups v2 and PSI.
Some people say, oh yeah duh, just don't make a swap device at all,
what are you, crazy? Then there's swap on ZRAM. And, alas, zswap too.
So what's actually recommended to help with this problem?
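To make the PSI part of that question concrete: the kernel exposes memory pressure as plain text in /proc/pressure/memory (Linux 4.20 and later, when PSI is enabled), with a "some" line and a "full" line. Below is a minimal sketch of how a user space daemon might read it. The function names and the 10% threshold are my own assumptions for illustration, not what any existing oom daemon actually does.

```python
from pathlib import Path

# Real kernel interface; only present when PSI is enabled.
PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each pair looks like "avg10=0.25" or "total=12345".
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def memory_is_thrashing(psi, full_avg10_limit=10.0):
    """Hypothetical policy: flag pressure when all tasks were stalled on
    memory for at least full_avg10_limit percent of the last 10 seconds.
    The default limit is illustrative, not a recommendation."""
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


if __name__ == "__main__":
    if PSI_MEMORY.exists():
        psi = parse_psi(PSI_MEMORY.read_text())
        print(psi, memory_is_thrashing(psi))
    else:
        print("PSI not available on this system")
```

What a daemon should do once the threshold trips (kill, freeze, throttle a cgroup) is exactly the policy question I'm asking about; the sketch only covers reading the signal.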

I don't have many original thoughts, but I can't find a reference for
why my brain is telling me the kernel oom-killer is mainly concerned
with kernel survival in low memory situations, and not user space. The
closest I can find is: "It is the job of the linux 'oom killer' to
sacrifice one or more processes in order to free up memory for the
system when all else fails." [2]

However, a) failure has happened way before oom-killer is invoked,
back when the GUI became unresponsive, and b) often it kills some
small thing, seemingly freeing up just enough memory that the kernel
is happy to stay in this state for an indeterminate time. In my testing
that's 30 minutes, but I'm compelled to defend a user who allows a
mere 15 second grace period before reaching for the power button.

This isn't a common experience across a broad user population, but
those who have experienced it are really familiar with it (it never
happens to them just once). I really want to know what can be done to
make the user experience better, but it's not clear to me how to do
that.

[1]
Fedora 30/31 default installation, 8G RAM, 8G swap (on a plain SSD
partition), compiling webkitgtk. Within ~5 minutes all RAM is
consumed, and the "swap storm" begins. The GUI stutters, even the
mouse pointer starts to get choppy, and soon after the system is, for
all practical purposes, locked up. Most typically, it stays this way
for 30+ minutes. Occasionally oom-killer kicks in and clobbers
something. Sometimes it's one of the compile threads. And occasionally
it'll be something absurd like sshd, sssd, or systemd-journald, which
really makes no sense at all.

[2]
https://linux-mm.org/OOM_Killer

--
Chris Murphy