From: Michal Hocko
To: Linus Torvalds
Cc: Linux-MM, LKML, Andrew Morton, Tim Chen
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Tue, 21 Jul 2020 17:49:39 +0200
Message-ID: <20200721154939.GO4061@dhcp22.suse.cz>
References: <20200721063258.17140-1-mhocko@kernel.org>

On Tue 21-07-20 08:33:33, Linus Torvalds wrote:
> On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko wrote:
> >
> > The lockup is in page_unlock in do_read_fault and I suspect that this is
> > yet another effect of a very long waitqueue chain which has been
> > addressed by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
> > wake_up_page_bit") previously.
>
> Hmm.
>
> I do not believe that you can actually get to the point where you have
> a million waiters and it takes 20+ seconds to wake everybody up.

I was really surprised as well!

> More likely, it's actually *caused* by that commit 11a19c7b099f, and
> what might be happening is that other CPU's are just adding new
> waiters to the list *while* we're waking things up, because somebody
> else already got the page lock again.
>
> Humor me.. Does something like this work instead? It's
> whitespace-damaged because of just a cut-and-paste, but it's entirely
> untested, and I haven't really thought about any memory ordering
> issues, but I think it's ok.
>
> The logic is that anybody who called wake_up_page_bit() _must_ have
> cleared that bit before that.
> So if we ever see it set again (and
> memory ordering doesn't matter), then clearly somebody else got access
> to the page bit (whichever it was), and we should not
>
>  (a) waste time waking up people who can't get the bit anyway
>
>  (b) be in a livelock where other CPU's continually add themselves to
>      the wait queue because somebody else got the bit.
>
> and it's that (b) case that I think happens for you.
>
> NOTE! Totally UNTESTED patch follows. I think it's good, but maybe
> somebody sees some problem with this approach?

I can ask them to give it a try.

-- 
Michal Hocko
SUSE Labs