From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA1F2C433E0 for ; Tue, 5 Jan 2021 19:31:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5A69A22D6E for ; Tue, 5 Jan 2021 19:31:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5A69A22D6E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C60868D00A5; Tue, 5 Jan 2021 14:31:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C34B38D006E; Tue, 5 Jan 2021 14:31:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AFDB28D00A5; Tue, 5 Jan 2021 14:31:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0109.hostedemail.com [216.40.44.109]) by kanga.kvack.org (Postfix) with ESMTP id 9888A8D006E for ; Tue, 5 Jan 2021 14:31:58 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 48E06181AEF1A for ; Tue, 5 Jan 2021 19:31:58 +0000 (UTC) X-FDA: 77672716716.07.trip82_280ec73274db Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 2866F1803F9B0 for ; Tue, 5 Jan 2021 19:31:58 +0000 (UTC) X-HE-Tag: trip82_280ec73274db X-Filterd-Recvd-Size: 6414 Received: from mail-lf1-f41.google.com (mail-lf1-f41.google.com [209.85.167.41]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Tue, 5 Jan 2021 19:31:57 +0000 (UTC) Received: by mail-lf1-f41.google.com with SMTP id y19so998314lfa.13 for ; Tue, 05 Jan 2021 11:31:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=z1FwgA8Ww/7IgT5bjT946IYpX2R2TC69Q6b+o2dVbrM=; b=hA6nC4lKLxKrUs7MpCZCbI/RF0tI9dTZsLbOuov9YMyi2/eXblwkza4XzkyzrEciUj crhUOeNHz05e8YTS0MYujgw/1O+8VFW3/DqDYUINeFKBBLNL9kILdj3grIay+vTfxBP1 dk/siDfsguJmdFkH8uAtals9Ndiz3nqLvVNcw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=z1FwgA8Ww/7IgT5bjT946IYpX2R2TC69Q6b+o2dVbrM=; b=XdWX4ahi8WFpGFMep+SDymNvsbmmzKN7HHq/aUyp9tI39hNHee7lEIC6mK3rrf7Mr2 aIWWFTRF67gsF7bbJ0JrKdtE5CsenZWPYzToSEKQtGi9+IQQceEWmScqp86Qlt3gDOmI +ubCjN9+EN5vSQIeRbe7P+SE08eNX6vKPbxwx8ZoZ2Xj1WtrQ+bN3egWvr+A03gh1mNv 7igqaeReHvWbfjVZkaphkWEMl9S+/4H7MnOpFe03X55IIU5fGo4UuOCX020jdKErI2xJ uxHkLMrt93WoSs38x2EWIFLxMTQKvnGCr0yUU9seJd3y1JagtDdMMun9i089Ihi+0HQz PhIg== X-Gm-Message-State: AOAM530pefWW3FXwwihm21uA/jga89IWpCVDI0Rsh0Ezg2M174tE122C mBIR9+QiHj4hIPTfwPykw4a9BLo2zByC/w== X-Google-Smtp-Source: ABdhPJzHr7jVHWskcoT77HYEb4plgIS0/UXpKJGwLRGc0WGyhHqd/hlG3lHYM8Flk3QtcjQG0ocrnw== X-Received: by 2002:a2e:8512:: with SMTP id j18mr502230lji.31.1609875115480; Tue, 05 Jan 2021 11:31:55 -0800 (PST) Received: from mail-lf1-f47.google.com (mail-lf1-f47.google.com. [209.85.167.47]) by smtp.gmail.com with ESMTPSA id c16sm8361lft.264.2021.01.05.11.31.53 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 05 Jan 2021 11:31:54 -0800 (PST) Received: by mail-lf1-f47.google.com with SMTP id o17so1146916lfg.4 for ; Tue, 05 Jan 2021 11:31:53 -0800 (PST) X-Received: by 2002:a2e:b4af:: with SMTP id q15mr470264ljm.507.1609875113609; Tue, 05 Jan 2021 11:31:53 -0800 (PST) MIME-Version: 1.0 References: <000000000000886dbd05b7ffa8db@google.com> <20210104124153.0992b1f7fd1a145e193a333f@linux-foundation.org> In-Reply-To: From: Linus Torvalds Date: Tue, 5 Jan 2021 11:31:37 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: kernel BUG at mm/page-writeback.c:LINE! To: Hugh Dickins Cc: Andrew Morton , Matthew Wilcox , syzbot , Linux Kernel Mailing List , Linux-MM , syzkaller-bugs Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Jan 4, 2021 at 7:29 PM Hugh Dickins wrote: > > > But I feel it's really that end_page_writeback() itself is > > fundamentally buggy, because the "wakeup" is not atomic with the bit > > clearing _and_ it doesn't actually hold the page lock that is > > allegedly serializing this all. > > And we won't be adding a lock_page() into end_page_writeback()! Right. However, the more I think about this, the more I feel end_page_writeback() is kind of ok, and the real problem is that we should have had a "lock/unlock" pattern for the PG_writeback bit instead. Instead, what we have is basically a special "wait for bit to clear", and then external sychronization - even if lacking - for set/clear bit. And the "wait for bit to clear" and "set bit" aren't even adjacent - the waiting happens in write_cache_pages(), but then the actual setting happens later in the actual "(*writepage)()" function. What _would_ be nicer, I feel, is if write_cache_pages() simply did the equivalent of "lock_page()" except for setting the PG_writeback bit. The whole "wait for bit to clear" is fundamentally a much more ambiguous thing for that whole "what if somebody else got it" reason, but it's also nasty because it can be very unfair (the same way our "lock_page()" itself used to be very unfair). IOW, you can have that situation where one actor continually waits for the bit to clear, but then somebody else comes in and gets the bit instead. Of course, writeback is much less likely than lock_page(), so it probably doesn't happen much in practice. And honestly, while I think it would be good to make the rule be "writepage function was entered with the writeback bit already held", that kind of change would be a major pain. We have at leastr 49 different 'writepage' implementations, and while I think a few of them end up just being block_write_full_page(), it means that we have a lot of that BUG_ON(PageWriteback(page)); set_page_writeback(page); pattern in various random filesystem code that all would have to be changed. So I think I know what I'd _like_ the code to be like, but I don't think it's worth it. > > So the one-liner of changing the "if" to "while" in > > wait_on_page_writeback() should get us back to what we used to do. > > I think that is the realistic way to go. Yeah, that's what I'll do. > > Which makes me think that the best option would actually be to move > > the bit clearing _into_ wake_up_page(). But that looks like a very big > > change. See above - having thought about this overnight, I don't think this is even the best change. > But I expect you to take one look and quickly opt instead for the "while" > in wait_on_page_writeback(): which is not so bad now that you've shown a > scenario which justifies it. [ patch removed - I agree, this pain isn't worth it ] Linus