Date: Fri, 23 Jul 2021 22:53:53 +0100
From: Matthew Wilcox
To: Hugh Dickins
Cc: Huang Ying, Andrew Morton, David Hildenbrand, Yang Shi,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Miaohe Lin,
 Johannes Weiner, Michal Hocko, Joonsoo Kim, Minchan Kim
Subject: Re: [PATCH] mm,shmem: Fix a typo in shmem_swapin_page()
References: <20210723080000.93953-1-ying.huang@intel.com>
 <24187e5e-069-9f3f-cefe-39ac70783753@google.com>
In-Reply-To: <24187e5e-069-9f3f-cefe-39ac70783753@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Fri, Jul 23, 2021 at 01:23:07PM -0700, Hugh Dickins wrote:
> I was wary because, if the (never observed) race to be fixed is in
> swap_cluster_readahead(), why was shmem_swapin_page() being patched?
> Not explained in its commit message, probably a misunderstanding of
> how mm/shmem.c already manages races (and prefers not to be involved
> in swap_info_struct stuff).
>
> But why do I now say it's bad?  Because even if you correct the EINVAL
> to -EINVAL, that's an unexpected error: -EEXIST is common, -ENOMEM is
> not surprising, -ENOSPC can need consideration, but -EIO and anything
> else just end up as SIGBUS when faulting (or as error from syscall).
> So, 2efa33fc7f6e converts a race with swapoff to SIGBUS: not good,
> and I think much more likely than the race to be fixed (since
> swapoff's percpu_ref_kill() rightly comes before synchronize_rcu()).

Yes, I think a lot more thought was needed here.  And I would have
preferred to start with a reproducer instead of "hey, this could
happen".  Maybe something like booting a 1GB VM, adding two 2GB swap
partitions, swapon(partition A), running a 2GB memhog, and then
looping swapon(part B); swapoff(part A); swapon(part A);
swapoff(part B); to make this happen (a rough sketch of such a loop
is appended at the end of this mail).

But if it does happen, why would returning EINVAL be the right thing
to do?  We've swapped it out, so it must be on swap somewhere, or
we've really messed up.

So I could see there being a race where we get preempted between
looking up the swap entry and calling get_swap_device().  But if that
does happen, then the page gets brought in, and potentially reswapped
to the other swap device.  So returning -EEXIST here would actually
work: it forces a re-lookup in the page cache, so we'll get the new
swap entry that tells us which swap device the page is now on.

But I REALLY REALLY REALLY want a reproducer.  Right now, I have a
hard time believing that this, or any of the other races, can really
happen.

> 2efa33fc7f6e was intending to fix a race introduced by two-year-old
> 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested
> or not"), which added a call to inode_read_congested().  Certainly
> relying on si->swap_file->f_mapping->host there was new territory:
> whether actually racy I'm not sure offhand - I've forgotten whether
> synchronize_rcu() waits for preempted tasks or not.
>
> But if it is racy, then I wonder if the right fix might be to revert
> 8fd2e0b505d1 too.  Convincing numbers were offered for it, but I'm
> puzzled: because Matthew has in the past noted that the block layer
> broke and further broke bdi congestion tracking (I don't know the
> relevant release numbers), so I don't understand how checking
> inode_read_congested() is actually useful there nowadays.

It might be useful for NFS?  I don't think congestion is broken there
(except: how does the NFS client have any idea whether the server is
congested or not?)
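
In case a concrete starting point helps, here is roughly the
userspace loop I have in mind, as an untested sketch: the swap
partition paths (/dev/vdb1, /dev/vdb2) are placeholders for whatever
the VM actually has, and it needs root to run.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/swap.h>

/* Placeholder paths for the two 2GB swap partitions. */
#define PART_A "/dev/vdb1"
#define PART_B "/dev/vdb2"
#define MEMHOG_BYTES (2UL << 30)

/* Child: keep dirtying ~2GB so the 1GB VM has to keep pages on swap. */
static void memhog(void)
{
	char *p = malloc(MEMHOG_BYTES);

	if (!p)
		exit(1);
	for (;;)
		for (size_t i = 0; i < MEMHOG_BYTES; i += 4096)
			p[i]++;
}

int main(void)
{
	if (swapon(PART_A, 0))
		perror("swapon A");

	if (fork() == 0)
		memhog();

	/*
	 * Keep migrating swapped pages between the two devices while
	 * the memhog faults them back in, hoping to hit the window
	 * between the swap entry lookup and get_swap_device().
	 */
	for (;;) {
		if (swapon(PART_B, 0))
			perror("swapon B");
		if (swapoff(PART_A))
			perror("swapoff A");
		if (swapon(PART_A, 0))
			perror("swapon A");
		if (swapoff(PART_B))
			perror("swapoff B");
	}
}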