From: "Huang, Ying"
To: Matthew Wilcox
Cc: Hugh Dickins, Andrew Morton, David Hildenbrand, Yang Shi,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, Miaohe Lin,
	Johannes Weiner, Michal Hocko, Joonsoo Kim, Minchan Kim
Subject: Re: [PATCH] mm,shmem: Fix a typo in shmem_swapin_page()
References: <20210723080000.93953-1-ying.huang@intel.com> <24187e5e-069-9f3f-cefe-39ac70783753@google.com>
Date: Tue, 03 Aug 2021 16:14:38 +0800
In-Reply-To: (Matthew Wilcox's message of "Fri, 23 Jul 2021 22:53:53 +0100")
Message-ID: <8735rr54i9.fsf@yhuang6-desk2.ccr.corp.intel.com>

Matthew Wilcox writes:

> On Fri, Jul 23, 2021 at 01:23:07PM -0700, Hugh Dickins wrote:
>> I was wary because, if the (never observed) race to be fixed is in
>> swap_cluster_readahead(), why was shmem_swapin_page() being patched?
>> Not explained in its commit message, probably a misunderstanding of
>> how mm/shmem.c already manages races (and prefers not to be involved
>> in swap_info_struct stuff).
>>
>> But why do I now say it's bad? Because even if you correct the EINVAL
>> to -EINVAL, that's an unexpected error: -EEXIST is common, -ENOMEM is
>> not surprising, -ENOSPC can need consideration, but -EIO and anything
>> else just end up as SIGBUS when faulting (or as error from syscall).
>> So, 2efa33fc7f6e converts a race with swapoff to SIGBUS: not good,
>> and I think much more likely than the race to be fixed (since
>> swapoff's percpu_ref_kill() rightly comes before synchronize_rcu()).
>
> Yes, I think a lot more thought was needed here. And I would have
> preferred to start with a reproducer instead of "hey, this could
> happen". Maybe something like booting a 1GB VM, adding two 2GB swap
> partitions, swapon(partition A); run a 2GB memhog and then
>
> loop:
>     swapon(part B);
>     swapoff(part A);
>     swapon(part A);
>     swapoff(part B);
>
> to make this happen.
>
> But if it does happen, why would returning EINVAL be the right thing
> to do? We've swapped it out. It must be on swap somewhere, or we've
> really messed up. So I could see there being a race where we get
> preempted between looking up the swap entry and calling
> get_swap_device(). But if that does happen, then the page gets brought
> in, and potentially reswapped to the other swap device.
>
> So returning -EEXIST here would actually work. That forces a re-lookup
> in the page cache, so we'll get the new swap entry that tells us which
> swap device the page is now on.

Yes. -EEXIST is the right error code. We already use it in
shmem_swapin_page() to deal with race conditions.

> But I REALLY REALLY REALLY want a reproducer. Right now, I have a hard
> time believing this, or any of the other races, can really happen.

I think the race is only theoretical too. Firstly, swapoff is a rare
operation in practice; secondly, the race window is really small.

Best Regards,
Huang, Ying

>> 2efa33fc7f6e was intending to fix a race introduced by two-year-old
>> 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested
>> or not"), which added a call to inode_read_congested(). Certainly
>> relying on si->swap_file->f_mapping->host there was new territory:
>> whether actually racy I'm not sure offhand - I've forgotten whether
>> synchronize_rcu() waits for preempted tasks or not.
>>
>> But if it is racy, then I wonder if the right fix might be to revert
>> 8fd2e0b505d1 too. Convincing numbers were offered for it, but I'm
>> puzzled: because Matthew has in the past noted that the block layer
>> broke and further broke bdi congestion tracking (I don't know the
>> relevant release numbers), so I don't understand how checking
>> inode_read_congested() is actually useful there nowadays.
>
> It might be useful for NFS? I don't think congestion is broken there
> (except how does the NFS client have any idea whether the server is
> congested or not?)
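
For anyone who wants to try the swapon/swapoff loop Matthew sketches
above, here is a minimal, untested userspace sketch built on the
swapon(2)/swapoff(2) syscalls. /swapA and /swapB are placeholder swap
files (prepared beforehand with mkswap) on a small VM, the 2GB memhog
pressure is assumed to be generated separately, and it has to run as
root; whether it actually hits the race is unverified.

	/* Untested sketch of Matthew's reproducer loop above. */
	#include <stdio.h>
	#include <sys/swap.h>

	int main(void)
	{
		const char *a = "/swapA", *b = "/swapB";	/* placeholder paths */

		/* Start with swap area A active ("swapon(partition A)"). */
		if (swapon(a, 0))
			perror("swapon A");

		/*
		 * Cycle the two swap areas forever, so swapoff keeps racing
		 * against concurrent swapins driven by memory pressure that
		 * is generated elsewhere (e.g. a looping 2GB memhog).
		 */
		for (;;) {
			if (swapon(b, 0))
				perror("swapon B");
			if (swapoff(a))
				perror("swapoff A");
			if (swapon(a, 0))
				perror("swapon A");
			if (swapoff(b))
				perror("swapoff B");
		}
	}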