From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5D88C433ED for ; Mon, 12 Apr 2021 01:49:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6B3336102A for ; Mon, 12 Apr 2021 01:49:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236199AbhDLBov (ORCPT ); Sun, 11 Apr 2021 21:44:51 -0400 Received: from mga09.intel.com ([134.134.136.24]:16117 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235543AbhDLBos (ORCPT ); Sun, 11 Apr 2021 21:44:48 -0400 IronPort-SDR: WP7ihIab611jDvhYJKrkOmidfX8AtLmhScRFE5eexnwiKN+fPquVtR/wVNy6j5exPqWAVmtETq BGqcMwbhcelQ== X-IronPort-AV: E=McAfee;i="6000,8403,9951"; a="194203436" X-IronPort-AV: E=Sophos;i="5.82,214,1613462400"; d="scan'208";a="194203436" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2021 18:44:30 -0700 IronPort-SDR: /TLdrKv6QNTpc9r6cNaJvN03c0rl7R1vpQvNcc+PWAqDmUej2sY+vRll5ALrrzHw6Wxu4tLV6A 5a3AdkRs4etQ== X-IronPort-AV: E=Sophos;i="5.82,214,1613462400"; d="scan'208";a="459970583" Received: from yhuang6-desk1.sh.intel.com (HELO yhuang6-desk1.ccr.corp.intel.com) ([10.239.13.1]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2021 18:44:26 -0700 From: "Huang, Ying" To: Miaohe Lin Cc: Tim Chen , , , , , , , , , , , , Subject: Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff References: <20210408130820.48233-1-linmiaohe@huawei.com> <20210408130820.48233-3-linmiaohe@huawei.com> <7684b3de-2824-9b1f-f033-d4bc14f9e195@linux.intel.com> <50d34b02-c155-bad7-da1f-03807ad31275@huawei.com> <995a130b-f07a-4771-1fe3-477d2f3c1e8e@linux.intel.com> Date: Mon, 12 Apr 2021 09:44:23 +0800 In-Reply-To: (Miaohe Lin's message of "Sat, 10 Apr 2021 11:17:29 +0800") Message-ID: <87r1jgwa2w.fsf@yhuang6-desk1.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Miaohe Lin writes: > On 2021/4/10 1:17, Tim Chen wrote: >> >> >> On 4/9/21 1:42 AM, Miaohe Lin wrote: >>> On 2021/4/9 5:34, Tim Chen wrote: >>>> >>>> >>>> On 4/8/21 6:08 AM, Miaohe Lin wrote: >>>>> When I was investigating the swap code, I found the below possible race >>>>> window: >>>>> >>>>> CPU 1 CPU 2 >>>>> ----- ----- >>>>> do_swap_page >>>>> synchronous swap_readpage >>>>> alloc_page_vma >>>>> swapoff >>>>> release swap_file, bdev, or ... >>>> >>> >>> Many thanks for quick review and reply! >>> >>>> Perhaps I'm missing something. The release of swap_file, bdev etc >>>> happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents >>>> if I read the swapoff code correctly. >>> Agree. Let's look this more close: >>> CPU1 CPU2 >>> ----- ----- >>> swap_readpage >>> if (data_race(sis->flags & SWP_FS_OPS)) { >>> swapoff >>> p->swap_file = NULL; >>> struct file *swap_file = sis->swap_file; >>> struct address_space *mapping = swap_file->f_mapping;[oops!] >>> ... >>> p->flags = 0; >>> ... >>> >>> Does this make sense for you? >> >> p->swapfile = NULL happens after the >> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence in swapoff(). >> >> So I don't think the sequence you illustrated on CPU2 is in the right order. >> That said, without get_swap_device/put_swap_device in swap_readpage, you could >> potentially blow pass synchronize_rcu() on CPU2 and causes a problem. so I think >> the problematic race looks something like the following: >> >> >> CPU1 CPU2 >> ----- ----- >> swap_readpage >> if (data_race(sis->flags & SWP_FS_OPS)) { >> swapoff >> p->flags = &= ~SWP_VALID; >> .. >> synchronize_rcu(); >> .. >> p->swap_file = NULL; >> struct file *swap_file = sis->swap_file; >> struct address_space *mapping = swap_file->f_mapping;[oops!] >> ... >> ... >> > > Agree. This is also what I meant to illustrate. And you provide a better one. Many thanks! For the pages that are swapped in through swap cache. That isn't an issue. Because the page is locked, the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been unlocked. So the race is for the fast path as follows, if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) I found it in your original patch description. But please make it more explicit to reduce the potential confusing. Best Regards, Huang, Ying From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58188C433B4 for ; Mon, 12 Apr 2021 01:44:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BCBB061027 for ; Mon, 12 Apr 2021 01:44:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BCBB061027 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 494F56B0036; Sun, 11 Apr 2021 21:44:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 446536B006C; Sun, 11 Apr 2021 21:44:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2E6656B006E; Sun, 11 Apr 2021 21:44:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0211.hostedemail.com [216.40.44.211]) by kanga.kvack.org (Postfix) with ESMTP id 110DF6B0036 for ; Sun, 11 Apr 2021 21:44:34 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id AD001A76B for ; Mon, 12 Apr 2021 01:44:33 +0000 (UTC) X-FDA: 78022020426.15.45C55A7 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf19.hostedemail.com (Postfix) with ESMTP id E18A290009F1 for ; Mon, 12 Apr 2021 01:44:19 +0000 (UTC) IronPort-SDR: oOmmKtFztuiQ1e57LcDTlbihRECrtailHUpQKdbKY0rqq794tt9wh7s/jNob3FULAQDLWY+Mnb 6TId52rO7uZQ== X-IronPort-AV: E=McAfee;i="6000,8403,9951"; a="191956056" X-IronPort-AV: E=Sophos;i="5.82,214,1613462400"; d="scan'208";a="191956056" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2021 18:44:30 -0700 IronPort-SDR: /TLdrKv6QNTpc9r6cNaJvN03c0rl7R1vpQvNcc+PWAqDmUej2sY+vRll5ALrrzHw6Wxu4tLV6A 5a3AdkRs4etQ== X-IronPort-AV: E=Sophos;i="5.82,214,1613462400"; d="scan'208";a="459970583" Received: from yhuang6-desk1.sh.intel.com (HELO yhuang6-desk1.ccr.corp.intel.com) ([10.239.13.1]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2021 18:44:26 -0700 From: "Huang, Ying" To: Miaohe Lin Cc: Tim Chen , , , , , , , , , , , , Subject: Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff References: <20210408130820.48233-1-linmiaohe@huawei.com> <20210408130820.48233-3-linmiaohe@huawei.com> <7684b3de-2824-9b1f-f033-d4bc14f9e195@linux.intel.com> <50d34b02-c155-bad7-da1f-03807ad31275@huawei.com> <995a130b-f07a-4771-1fe3-477d2f3c1e8e@linux.intel.com> Date: Mon, 12 Apr 2021 09:44:23 +0800 In-Reply-To: (Miaohe Lin's message of "Sat, 10 Apr 2021 11:17:29 +0800") Message-ID: <87r1jgwa2w.fsf@yhuang6-desk1.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspamd-Queue-Id: E18A290009F1 X-Stat-Signature: j3d45bgmhemmab6rhrwytf5nm5x1po6t X-Rspamd-Server: rspam02 Received-SPF: none (intel.com>: No applicable sender policy available) receiver=imf19; identity=mailfrom; envelope-from=""; helo=mga04.intel.com; client-ip=192.55.52.120 X-HE-DKIM-Result: none/none X-HE-Tag: 1618191859-852624 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Miaohe Lin writes: > On 2021/4/10 1:17, Tim Chen wrote: >> >> >> On 4/9/21 1:42 AM, Miaohe Lin wrote: >>> On 2021/4/9 5:34, Tim Chen wrote: >>>> >>>> >>>> On 4/8/21 6:08 AM, Miaohe Lin wrote: >>>>> When I was investigating the swap code, I found the below possible race >>>>> window: >>>>> >>>>> CPU 1 CPU 2 >>>>> ----- ----- >>>>> do_swap_page >>>>> synchronous swap_readpage >>>>> alloc_page_vma >>>>> swapoff >>>>> release swap_file, bdev, or ... >>>> >>> >>> Many thanks for quick review and reply! >>> >>>> Perhaps I'm missing something. The release of swap_file, bdev etc >>>> happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents >>>> if I read the swapoff code correctly. >>> Agree. Let's look this more close: >>> CPU1 CPU2 >>> ----- ----- >>> swap_readpage >>> if (data_race(sis->flags & SWP_FS_OPS)) { >>> swapoff >>> p->swap_file = NULL; >>> struct file *swap_file = sis->swap_file; >>> struct address_space *mapping = swap_file->f_mapping;[oops!] >>> ... >>> p->flags = 0; >>> ... >>> >>> Does this make sense for you? >> >> p->swapfile = NULL happens after the >> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence in swapoff(). >> >> So I don't think the sequence you illustrated on CPU2 is in the right order. >> That said, without get_swap_device/put_swap_device in swap_readpage, you could >> potentially blow pass synchronize_rcu() on CPU2 and causes a problem. so I think >> the problematic race looks something like the following: >> >> >> CPU1 CPU2 >> ----- ----- >> swap_readpage >> if (data_race(sis->flags & SWP_FS_OPS)) { >> swapoff >> p->flags = &= ~SWP_VALID; >> .. >> synchronize_rcu(); >> .. >> p->swap_file = NULL; >> struct file *swap_file = sis->swap_file; >> struct address_space *mapping = swap_file->f_mapping;[oops!] >> ... >> ... >> > > Agree. This is also what I meant to illustrate. And you provide a better one. Many thanks! For the pages that are swapped in through swap cache. That isn't an issue. Because the page is locked, the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been unlocked. So the race is for the fast path as follows, if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) I found it in your original patch description. But please make it more explicit to reduce the potential confusing. Best Regards, Huang, Ying