Date: Wed, 25 Mar 2020 14:26:23 +0300
From: "Kirill A. Shutemov"
To: Yang Shi
Cc: kirill.shutemov@linux.intel.com, hughd@google.com, aarcange@redhat.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: khugepaged: fix potential page state corruption
Message-ID: <20200325112623.ur4owwbnow5c5mng@box>
References: <1584573582-116702-1-git-send-email-yang.shi@linux.alibaba.com>
 <20200319001258.creziw6ffw4jvwl3@box>
 <2cdc734c-c222-4b9d-9114-1762b29dafb4@linux.alibaba.com>
 <20200319104938.vphyajoyz6ob6jtl@box>
 <99b78cdb-5a4d-e28b-4464-d34ee39e5501@linux.alibaba.com>
In-Reply-To: <99b78cdb-5a4d-e28b-4464-d34ee39e5501@linux.alibaba.com>

On Tue, Mar 24, 2020 at 10:17:13AM -0700, Yang Shi wrote:
>
>
> On 3/19/20 3:49 AM, Kirill A. Shutemov wrote:
> > On Wed, Mar 18, 2020 at 10:39:21PM -0700, Yang Shi wrote:
> > >
> > > On 3/18/20 5:55 PM, Yang Shi wrote:
> > > >
> > > > On 3/18/20 5:12 PM, Kirill A. Shutemov wrote:
> > > > > On Thu, Mar 19, 2020 at 07:19:42AM +0800, Yang Shi wrote:
> > > > > > When khugepaged collapses anonymous pages, the base pages would
> > > > > > be freed
> > > > > > via pagevec or free_page_and_swap_cache().  But, the anonymous page may
> > > > > > be added back to LRU, then it might result in the below race:
> > > > > >
> > > > > >     CPU A                CPU B
> > > > > > khugepaged:
> > > > > >    unlock page
> > > > > >    putback_lru_page
> > > > > >      add to lru
> > > > > >                 page reclaim:
> > > > > >                   isolate this page
> > > > > >                   try_to_unmap
> > > > > >    page_remove_rmap <-- corrupt _mapcount
> > > > > >
> > > > > > It looks nothing would prevent the pages from isolating by reclaimer.
> > > > > Hm. Why should it?
> > > > >
> > > > > try_to_unmap() doesn't exclude parallel page unmapping. _mapcount is
> > > > > protected by ptl. And this particular _mapcount pin is reachable for
> > > > > reclaim as it's not part of usual page table tree. Basically
> > > > > try_to_unmap() will never succeeds until we give up the _mapcount on
> > > > > khugepaged side.
> > > > I don't quite get. What does "not part of usual page table tree" means?
> > > >
> > > > How's about try_to_unmap() acquires ptl before khugepaged?
> > The page table we are dealing with was detached from the process' page
> > table tree: see pmdp_collapse_flush(). try_to_unmap() will not see the
> > pte.
>
> A follow-up question here. pmdp_collapse_flush() clears pmd entry and does
> TLB shootdown on x86. I'm supposed the main purpose is to serialize fast gup
> since it doesn't acquire any lock (mmap_sem, ptl ,etc), but disable
> interrupt so the TLB shootdown IPI would get blocked. This could guarantee
> synchronization on x86, but it looks not all architectures do TLB shootdown
> or implement it via IPI, so how they could serialize with fast gup?

The main purpose of pmdp_collapse_flush() is to block access to pages
under collapse, including access via GUP (and its variants).

It's up to the architecture to implement it correctly, including TLB flush
vs. GUP_fast serialization. The generic way works fine for most
architectures. Notable exceptions are Power and S390.

> In addition it looks acquiring pmd lock is not necessary. Before both write
> mmap_sem and write anon_vma lock are acquired which could serialize page
> fault and rmap walk, so it looks fast gup is the only one which could run
> concurrently, but fast gup doesn't acquire ptl at all. It seems the
> pmd_lock/unlock could be removed.

This is likely true. And we have a comment there. But taking an
uncontended lock is cheap, so why not.

-- 
Kirill A. Shutemov