From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EE9FC433F5 for ; Tue, 5 Oct 2021 09:03:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B0C8161037 for ; Tue, 5 Oct 2021 09:03:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B0C8161037 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 36B9A6B006C; Tue, 5 Oct 2021 05:03:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 31A266B0071; Tue, 5 Oct 2021 05:03:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2094F900002; Tue, 5 Oct 2021 05:03:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0219.hostedemail.com [216.40.44.219]) by kanga.kvack.org (Postfix) with ESMTP id 104A76B006C for ; Tue, 5 Oct 2021 05:03:07 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A2F0E182945CC for ; Tue, 5 Oct 2021 09:03:06 +0000 (UTC) X-FDA: 78661794372.12.322095D Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by imf20.hostedemail.com (Postfix) with ESMTP id 2C11BD002892 for ; Tue, 5 Oct 2021 09:03:04 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04407;MF=rongwei.wang@linux.alibaba.com;NM=1;PH=DS;RN=7;SR=0;TI=SMTPD_---0UqdvfmN_1633424581; Received: from 30.25.232.89(mailfrom:rongwei.wang@linux.alibaba.com fp:SMTPD_---0UqdvfmN_1633424581) by smtp.aliyun-inc.com(127.0.0.1); Tue, 05 Oct 2021 17:03:01 +0800 Message-ID: <847363b1-1b31-dcc8-6d6c-7b5d8a6d1972@linux.alibaba.com> Date: Tue, 5 Oct 2021 17:03:00 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:93.0) Gecko/20100101 Thunderbird/93.0 Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache Content-Language: en-US To: Hugh Dickins , Matthew Wilcox Cc: Song Liu , Andrew Morton , Linux MM , Linux Kernel Mailing List , William Kucharski References: <20210923194343.ca0f29e1c4d361170343a6f2@linux-foundation.org> <9e41661d-9919-d556-8c49-610dae157553@linux.alibaba.com> <68737431-01d2-e6e3-5131-7d7c731e49ae@linux.alibaba.com> <8d8fb192-bd8d-8a08-498d-ca7204d4a716@linux.alibaba.com> From: Rongwei Wang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2C11BD002892 X-Stat-Signature: danf4ecx6h8x6gyz16rezotzktcz1914 Authentication-Results: imf20.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf20.hostedemail.com: domain of rongwei.wang@linux.alibaba.com designates 115.124.30.131 as permitted sender) smtp.mailfrom=rongwei.wang@linux.alibaba.com X-HE-Tag: 1633424584-805646 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10/5/21 10:58 AM, Hugh Dickins wrote: > On Tue, 5 Oct 2021, Rongwei Wang wrote: >=20 >> Hi, >> I have run our cases these two days to stress test new Patch #1. The n= ew Patch >> #1 mainly add filemap_invalidate_{un}lock before and after >> truncate_pagecache(), basing on original Patch #1. And the crash has n= ot >> happened. >> >> Now, I keep the original Patch #1, then adding the code below which su= ggested >> by liu song (I'm not sure which one I should add in the next version, >> Suggested-by or Signed-off-by? If you know, please remind me). >> >> - if (filemap_nr_thps(inode->i_mapping)) >> + if (filemap_nr_thps(inode->i_mapping)) { >> + filemap_invalidate_lock(inode->i_mapping); >> truncate_pagecache(inode, 0); >> + filemap_invalidate_unlock(inode->i_mapping); >> + } >=20 > I won't NAK that patch; but I still believe it's unnecessary, and don't > see how it protects against all the races (collapse_file() does not use > that lock, whereas collapse_file() does use page lock). And if you're > hoping to fix 5.10, then you will have to backport those invalidate_loc= k > patches there too (they're really intended to protect hole-punching). >=20 >> >> And the reason for keeping the original Patch #1 is mainly to fix the = race >> between collapse_file and truncate_pagecache. It seems necessary. Desp= ite the >> two-day test, I did not reproduce this race any more. >> >> In addition, I also test the below method: >> >> diff --git a/mm/truncate.c b/mm/truncate.c >> index 3f47190f98a8..33604e4ce60a 100644 >> --- a/mm/truncate.c >> +++ b/mm/truncate.c >> @@ -210,8 +210,6 @@ invalidate_complete_page(struct address_space *map= ping, >> struct page *page) >> >> int truncate_inode_page(struct address_space *mapping, struct page *= page) >> { >> - VM_BUG_ON_PAGE(PageTail(page), page); >> - >> if (page->mapping !=3D mapping) >> return -EIO; >> >> I am not very sure this VM_BUG_ON_PAGE(PageTail) is what Hugh means. A= nd >> the test results show that only removing this VM_BUG_ON_PAGE(PageTail)= has no >> effect. So, I still keep the original Patch #1 to fix one race. >=20 > Yes, that's exactly what I meant, and thank you for intending to try it= . >=20 > But if that patch had "no effect", then I think you were not running th= e > kernel with that patch applied: because it deletes the BUG on line 213 > of mm/truncate.c, which is what you reported in the first mail! >=20 > Or, is line 213 of mm/truncate.c in your 5.10.46-hugetext+ kernel > something else? I've been looking at 5.15-rc. Hi, Hugh I'm sorry the confusing '5.10.46-hugetext+'. I am also look and test at=20 5.15-rc. >=20 > But I wasn't proposing to delete it merely to hide the BUG: as I hope > I explained, we could move it below the page->mapping check, but it > wouldn't really be of any value there since tails have NULL page->mappi= ng > anyway (well, I didn't check first and second tails, maybe mapping gets > reused for some compound page field in those). I was proposing to delet= e > it because the page->mapping check then weeds out the racy case once > we're holding page lock, without the need for adding anything special. >=20 > Hugh Today, I try again to create some cases to reproduce the race, such as=20 ensuring that multiple processes are always executing=20 =E2=80=98truncate_pagecache=E2=80=99 and only mapping 2M DSO. In this way= , I try to=20 ensure that the target of 'collapse_file' and 'truncate_pagecache' can=20 only be the same VMA, to increase the probability of reproducing that=20 race. But, I can't reproduce that race any more. In fact, according to the previous experience, the current number of=20 attempts should be able to reproduce that race. If you have the idea about creating this case, please tell me, and I can=20 try again. Or we can solve it when it appears again. Thanks! >=20