Subject: Re: [PATCH 1/5] mm/vmscan: put the redirtied MADV_FREE pages back to anonymous LRU list
To: John Hubbard, Matthew Wilcox
CC: Michal Hocko, Yu Zhao
References: <20210710100329.49174-1-linmiaohe@huawei.com> <20210710100329.49174-2-linmiaohe@huawei.com> <9409189e-44f7-2608-68af-851629b6d453@huawei.com>
From: Miaohe Lin
Message-ID: <0634e9d6-9fcc-e65f-dc5e-bed13004b8fe@huawei.com>
Date: Thu, 15 Jul 2021 19:30:33 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/7/15 3:43, John Hubbard wrote:
> On 7/14/21 4:48 AM, Matthew Wilcox wrote:
>> On Wed, Jul 14, 2021 at 07:36:57PM +0800, Miaohe Lin wrote:
>>> On 2021/7/13 21:34, Matthew Wilcox wrote:
>>>> On Tue, Jul 13, 2021 at 09:13:51PM +0800, Miaohe Lin wrote:
>>>>>>> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
>>>>>>> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
>>>>>>> pages will be reclaimed in normal swapout way.
>>>>>>
>>>>>> Agreed. But the question is why this needs an explicit handling here
>>>>>> when we already do handle this case when trying to unmap the page.
>>>>>
>>>>> This makes me think more. It seems even the page_ref_freeze call is guaranteed to
>>>>> success as no one can grab the page refcnt after the page is successfully unmapped.
>>>>
>>>> NO!  This is wrong.  Every page can have its refcount speculatively raised
>>>> (and then lowered).  The two prime candidates for this are lockless GUP
>>>> and page cache lookups, but there can be others too.
>>>>
>>>
>>> Many thanks for pointing this out. My overlook! Sorry!
>>> So, it seems lockless GUP can redirty the MADV_FREE page. But is it ok to just release
>>> a redirtied MADV_FREE pages? Because we hold the last reference here and the page will
>>> be freed anyway...
>>
>> I don't see how lockless GUP can redirty the page.  It can grab the
>> refcount, thus making the refcount here two.  Then the call to freeze
>> here fails and the page stays on the list.
>> But the lockless GUP checks
>> the page is still in the page table (and discovers it isn't, so releases
>> the reference count).  Am I missing a path that lets lockless GUP dirty
>> the page?
>>
>
> If a device driver pins some pages using gup, and the device then uses dma
> to write to those pages, then you could get there. That story is part of the
> reasoning that led to creating pin_user_pages(), which btw does not yet
> fully solve that case.

Many thanks for your explanation. So a scenario similar to the one described in
the comment in __remove_mapping() is possible:

	get_user_pages(&page);
	[user mapping goes away]
	write_to(page);
					!PageDirty(page)    [good]
	SetPageDirty(page);
	put_page(page);
					!page_count(page)   [good, discard it]

	[oops, our write_to data is lost]

The page can be redirtied after it is unmapped. And there is no way to restore
the page table, as a clean MADV_FREE page is simply cleared from the page table
via the try_to_unmap path. Is it ok to just release the redirtied MADV_FREE
pages here, as we hold the last reference and the page will be freed anyway?

>
> Basically, though, unless a non-CPU device has access to the page, it's
> hard to see how gup itself can lead to a page getting dirtied.
>
> thanks,
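
For reference, the interleaving quoted from __remove_mapping() can be replayed
in a small userspace model. This is only a sketch, not kernel code: struct
toy_page and every toy_*() helper are hypothetical names made up for the
illustration, the steps run sequentially instead of on two CPUs, and reclaim is
assumed to hold exactly one reference of its own. It contrasts sampling the
dirty bit before the refcount with freezing the refcount first.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page (hypothetical, not the kernel structure). */
struct toy_page {
	int refcount;	/* models the page reference count */
	bool dirty;	/* models the dirty flag */
};

void toy_get(struct toy_page *p)
{
	p->refcount++;
}

void toy_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Models a refcount freeze: 'expected' -> 0, or fail and change nothing. */
bool toy_ref_freeze(struct toy_page *p, int expected)
{
	if (p->refcount != expected)
		return false;
	p->refcount = 0;
	return true;
}

/*
 * Unsafe ordering from the quoted comment: reclaim samples the dirty bit
 * while the GUP user still holds its reference, and looks at the refcount
 * only after the GUP user has dirtied and released the page.
 * Returns true if reclaim would discard the page.
 */
bool toy_dirty_then_count(struct toy_page *p)
{
	bool looked_clean;

	toy_get(p);			/* get_user_pages(&page) */
	/* [user mapping goes away], write_to(page) */
	looked_clean = !p->dirty;	/* reclaim: page looks clean */
	p->dirty = true;		/* SetPageDirty(page) */
	toy_put(p);			/* put_page(page) */
	/* reclaim: only its own reference is left, so it discards */
	return looked_clean && p->refcount == 1;
}

/*
 * Freeze first, then check the dirty bit. Reclaim runs either while the
 * GUP reference is still held or after it has been dropped.
 * Returns true if reclaim would discard the page.
 */
bool toy_freeze_then_dirty(struct toy_page *p, bool reclaim_while_pinned)
{
	bool discard = false;

	toy_get(p);
	if (reclaim_while_pinned)
		discard = toy_ref_freeze(p, 1);	/* refcount is 2: fails */
	p->dirty = true;
	toy_put(p);
	if (!reclaim_while_pinned && toy_ref_freeze(p, 1)) {
		if (p->dirty)
			p->refcount = 1;	/* unfreeze, keep the page */
		else
			discard = true;
	}
	return discard;
}
```

Starting from a clean page with a refcount of 1, the first helper reports a
discard even though the page was written to, while the second keeps the page at
either reclaim point. What should then happen to the kept, redirtied MADV_FREE
page is the open question above; the model does not try to answer it.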