From: Mauricio Faria de Oliveira
Date: Wed, 2 Feb 2022 18:27:47 -0300
Subject: Re: [PATCH v3] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
To: Yu Zhao
Cc: Minchan Kim, "Huang, Ying", Andrew Morton, Yang Shi, Miaohe Lin,
    linux-mm@kvack.org, linux-block@vger.kernel.org
References: <20220131230255.789059-1-mfo@canonical.com>

On Wed, Feb 2, 2022 at 4:56 PM Yu Zhao wrote:
>
> On Mon, Jan 31, 2022 at 08:02:55PM -0300, Mauricio Faria de Oliveira wrote:
> > Problem:
> > =======
>
> Thanks for the update. A couple of quick questions:
>
> > Userspace might read the zero-page instead of actual data from a
> > direct IO read on a block device if the buffers have been called
> > madvise(MADV_FREE) on earlier (this is discussed below) due to a
> > race between page reclaim on MADV_FREE and blkdev direct IO read.
>
> 1) would page migration be affected as well?

Could you please elaborate on the potential problem you considered?
I checked that migrate_pages() -> try_to_migrate() holds the page lock,
so it shouldn't race with shrink_page_list() -> try_to_unmap()
(where the issue with MADV_FREE is), but maybe I didn't understand
you correctly.

>
> > @@ -1599,7 +1599,30 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >  			/* MADV_FREE page check */
> >  			if (!PageSwapBacked(page)) {
> > -				if (!PageDirty(page)) {
> > +				int ref_count, map_count;
> > +
> > +				/*
> > +				 * Synchronize with gup_pte_range():
> > +				 * - clear PTE; barrier; read refcount
> > +				 * - inc refcount; barrier; read PTE
> > +				 */
> > +				smp_mb();
> > +
> > +				ref_count = page_count(page);
> > +				map_count = page_mapcount(page);
> > +
> > +				/*
> > +				 * Order reads for page refcount and dirty flag;
> > +				 * see __remove_mapping().
> > +				 */
> > +				smp_rmb();
>
> 2) why does it need to order against __remove_mapping()? It seems to
> me that here (called from the reclaim path) it can't race with
> __remove_mapping() because both lock the page.

I'll improve that comment in v4. The ordering isn't against
__remove_mapping() itself; it is needed because of an issue described
in __remove_mapping()'s comments: something else that doesn't hold the
page lock, and just has a page reference, may clear the page dirty flag
and then drop the reference. Thus we check the refcount first, then the
dirty flag.

Hope this clarifies the question. Thanks!

>
> > +				/*
> > +				 * The only page refs must be from the isolation
> > +				 * plus one or more rmap's (dropped by discard:).
> > +				 */
> > +				if ((ref_count == 1 + map_count) &&
> > +				    !PageDirty(page)) {
> >  					/* Invalidate as we cleared the pte */
> >  					mmu_notifier_invalidate_range(mm,
> >  						address, address + PAGE_SIZE);

-- 
Mauricio Faria de Oliveira
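
As a footnote to the second answer above, here is a rough userspace
sketch of the "check refcount, then dirty flag" order used in the quoted
hunk. It is only an illustration, not kernel code: struct model_page,
model_page_init() and model_can_discard() are hypothetical names, and the
C11 fences merely stand in for smp_mb() and smp_rmb() without claiming to
be equivalent to them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few page fields the check looks at. */
struct model_page {
	atomic_int refcount;	/* models page_count() */
	atomic_int mapcount;	/* models page_mapcount() */
	atomic_bool dirty;	/* models PageDirty() */
};

static void model_page_init(struct model_page *page, int refs, int maps,
			    bool dirty)
{
	atomic_init(&page->refcount, refs);
	atomic_init(&page->mapcount, maps);
	atomic_init(&page->dirty, dirty);
}

/*
 * Mirrors the order of operations in the quoted hunk:
 * full barrier, read the counts, read barrier, read the dirty flag.
 */
static bool model_can_discard(struct model_page *page)
{
	int ref_count, map_count;

	/* Assumed stand-in for smp_mb(): pairs with the GUP side. */
	atomic_thread_fence(memory_order_seq_cst);

	ref_count = atomic_load_explicit(&page->refcount, memory_order_relaxed);
	map_count = atomic_load_explicit(&page->mapcount, memory_order_relaxed);

	/* Assumed stand-in for smp_rmb(): refcount is read before dirty. */
	atomic_thread_fence(memory_order_acquire);

	/* Only the isolation ref plus one ref per mapping may remain. */
	return ref_count == 1 + map_count &&
	       !atomic_load_explicit(&page->dirty, memory_order_relaxed);
}
```

With one mapping, a clean page holding 2 references (isolation plus
rmap) is discardable in this model, while a page with a third reference
(such as one taken by GUP for direct IO) or with the dirty flag set is
not.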