From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0C04C433F5 for ; Tue, 15 Mar 2022 20:34:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351738AbiCOUfq (ORCPT ); Tue, 15 Mar 2022 16:35:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241887AbiCOUfo (ORCPT ); Tue, 15 Mar 2022 16:35:44 -0400 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7224313D55 for ; Tue, 15 Mar 2022 13:34:31 -0700 (PDT) Received: by mail-oi1-x22b.google.com with SMTP id h10so551086oia.4 for ; Tue, 15 Mar 2022 13:34:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=HuxzPBFcNGizUpUFeFsNjMUynwK/YImVehMj2Mb+h5A=; b=K3HGrbkU3n9yvQYCY/kzElGn6znziTraIc5bj7MoyQgemj6Jy89aSAsaWRBNeSQirf 5XAGCDiWwqOiXzbReSY3yKhwLgKrG7zIljAb0MxtNcmhyYqdc+BbqjG11NdnvjIGO7pF WhYkc0Pj2Lpv6CKFjhncgf5mGTCONSuF6qVBOAl3I/yiUt3sNVL9My/LEgJzuZ9PNz1+ KNp706Dq/JKj2fKDY5vdz4tqkqUxLy0tS2/hFySkXnSULhgAZIqeLKYu1+UvCMNJsCH5 IhISLb9TUOzNwHkwJbSji4zTaCSKtDvaR3Ge1ojWe2ATreVqSqdLJBV+Jk1aDaeLLEjs owUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=HuxzPBFcNGizUpUFeFsNjMUynwK/YImVehMj2Mb+h5A=; b=7P4AhAdsVDdula2Z0XnFwyl/00YhdsK12RdFm7pa15kDhquVfed4Oex6IK3cdOYCsc vsWzNLcTyoR4EppminxPpnugwzBMn7zrYzkC+b1wtBzJ+DrnR07sUAFmOZWYWDMYu17X 63SCIE+rtmcMJ97RKwZVrtl9okLYAG+Yn92PjMIw6fySmirFLhxrkrS3Cnvt7Unir/oG wpC2bNpItc31fdyhW5v2DnZmBUVCMK8SsLsJy0SJfLP+yWj1ch05DZyQmcJde8RfRn0v /7Uja10/yXQRC5wk5pFhYpeiqw+vtd7fQ1hmtKKP+sTrDg0QQQTQqBC5X3zZYHTH9ddy iTWQ== X-Gm-Message-State: AOAM532NfpaFkZ1JCMddVqyWUpTisyoqWfmgomwW5d/UMalWWpfqXB+1 pXcEEpKfqt659lRazNzHAQDJ1MXAVmXA8A== X-Google-Smtp-Source: ABdhPJyIXsrs++KyhkbY8OaCTfEwShUw+XWKA+egNWequzBiJdYyCj3Gy31D2jBYT9eCoIr7cy9G0w== X-Received: by 2002:a54:4e11:0:b0:2ec:e0ee:ac29 with SMTP id a17-20020a544e11000000b002ece0eeac29mr2429740oiy.257.1647376470526; Tue, 15 Mar 2022 13:34:30 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a10-20020a056808120a00b002d404a71444sm95511oil.35.2022.03.15.13.34.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Mar 2022 13:34:29 -0700 (PDT) Date: Tue, 15 Mar 2022 13:34:02 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: David Hildenbrand cc: Andrew Morton , Andrew Yang , Matthias Brugger , Matthew Wilcox , Vlastimil Babka , David Howells , William Kucharski , Yang Shi , Marc Zyngier , linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, wsd_upstream@mediatek.com, Nicholas Tang , Kuan-Ying Lee Subject: Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated In-Reply-To: <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com> Message-ID: <883877a-30b0-96e0-48a6-7cfc3c59de93@google.com> References: <20220315030515.20263-1-andrew.yang@mediatek.com> <20220314212127.a2797926ee0ef8a7ad05dcaa@linux-foundation.org> <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 15 Mar 2022, David Hildenbrand wrote: > On 15.03.22 05:21, Andrew Morton wrote: > > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang wrote: > > > >> When memory is tight, system may start to compact memory for large > >> continuous memory demands. If one process tries to lock a memory page > >> that is being locked and isolated for compaction, it may wait a long time > >> or even forever. This is because compaction will perform non-atomic > >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters > >> set by the process that can't obtain the page lock and add itself to the > >> waiting queue to wait for the lock to be unlocked. > >> > >> CPU1 CPU2 > >> lock_page(page); (successful) > >> lock_page(); (failed) > >> __ClearPageIsolated(page); SetPageWaiters(page) (may be overwritten) > >> unlock_page(page); > >> > >> The solution is to not perform non-atomic operation on page flags while > >> holding page lock. > > > > Sure, the non-atomic bitop optimization is really risky and I suspect > > we reach for it too often. Or at least without really clearly > > demonstrating that it is safe, and documenting our assumptions. > > I agree. IIRC, non-atomic variants are mostly only safe while the > refcount is 0. Everything else is just absolutely fragile. It is normal and correct to use __SetPageFlag(page) on a page just allocated from the buddy, and not yet logically visible to others: that has refcount 1. Of course, it might have refcount 2 or more, through being speculatively visible to get_page_unless_zero() users: perhaps through earlier usage of the same struct page, or by physical scan of memmap. Those few such others - compaction's isolate_migratepages_block() is the one I know best - must be very careful in their sequence of operations. Preliminary read-only checks are usually okay (but some VM_BUG_ON_PGFLAGS are increasingly problematic: I've had to turn off that CONFIG), then get_page_unless_zero(), then read-only check that the page is of the manageable kind (PageLRU in my world), and only then can it be safe to lock the page - which of course touches page flags, and so would be problematic for a racing user's __SetPageFlag(page). But PageMovable and PageIsolated are beyond my ken: I can't say there. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6ADACC433EF for ; Tue, 15 Mar 2022 20:34:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:Message-ID: In-Reply-To:Subject:cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=XONtQ8we9UVRGxmqfaKcrdjdVzKtgYNXGkVpBcISd+Y=; b=s025rxHOfRmB6h jmKaP3YdvlV1JhQX1wCButMhgQmbtTtBUeuiahADyQ6ydYgQRKNE8NREigEfr2Qt8bDl/w+tLgbC7 FDRjjsEnT6xNGvr3c/NkywdXXP0cQmdnsM7iesYC5Eo6ccTdGy3DN61G0HjUOTF+y2FF2Z6YbpNoG OTQYxCeOWbVQG94PTTyTX20S0P1QYSDwvP8nyXzYlwGuOxfw2Tg9df2s/IrDOUHGRKKEiXxPpHTh7 /L8+BkqmVT2jmjHfGnMXPoHydze9qNcawyO5NT4+T6A1Z+LmBl1HFwWbLYyBgoueE/IDCo3GSh+Gc lx1vzr8QIvEZH8Ha693Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1nUDsT-00AZmh-IW; Tue, 15 Mar 2022 20:34:45 +0000 Received: from mail-oi1-x233.google.com ([2607:f8b0:4864:20::233]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1nUDsH-00AZjz-JA for linux-mediatek@lists.infradead.org; Tue, 15 Mar 2022 20:34:34 +0000 Received: by mail-oi1-x233.google.com with SMTP id q189so527112oia.9 for ; Tue, 15 Mar 2022 13:34:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=HuxzPBFcNGizUpUFeFsNjMUynwK/YImVehMj2Mb+h5A=; b=K3HGrbkU3n9yvQYCY/kzElGn6znziTraIc5bj7MoyQgemj6Jy89aSAsaWRBNeSQirf 5XAGCDiWwqOiXzbReSY3yKhwLgKrG7zIljAb0MxtNcmhyYqdc+BbqjG11NdnvjIGO7pF WhYkc0Pj2Lpv6CKFjhncgf5mGTCONSuF6qVBOAl3I/yiUt3sNVL9My/LEgJzuZ9PNz1+ KNp706Dq/JKj2fKDY5vdz4tqkqUxLy0tS2/hFySkXnSULhgAZIqeLKYu1+UvCMNJsCH5 IhISLb9TUOzNwHkwJbSji4zTaCSKtDvaR3Ge1ojWe2ATreVqSqdLJBV+Jk1aDaeLLEjs owUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=HuxzPBFcNGizUpUFeFsNjMUynwK/YImVehMj2Mb+h5A=; b=19PpSkeQ0bDYyjmTwnZCYxoYz7y52iXgrzYLsVNIW4KOL3Hssnpg0Fxd6C1IZohyY4 d7888gQLfUMHWi0E7BEG9joI440zlscK/LBDt0EUUsHDrNiH4BhqoM53ly18SgusXd+Q azw8WWjbV41kwuSYbUJRtc+ANbbKd20VFduqIUx6pkOYE/WsG/IR5sPN5IJ90FCeiky/ bqzGg8ZUkIxK0MA92CuS7o78IH+OhSpKiGJ+w0+yOnPMThAZx0yOARmM59cWoW2poAKT UmilwAcj5D52RyKQ5semSCpdiEtig+/s2GtnqY6Kb79tuVik+pkx2m4rYu/UpIiOHHKH ldqg== X-Gm-Message-State: AOAM530ks7XpoOIS57IMZu2/2SRFi74YpxoMIA5U6JQX6PfXOlCGF5vr DBVVK5Id8EcwNRB3CTUk1ZdS1w== X-Google-Smtp-Source: ABdhPJyIXsrs++KyhkbY8OaCTfEwShUw+XWKA+egNWequzBiJdYyCj3Gy31D2jBYT9eCoIr7cy9G0w== X-Received: by 2002:a54:4e11:0:b0:2ec:e0ee:ac29 with SMTP id a17-20020a544e11000000b002ece0eeac29mr2429740oiy.257.1647376470526; Tue, 15 Mar 2022 13:34:30 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a10-20020a056808120a00b002d404a71444sm95511oil.35.2022.03.15.13.34.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Mar 2022 13:34:29 -0700 (PDT) Date: Tue, 15 Mar 2022 13:34:02 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: David Hildenbrand cc: Andrew Morton , Andrew Yang , Matthias Brugger , Matthew Wilcox , Vlastimil Babka , David Howells , William Kucharski , Yang Shi , Marc Zyngier , linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, wsd_upstream@mediatek.com, Nicholas Tang , Kuan-Ying Lee Subject: Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated In-Reply-To: <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com> Message-ID: <883877a-30b0-96e0-48a6-7cfc3c59de93@google.com> References: <20220315030515.20263-1-andrew.yang@mediatek.com> <20220314212127.a2797926ee0ef8a7ad05dcaa@linux-foundation.org> <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220315_133433_689213_0AAB124B X-CRM114-Status: GOOD ( 20.84 ) X-BeenThere: linux-mediatek@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "Linux-mediatek" Errors-To: linux-mediatek-bounces+linux-mediatek=archiver.kernel.org@lists.infradead.org On Tue, 15 Mar 2022, David Hildenbrand wrote: > On 15.03.22 05:21, Andrew Morton wrote: > > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang wrote: > > > >> When memory is tight, system may start to compact memory for large > >> continuous memory demands. If one process tries to lock a memory page > >> that is being locked and isolated for compaction, it may wait a long time > >> or even forever. This is because compaction will perform non-atomic > >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters > >> set by the process that can't obtain the page lock and add itself to the > >> waiting queue to wait for the lock to be unlocked. > >> > >> CPU1 CPU2 > >> lock_page(page); (successful) > >> lock_page(); (failed) > >> __ClearPageIsolated(page); SetPageWaiters(page) (may be overwritten) > >> unlock_page(page); > >> > >> The solution is to not perform non-atomic operation on page flags while > >> holding page lock. > > > > Sure, the non-atomic bitop optimization is really risky and I suspect > > we reach for it too often. Or at least without really clearly > > demonstrating that it is safe, and documenting our assumptions. > > I agree. IIRC, non-atomic variants are mostly only safe while the > refcount is 0. Everything else is just absolutely fragile. It is normal and correct to use __SetPageFlag(page) on a page just allocated from the buddy, and not yet logically visible to others: that has refcount 1. Of course, it might have refcount 2 or more, through being speculatively visible to get_page_unless_zero() users: perhaps through earlier usage of the same struct page, or by physical scan of memmap. Those few such others - compaction's isolate_migratepages_block() is the one I know best - must be very careful in their sequence of operations. Preliminary read-only checks are usually okay (but some VM_BUG_ON_PGFLAGS are increasingly problematic: I've had to turn off that CONFIG), then get_page_unless_zero(), then read-only check that the page is of the manageable kind (PageLRU in my world), and only then can it be safe to lock the page - which of course touches page flags, and so would be problematic for a racing user's __SetPageFlag(page). But PageMovable and PageIsolated are beyond my ken: I can't say there. Hugh _______________________________________________ Linux-mediatek mailing list Linux-mediatek@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-mediatek From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1084CC433EF for ; Tue, 15 Mar 2022 20:35:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:Message-ID: In-Reply-To:Subject:cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=2/lck2vRaKA77NzsB9S2XbjxsrhsU8nLRW+4u89RL3Q=; b=EkvKS7RZNXsjvP 9RIDiIHwQsPoW20tCKCC/nCY8aafy4WlhfTHZkZTjiv1Sjr7Lpee/9Xfq/NGEzgyk3SqpIWTNZoDV AA8aIm/AYjIqcSguuvbRKNEyja0YXT7Cp4625/a+Ds70d7evsRymZBqnBFYZuaCI+IH/cEbWyBJMO ZkqdZ7Fvn+mQ0CJC7cF6gOGY3WEZJE0EOEHmxzEFE/PA+tHE5hv/hkG5c41xsNqRxNggj0j/TNcEZ +g9/QdWdFNgLOzwDnRZuoGzan5mVGIyCRFDvYuitZQnALmk1A54dYUYumt1aKkyltYw9XxOpG3Mk5 0PbwIxrrTudrelMJGiyg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1nUDsL-00AZlo-5i; Tue, 15 Mar 2022 20:34:37 +0000 Received: from mail-oi1-x22c.google.com ([2607:f8b0:4864:20::22c]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1nUDsH-00AZjy-IT for linux-arm-kernel@lists.infradead.org; Tue, 15 Mar 2022 20:34:34 +0000 Received: by mail-oi1-x22c.google.com with SMTP id b188so508029oia.13 for ; Tue, 15 Mar 2022 13:34:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=HuxzPBFcNGizUpUFeFsNjMUynwK/YImVehMj2Mb+h5A=; b=K3HGrbkU3n9yvQYCY/kzElGn6znziTraIc5bj7MoyQgemj6Jy89aSAsaWRBNeSQirf 5XAGCDiWwqOiXzbReSY3yKhwLgKrG7zIljAb0MxtNcmhyYqdc+BbqjG11NdnvjIGO7pF WhYkc0Pj2Lpv6CKFjhncgf5mGTCONSuF6qVBOAl3I/yiUt3sNVL9My/LEgJzuZ9PNz1+ KNp706Dq/JKj2fKDY5vdz4tqkqUxLy0tS2/hFySkXnSULhgAZIqeLKYu1+UvCMNJsCH5 IhISLb9TUOzNwHkwJbSji4zTaCSKtDvaR3Ge1ojWe2ATreVqSqdLJBV+Jk1aDaeLLEjs owUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=HuxzPBFcNGizUpUFeFsNjMUynwK/YImVehMj2Mb+h5A=; b=oGFlKwAGL18Aw2hDgogH5CTi1xQQzc53Qjv3T9xlAp62Iea2sGCl4BEcDk8wHYo/uW Dl2OFaH6/hF+C2BVXmjNKlYMKBu6h1SUR6T9dMNoHw+PhCr1NuMyK0MRl+vS1olb1X14 TGQGL+FK3s1A0GNO5J8n0TTEH9K1/QC1MeOd+yFw7+SDmENlXJ4ya2nap1atgzGBnzS4 bxHZG1LZP4eeer6kLMKgRyWPZcQS39geXGYoqdmtho7HlWBdI4fUbnwBZhfFary9ClEM wvuJmWJw1zRx08o2aQJ1PvwKPu8FTd7Eb3dQG+vWnZbAllFjnfARQqm6b/0h90gKxnBD fZog== X-Gm-Message-State: AOAM530YTgFl9Oj1XLFVTT1PJJ6T00qy7Xpp8QIL3sEZcLCV7ZML/Esd G0XUYZmIjyB8O4pA87//em54/g== X-Google-Smtp-Source: ABdhPJyIXsrs++KyhkbY8OaCTfEwShUw+XWKA+egNWequzBiJdYyCj3Gy31D2jBYT9eCoIr7cy9G0w== X-Received: by 2002:a54:4e11:0:b0:2ec:e0ee:ac29 with SMTP id a17-20020a544e11000000b002ece0eeac29mr2429740oiy.257.1647376470526; Tue, 15 Mar 2022 13:34:30 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id a10-20020a056808120a00b002d404a71444sm95511oil.35.2022.03.15.13.34.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Mar 2022 13:34:29 -0700 (PDT) Date: Tue, 15 Mar 2022 13:34:02 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: David Hildenbrand cc: Andrew Morton , Andrew Yang , Matthias Brugger , Matthew Wilcox , Vlastimil Babka , David Howells , William Kucharski , Yang Shi , Marc Zyngier , linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, wsd_upstream@mediatek.com, Nicholas Tang , Kuan-Ying Lee Subject: Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated In-Reply-To: <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com> Message-ID: <883877a-30b0-96e0-48a6-7cfc3c59de93@google.com> References: <20220315030515.20263-1-andrew.yang@mediatek.com> <20220314212127.a2797926ee0ef8a7ad05dcaa@linux-foundation.org> <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220315_133433_689951_18290BBC X-CRM114-Status: GOOD ( 22.26 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, 15 Mar 2022, David Hildenbrand wrote: > On 15.03.22 05:21, Andrew Morton wrote: > > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang wrote: > > > >> When memory is tight, system may start to compact memory for large > >> continuous memory demands. If one process tries to lock a memory page > >> that is being locked and isolated for compaction, it may wait a long time > >> or even forever. This is because compaction will perform non-atomic > >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters > >> set by the process that can't obtain the page lock and add itself to the > >> waiting queue to wait for the lock to be unlocked. > >> > >> CPU1 CPU2 > >> lock_page(page); (successful) > >> lock_page(); (failed) > >> __ClearPageIsolated(page); SetPageWaiters(page) (may be overwritten) > >> unlock_page(page); > >> > >> The solution is to not perform non-atomic operation on page flags while > >> holding page lock. > > > > Sure, the non-atomic bitop optimization is really risky and I suspect > > we reach for it too often. Or at least without really clearly > > demonstrating that it is safe, and documenting our assumptions. > > I agree. IIRC, non-atomic variants are mostly only safe while the > refcount is 0. Everything else is just absolutely fragile. It is normal and correct to use __SetPageFlag(page) on a page just allocated from the buddy, and not yet logically visible to others: that has refcount 1. Of course, it might have refcount 2 or more, through being speculatively visible to get_page_unless_zero() users: perhaps through earlier usage of the same struct page, or by physical scan of memmap. Those few such others - compaction's isolate_migratepages_block() is the one I know best - must be very careful in their sequence of operations. Preliminary read-only checks are usually okay (but some VM_BUG_ON_PGFLAGS are increasingly problematic: I've had to turn off that CONFIG), then get_page_unless_zero(), then read-only check that the page is of the manageable kind (PageLRU in my world), and only then can it be safe to lock the page - which of course touches page flags, and so would be problematic for a racing user's __SetPageFlag(page). But PageMovable and PageIsolated are beyond my ken: I can't say there. Hugh _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel