From: Vitaly Wool
Date: Tue, 16 Feb 2021 10:25:18 +0100
Subject: Re: [RFC PATCH] z3fold: prevent reclaim/free race for headless pages
To: Thomas Hebb
Cc: LKML, Andrew Morton, Greg Kroah-Hartman, Linux-MM
In-Reply-To: <8c4f1cb7c51b03d2b2cd451a6404db8e269d94b7.1613465062.git.tommyhebb@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Thomas,

On Tue, Feb 16, 2021 at 9:44 AM Thomas Hebb wrote:
>
> commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced
> the PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page
> release." By atomically testing and setting the bit in each of
> z3fold_free() and z3fold_reclaim_page(), a double-free was avoided.
>
> However, commit 746d179b0e66 ("z3fold: stricter locking and more careful
> reclaim") appears to have unintentionally broken this behavior by moving
> the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock
> gets taken, which only happens for non-headless pages. For headless
> pages, the check is now skipped entirely and races can occur again.
>
> I have observed such a race on my system:
>
> page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316
> flags: 0x2ffff0000000000()
> raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000
> raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000
> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> ------------[ cut here ]------------
> kernel BUG at include/linux/mm.h:707!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
> CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G B 5.10.7-arch1-1-kasan #1
> Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
> Workqueue: zswap-shrink shrink_worker
> RIP: 0010:__free_pages+0x10a/0x130
> Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07
> RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000
> RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8
> RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb
> R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4
> R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588
> FS: 0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0
> Call Trace:
> z3fold_zpool_shrink+0x9b6/0x1240
> ? sugov_update_single+0x357/0x990
> ? sched_clock+0x5/0x10
> ? sched_clock_cpu+0x18/0x180
> ? z3fold_zpool_map+0x490/0x490
> ? _raw_spin_lock_irq+0x88/0xe0
> shrink_worker+0x35/0x90
> process_one_work+0x70c/0x1210
> ? pwq_dec_nr_in_flight+0x15b/0x2a0
> worker_thread+0x539/0x1200
> ? __kthread_parkme+0x73/0x120
> ? rescuer_thread+0x1000/0x1000
> kthread+0x330/0x400
> ? __kthread_bind_mask+0x90/0x90
> ret_from_fork+0x22/0x30
> Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me
> hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
> ---[ end trace 126d646fc3dc0ad8 ]---
>
> To fix the issue, re-add the earlier test and set in the case where we
> have a headless page.
>
> Fixes: 746d179b0e66 ("z3fold: stricter locking and more careful reclaim")
> Signed-off-by: Thomas Hebb
> ---
> I have NOT tested this patch yet beyond compiling it. If the approach
> seems good, I'll test it on my system for a period of several days and
> see if I can reproduce the crash before sending a v1.

thanks for debugging this, please proceed with the testing.

Best regards,
Vitaly

>  mm/z3fold.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 0152ad9931a8..8ae944eeb8e2 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -1350,8 +1350,22 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>  			page = list_entry(pos, struct page, lru);
>
>  			zhdr = page_address(page);
> -			if (test_bit(PAGE_HEADLESS, &page->private))
> +			if (test_bit(PAGE_HEADLESS, &page->private)) {
> +				/*
> +				 * For non-headless pages, we wait to do this
> +				 * until we have the page lock to avoid racing
> +				 * with __z3fold_alloc(). Headless pages don't
> +				 * have a lock (and __z3fold_alloc() will never
> +				 * see them), but we still need to test and set
> +				 * PAGE_CLAIMED to avoid racing with
> +				 * z3fold_free(), so just do it now before
> +				 * leaving the loop.
> +				 */
> +				if (test_and_set_bit(PAGE_CLAIMED, &page->private))
> +					continue;
> +
>  				break;
> +			}
>
>  			if (kref_get_unless_zero(&zhdr->refcount) == 0) {
>  				zhdr = NULL;
> --
> 2.30.0
>