From: Chih-En Lin
To: Andrew Morton, Qi Zheng, David Hildenbrand, Matthew Wilcox, Christophe Leroy, John Hubbard, Nadav Amit
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steven Rostedt, Masami Hiramatsu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Yang Shi, Peter Xu, Zach O'Keefe, "Liam R . Howlett", Alex Sierra, Xianting Tian, Colin Cross, Suren Baghdasaryan, Barry Song, Pasha Tatashin, Suleiman Souhlal, Brian Geffon, Yu Zhao, Tong Tiangen, Liu Shixin, Li kunyu, Anshuman Khandual, Vlastimil Babka, Hugh Dickins, Minchan Kim, Miaohe Lin, Gautam Menghani, Catalin Marinas, Mark Brown, Will Deacon, "Eric W . Biederman", Thomas Gleixner, Sebastian Andrzej Siewior, Andy Lutomirski, Fenghua Yu, Barret Rhoden, Davidlohr Bueso, "Jason A .
Donenfeld", Dinglan Peng, Pedro Fonseca, Jim Huang, Huichun Feng, Chih-En Lin
Subject: [PATCH v3 05/14] mm/khugepaged: Break COW PTE before scanning pte
Date: Tue, 20 Dec 2022 15:27:34 +0800
Message-Id: <20221220072743.3039060-6-shiyn.lin@gmail.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>
References: <20221220072743.3039060-1-shiyn.lin@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

We should not allow THP to collapse a COW-ed PTE. So, break COW PTE
before collapse_pte_mapped_thp() collapses the range into a THP. Also,
break COW PTE before hpage_collapse_scan_pmd() scans the PTEs.

Signed-off-by: Chih-En Lin
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 760455dfa8600..881553aa0f2f2 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,6 +13,7 @@
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
 	EM( SCAN_PMD_NONE,		"pmd_none")			\
 	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
+	EM( SCAN_COW_PTE,		"cowed_pte")			\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a8d5ef2a77d24..106e1ce3931f7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -31,6 +31,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_PMD_NONE,
 	SCAN_PMD_MAPPED,
+	SCAN_COW_PTE,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -1030,6 +1031,9 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
+	/* We should have already handled the COW-ed PTE. */
+	VM_WARN_ON(test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd));
+
 	anon_vma_lock_write(vma->anon_vma);
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
@@ -1139,6 +1143,16 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
+	/*
+	 * Before scanning each PTE entry, first make sure the PTE can be
+	 * modified. So, break COW if the PTE is COW-ed.
+	 */
+	if (break_cow_pte(vma, pmd, address) < 0) {
+		result = SCAN_COW_PTE;
+		goto out;
+	}
+
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1197,6 +1211,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
+		/*
+		 * If we only break COW PTE, the page usually remains in
+		 * a COW mapping, so it is still shared.
+		 */
 		if (page_mapcount(page) > 1) {
 			++shared;
 			if (cc->is_khugepaged &&
@@ -1472,6 +1490,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	/* We shouldn't let a COW-ed PTE collapse. */
+	if (break_cow_pte(vma, pmd, haddr) < 0)
+		goto drop_hpage;
+	VM_WARN_ON(test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd));
+
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 	result = SCAN_FAIL;

-- 
2.37.3