From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, "Kirill A. Shutemov", Matthew Wilcox, Yang Shi,
    Andrea Arcangeli, peterx@redhat.com, John Hubbard, Alistair Popple,
    David Hildenbrand, Vlastimil Babka, Hugh Dickins
Subject: [PATCH v4 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Wed, 16 Feb 2022 17:48:07 +0800
Message-Id: <20220216094810.60572-2-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220216094810.60572-1-peterx@redhat.com>
References: <20220216094810.60572-1-peterx@redhat.com>

The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.  For example, when the user specifies
details->zap_mapping==NULL, it means the user wants to zap all the pages
(including COWed pages), so we need to look into swap entries too, because
there can be private COWed pages that were swapped out.

Skipping swap entries whenever details is non-NULL may lead to wrongly
leaving behind swap entries that we should have zapped.

A reproducer of the problem:

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

void main(void)
{
	int ret;
	char val;

	page_size = getpagesize();
	shmem_fd = memfd_create("test", 0);
	assert(shmem_fd >= 0);

	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE, shmem_fd, 0);
	assert(buffer != MAP_FAILED);

	/* Write private page, swap it out */
	buffer[page_size] = 1;
	madvise(buffer, page_size * 2, MADV_PAGEOUT);

	/* This should drop private buffer[page_size] already */
	ret = ftruncate(shmem_fd, page_size);
	assert(ret == 0);
	/* Recover the size */
	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	/* Re-read the data, it should be all zero */
	val = buffer[page_size];
	if (val == 0)
		printf("Good\n");
	else
		printf("BUG\n");
}
===8<===

We don't need to touch up the pmd path, because pmd never had an issue
with swap entries.  For example, shmem pmd migration will always be split
into pte level, and the same applies to swapping on anonymous memory.

Add another helper, should_zap_cows(), so that we can also check whether
we should zap private mappings when there is no page pointer specified.

This patch drops that trick (skipping all swap entries whenever a non-NULL
"details" is passed in), so we handle swap ptes coherently.
Meanwhile we do the same check upon migration entries, hwpoison entries
and genuine swap entries too.

To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.

The issue seems to exist starting from the initial commit of git.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..4bfeaca7cbc7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+	/* By default, zap all pages */
+	if (!details)
+		return true;
+
+	/* Or, we zap COWed pages only if the caller wants to */
+	return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool
 zap_skip_check_mapping(struct zap_details *details, struct page *page)
 {
-	if (!details || !page)
+	/* If we can make a decision without *page.. */
+	if (should_zap_cows(details))
 		return false;
 
-	return details->zap_mapping &&
-		(details->zap_mapping != page_rmapping(page));
+	/* E.g. zero page */
+	if (!page)
+		return false;
+
+	return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
+		if (!non_swap_entry(entry)) {
+			/*
+			 * If this is a genuine swap entry, then it must be a
+			 * private anon page.  If the caller wants to skip
+			 * COWed pages, ignore it.
+			 */
+			if (!should_zap_cows(details))
+				continue;
 			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
+		} else if (is_migration_entry(entry)) {
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
+			if (zap_skip_check_mapping(details, page))
+				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_hwpoison_entry(entry)) {
+			/* If the caller wants to skip COWed pages, ignore it */
+			if (!should_zap_cows(details))
+				continue;
+		} else {
+			/* We should have covered all the swap entry types */
+			WARN_ON_ONCE(1);
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
-- 
2.32.0
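
A companion sketch for the even_cows==false side described above, to
contrast with the reproducer in the commit message.  This is only an
illustrative program, not part of the patch: hole-punching the shmem file
is one of the callers that keeps COWed pages (a non-NULL zap_mapping is
passed down), so the private COWed copy of the second page, even when it
only exists as a swap entry after MADV_PAGEOUT, is expected to survive and
still read back 1.  It assumes swap is configured; without swap the
private copy simply stays resident and the expected output is the same.
The file name "test-holepunch" is arbitrary.

===8<===
#define _GNU_SOURCE
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
	int page_size = getpagesize();
	int shmem_fd = memfd_create("test-holepunch", 0);
	char *buffer;
	int ret;

	assert(shmem_fd >= 0);
	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE, shmem_fd, 0);
	assert(buffer != MAP_FAILED);

	/* COW the second page, then try to push the private copy to swap */
	buffer[page_size] = 1;
	madvise(buffer, page_size * 2, MADV_PAGEOUT);

	/*
	 * Punch a hole over the second page: only the shared (page cache)
	 * side may be zapped; the private COWed copy, resident or swapped
	 * out, must stay intact.
	 */
	ret = fallocate(shmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			page_size, page_size);
	assert(ret == 0);

	/* The private copy should still be readable as 1 */
	if (buffer[page_size] == 1)
		printf("Good\n");
	else
		printf("BUG\n");
	return 0;
}
===8<===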