* [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
[not found] <20230125015738.912924-1-zokeefe@google.com>
@ 2023-01-25 1:57 ` Zach O'Keefe
2023-01-25 12:54 ` kernel test robot
2023-01-25 13:38 ` kernel test robot
0 siblings, 2 replies; 4+ messages in thread
From: Zach O'Keefe @ 2023-01-25 1:57 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Hugh Dickins, Yang Shi,
Zach O'Keefe, stable
In commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():
- if (!pmd_present(pmde))
- return SCAN_PMD_NULL;
+ if (pmd_none(pmde))
+ return SCAN_PMD_NONE;
This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE
might identify a pte-mapped hugepage, only to have khugepaged race-in, free
the pte table, and clear the pmd. Such codepaths include:
A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
mm/address.
In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared install a huge-pmd only when the existing
pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.
However, the commit introduces a logical hole; namely, that we've allowed
!none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds. One such example that could leak through are swap
entries. The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.
We want to put back the !pmd_present() check (below the pmd_none() check),
but need to be careful to deal with subtleties in pmd transitions and
treatments by various arch.
The issue is that __split_huge_pmd_locked() temporarily clears the present
bit (or otherwise marks the entry as invalid), but pmd_present()
and pmd_trans_huge() still need to return true while the pmd is in this
transitory state. For example, x86's pmd_present() also checks the
_PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also
checks a PMD_PRESENT_INVALID bit.
Covering all 4 cases for x86 (all checks done on the same pmd value):
1) pmd_present() && pmd_trans_huge()
All we actually know here is that the PSE bit is set. Either:
a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
is set.
=> huge-pmd
b) We are currently racing with __split_huge_page(). The danger here
is that we proceed as-if we have a huge-pmd, but really we are
looking at a pte-mapping-pmd. So, what is the risk of this
danger?
The only relevant path is:
madvise_collapse() -> collapse_pte_mapped_thp()
Where we might just incorrectly report back "success", when really
the memory isn't pmd-backed. This is fine, since split could
happen immediately after (actually) successful madvise_collapse().
So, it should be safe to just assume huge-pmd here.
2) pmd_present() && !pmd_trans_huge()
Either:
a) PSE not set and either PRESENT or PROTNONE is.
=> pte-table-mapping pmd (or PROT_NONE)
b) devmap. This routine can be called immediately after
unlocking/locking mmap_lock -- or called with no locks held (see
khugepaged_scan_mm_slot()), so previous VMA checks have since been
invalidated.
3) !pmd_present() && pmd_trans_huge()
Not possible.
4) !pmd_present() && !pmd_trans_huge()
Neither PRESENT nor PROTNONE set
=> not present
I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, longarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since !pmd_present()
always takes failure path).
Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how
mmap_lock is held (if at all).
Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: stable@vger.kernel.org
---
Request that this be pulled into stable since it's theoretically
possible (though I have no reproducer) that while mmap_lock is dropped,
racing thp migration installs a pmd migration entry which then has a path to
be consumed, unchecked, by pte_offset_map().
---
mm/khugepaged.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa38cae240b9..7ea668bbea70 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -941,6 +941,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
return SCAN_SUCCEED;
}
+/*
+ * See pmd_trans_unstable() for how the result may change out from
+ * underneath us, even if we hold mmap_lock in read.
+ */
static int find_pmd_or_thp_or_none(struct mm_struct *mm,
unsigned long address,
pmd_t **pmd)
@@ -959,8 +963,12 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
#endif
if (pmd_none(pmde))
return SCAN_PMD_NONE;
+ if (!pmd_present(pmde))
+ return SCAN_PMD_NULL;
if (pmd_trans_huge(pmde))
return SCAN_PMD_MAPPED;
+ if (pmd_devmap(pmd))
+ return SCAN_PMD_NULL;
if (pmd_bad(pmde))
return SCAN_PMD_NULL;
return SCAN_SUCCEED;
--
2.39.1.405.gd4c25cc71f-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
@ 2023-01-25 12:54 ` kernel test robot
2023-01-25 13:38 ` kernel test robot
1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2023-01-25 12:54 UTC (permalink / raw)
To: Zach O'Keefe, linux-mm
Cc: llvm, oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Hugh Dickins, Yang Shi,
Zach O'Keefe, stable
Hi Zach,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230125015738.912924-2-zokeefe%40google.com
patch subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
config: x86_64-randconfig-r025-20230123 (https://download.01.org/0day-ci/archive/20230125/202301252033.HoFIRXm4-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/6001eb9a8f1687a1d0b72831d991886106cac37b
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
git checkout 6001eb9a8f1687a1d0b72831d991886106cac37b
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> mm/khugepaged.c:972:17: error: passing 'pmd_t **' to parameter of incompatible type 'pmd_t'
if (pmd_devmap(pmd))
^~~
arch/x86/include/asm/pgtable.h:254:36: note: passing argument to parameter 'pmd' here
static inline int pmd_devmap(pmd_t pmd)
^
1 error generated.
vim +972 mm/khugepaged.c
945
946 /*
947 * See pmd_trans_unstable() for how the result may change out from
948 * underneath us, even if we hold mmap_lock in read.
949 */
950 static int find_pmd_or_thp_or_none(struct mm_struct *mm,
951 unsigned long address,
952 pmd_t **pmd)
953 {
954 pmd_t pmde;
955
956 *pmd = mm_find_pmd(mm, address);
957 if (!*pmd)
958 return SCAN_PMD_NULL;
959
960 pmde = pmdp_get_lockless(*pmd);
961
962 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
963 /* See comments in pmd_none_or_trans_huge_or_clear_bad() */
964 barrier();
965 #endif
966 if (pmd_none(pmde))
967 return SCAN_PMD_NONE;
968 if (!pmd_present(pmde))
969 return SCAN_PMD_NULL;
970 if (pmd_trans_huge(pmde))
971 return SCAN_PMD_MAPPED;
> 972 if (pmd_devmap(pmd))
973 return SCAN_PMD_NULL;
974 if (pmd_bad(pmde))
975 return SCAN_PMD_NULL;
976 return SCAN_SUCCEED;
977 }
978
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
2023-01-25 12:54 ` kernel test robot
@ 2023-01-25 13:38 ` kernel test robot
2023-01-25 18:14 ` Zach O'Keefe
1 sibling, 1 reply; 4+ messages in thread
From: kernel test robot @ 2023-01-25 13:38 UTC (permalink / raw)
To: Zach O'Keefe, linux-mm
Cc: oe-kbuild-all, linux-kernel, Andrew Morton,
Linux Memory Management List, Hugh Dickins, Yang Shi,
Zach O'Keefe, stable
Hi Zach,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230125015738.912924-2-zokeefe%40google.com
patch subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20230125/202301252110.hFYRsbrm-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/6001eb9a8f1687a1d0b72831d991886106cac37b
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
git checkout 6001eb9a8f1687a1d0b72831d991886106cac37b
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
mm/khugepaged.c: In function 'find_pmd_or_thp_or_none':
>> mm/khugepaged.c:972:24: error: incompatible type for argument 1 of 'pmd_devmap'
972 | if (pmd_devmap(pmd))
| ^~~
| |
| pmd_t **
In file included from include/linux/pgtable.h:6,
from include/linux/mm.h:29,
from mm/khugepaged.c:4:
arch/x86/include/asm/pgtable.h:254:36: note: expected 'pmd_t' but argument is of type 'pmd_t **'
254 | static inline int pmd_devmap(pmd_t pmd)
| ~~~~~~^~~
vim +/pmd_devmap +972 mm/khugepaged.c
945
946 /*
947 * See pmd_trans_unstable() for how the result may change out from
948 * underneath us, even if we hold mmap_lock in read.
949 */
950 static int find_pmd_or_thp_or_none(struct mm_struct *mm,
951 unsigned long address,
952 pmd_t **pmd)
953 {
954 pmd_t pmde;
955
956 *pmd = mm_find_pmd(mm, address);
957 if (!*pmd)
958 return SCAN_PMD_NULL;
959
960 pmde = pmdp_get_lockless(*pmd);
961
962 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
963 /* See comments in pmd_none_or_trans_huge_or_clear_bad() */
964 barrier();
965 #endif
966 if (pmd_none(pmde))
967 return SCAN_PMD_NONE;
968 if (!pmd_present(pmde))
969 return SCAN_PMD_NULL;
970 if (pmd_trans_huge(pmde))
971 return SCAN_PMD_MAPPED;
> 972 if (pmd_devmap(pmd))
973 return SCAN_PMD_NULL;
974 if (pmd_bad(pmde))
975 return SCAN_PMD_NULL;
976 return SCAN_SUCCEED;
977 }
978
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
2023-01-25 13:38 ` kernel test robot
@ 2023-01-25 18:14 ` Zach O'Keefe
0 siblings, 0 replies; 4+ messages in thread
From: Zach O'Keefe @ 2023-01-25 18:14 UTC (permalink / raw)
To: kernel test robot
Cc: linux-mm, oe-kbuild-all, linux-kernel, Andrew Morton,
Hugh Dickins, Yang Shi, stable
Apologies here; shouldn't have overlooked the 4 line change. Will
follow-up with a v2 here in a second.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-01-25 18:14 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20230125015738.912924-1-zokeefe@google.com>
2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
2023-01-25 12:54 ` kernel test robot
2023-01-25 13:38 ` kernel test robot
2023-01-25 18:14 ` Zach O'Keefe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).