linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
@ 2023-01-25  1:57 Zach O'Keefe
  2023-01-25  1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Zach O'Keefe @ 2023-01-25  1:57 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, Yang Shi, Zach O'Keefe

During collapse, in a few places we check to see if a given small page
has any unaccounted references.  If the refcount on the page doesn't
match our expectations, it must be there is an unknown user concurrently
interested in the page, and so it's not safe to move the contents
elsewhere. However, the unaccounted pins are likely an ephemeral state.

In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
collapse may succeed on retry.

Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>

---
 mm/khugepaged.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e23619bfecc4..fa38cae240b9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_CGROUP_CHARGE_FAIL:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
+	case SCAN_PAGE_COUNT:
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:
-- 
2.39.1.405.gd4c25cc71f-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
  2023-01-25  1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe
@ 2023-01-25  1:57 ` Zach O'Keefe
  2023-01-25 12:54   ` kernel test robot
  2023-01-25 13:38   ` kernel test robot
  2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi
  2023-02-09  5:09 ` Hugh Dickins
  2 siblings, 2 replies; 12+ messages in thread
From: Zach O'Keefe @ 2023-01-25  1:57 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, Yang Shi,
	Zach O'Keefe, stable

In commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():

	-       if (!pmd_present(pmde))
	-               return SCAN_PMD_NULL;
	+       if (pmd_none(pmde))
	+               return SCAN_PMD_NONE;

This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE
might identify a pte-mapped hugepage, only to have khugepaged race-in, free
the pte table, and clear the pmd.  Such codepaths include:

A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
   already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
   mm/address.

In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared install a huge-pmd only when the existing
pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.

However, the commit introduces a logical hole; namely, that we've allowed
!none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds.  One such example that could leak through are swap
entries.  The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.

We want to put back the !pmd_present() check (below the pmd_none() check),
but need to be careful to deal with subtleties in pmd transitions and
treatments by various arch.

The issue is that __split_huge_pmd_locked() temporarily clears the present
bit (or otherwise marks the entry as invalid), but pmd_present()
and pmd_trans_huge() still need to return true while the pmd is in this
transitory state.  For example, x86's pmd_present() also checks the
_PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also
checks a PMD_PRESENT_INVALID bit.

Covering all 4 cases for x86 (all checks done on the same pmd value):

1) pmd_present() && pmd_trans_huge()
   All we actually know here is that the PSE bit is set. Either:
   a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
      is set.
      => huge-pmd
   b) We are currently racing with __split_huge_page().  The danger here
      is that we proceed as-if we have a huge-pmd, but really we are
      looking at a pte-mapping-pmd.  So, what is the risk of this
      danger?

      The only relevant path is:

	madvise_collapse() -> collapse_pte_mapped_thp()

      Where we might just incorrectly report back "success", when really
      the memory isn't pmd-backed.  This is fine, since split could
      happen immediately after (actually) successful madvise_collapse().
      So, it should be safe to just assume huge-pmd here.

2) pmd_present() && !pmd_trans_huge()
   Either:
   a) PSE not set and either PRESENT or PROTNONE is.
      => pte-table-mapping pmd (or PROT_NONE)
   b) devmap.  This routine can be called immediately after
      unlocking/locking mmap_lock -- or called with no locks held (see
      khugepaged_scan_mm_slot()), so previous VMA checks have since been
      invalidated.

3) !pmd_present() && pmd_trans_huge()
  Not possible.

4) !pmd_present() && !pmd_trans_huge()
  Neither PRESENT nor PROTNONE set
  => not present

I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, longarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since !pmd_present()
always takes failure path).

Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how
mmap_lock is held (if at all).

Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: stable@vger.kernel.org

---
Request that this be pulled into stable since it's theoretically
possible (though I have no reproducer) that while mmap_lock is dropped,
racing thp migration installs a pmd migration entry which then has a path to
be consumed, unchecked, by pte_offset_map().
---
 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa38cae240b9..7ea668bbea70 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -941,6 +941,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return SCAN_SUCCEED;
 }
 
+/*
+ * See pmd_trans_unstable() for how the result may change out from
+ * underneath us, even if we hold mmap_lock in read.
+ */
 static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 				   unsigned long address,
 				   pmd_t **pmd)
@@ -959,8 +963,12 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 #endif
 	if (pmd_none(pmde))
 		return SCAN_PMD_NONE;
+	if (!pmd_present(pmde))
+		return SCAN_PMD_NULL;
 	if (pmd_trans_huge(pmde))
 		return SCAN_PMD_MAPPED;
+	if (pmd_devmap(pmd))
+		return SCAN_PMD_NULL;
 	if (pmd_bad(pmde))
 		return SCAN_PMD_NULL;
 	return SCAN_SUCCEED;
-- 
2.39.1.405.gd4c25cc71f-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
  2023-01-25  1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
@ 2023-01-25 12:54   ` kernel test robot
  2023-01-25 13:38   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-01-25 12:54 UTC (permalink / raw)
  To: Zach O'Keefe, linux-mm
  Cc: llvm, oe-kbuild-all, linux-kernel, Andrew Morton,
	Linux Memory Management List, Hugh Dickins, Yang Shi,
	Zach O'Keefe, stable

Hi Zach,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230125015738.912924-2-zokeefe%40google.com
patch subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
config: x86_64-randconfig-r025-20230123 (https://download.01.org/0day-ci/archive/20230125/202301252033.HoFIRXm4-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6001eb9a8f1687a1d0b72831d991886106cac37b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
        git checkout 6001eb9a8f1687a1d0b72831d991886106cac37b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/khugepaged.c:972:17: error: passing 'pmd_t **' to parameter of incompatible type 'pmd_t'
           if (pmd_devmap(pmd))
                          ^~~
   arch/x86/include/asm/pgtable.h:254:36: note: passing argument to parameter 'pmd' here
   static inline int pmd_devmap(pmd_t pmd)
                                      ^
   1 error generated.


vim +972 mm/khugepaged.c

   945	
   946	/*
   947	 * See pmd_trans_unstable() for how the result may change out from
   948	 * underneath us, even if we hold mmap_lock in read.
   949	 */
   950	static int find_pmd_or_thp_or_none(struct mm_struct *mm,
   951					   unsigned long address,
   952					   pmd_t **pmd)
   953	{
   954		pmd_t pmde;
   955	
   956		*pmd = mm_find_pmd(mm, address);
   957		if (!*pmd)
   958			return SCAN_PMD_NULL;
   959	
   960		pmde = pmdp_get_lockless(*pmd);
   961	
   962	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
   963		/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
   964		barrier();
   965	#endif
   966		if (pmd_none(pmde))
   967			return SCAN_PMD_NONE;
   968		if (!pmd_present(pmde))
   969			return SCAN_PMD_NULL;
   970		if (pmd_trans_huge(pmde))
   971			return SCAN_PMD_MAPPED;
 > 972		if (pmd_devmap(pmd))
   973			return SCAN_PMD_NULL;
   974		if (pmd_bad(pmde))
   975			return SCAN_PMD_NULL;
   976		return SCAN_SUCCEED;
   977	}
   978	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
  2023-01-25  1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
  2023-01-25 12:54   ` kernel test robot
@ 2023-01-25 13:38   ` kernel test robot
  2023-01-25 18:14     ` Zach O'Keefe
  1 sibling, 1 reply; 12+ messages in thread
From: kernel test robot @ 2023-01-25 13:38 UTC (permalink / raw)
  To: Zach O'Keefe, linux-mm
  Cc: oe-kbuild-all, linux-kernel, Andrew Morton,
	Linux Memory Management List, Hugh Dickins, Yang Shi,
	Zach O'Keefe, stable

Hi Zach,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230125015738.912924-2-zokeefe%40google.com
patch subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20230125/202301252110.hFYRsbrm-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/6001eb9a8f1687a1d0b72831d991886106cac37b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954
        git checkout 6001eb9a8f1687a1d0b72831d991886106cac37b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/khugepaged.c: In function 'find_pmd_or_thp_or_none':
>> mm/khugepaged.c:972:24: error: incompatible type for argument 1 of 'pmd_devmap'
     972 |         if (pmd_devmap(pmd))
         |                        ^~~
         |                        |
         |                        pmd_t **
   In file included from include/linux/pgtable.h:6,
                    from include/linux/mm.h:29,
                    from mm/khugepaged.c:4:
   arch/x86/include/asm/pgtable.h:254:36: note: expected 'pmd_t' but argument is of type 'pmd_t **'
     254 | static inline int pmd_devmap(pmd_t pmd)
         |                              ~~~~~~^~~


vim +/pmd_devmap +972 mm/khugepaged.c

   945	
   946	/*
   947	 * See pmd_trans_unstable() for how the result may change out from
   948	 * underneath us, even if we hold mmap_lock in read.
   949	 */
   950	static int find_pmd_or_thp_or_none(struct mm_struct *mm,
   951					   unsigned long address,
   952					   pmd_t **pmd)
   953	{
   954		pmd_t pmde;
   955	
   956		*pmd = mm_find_pmd(mm, address);
   957		if (!*pmd)
   958			return SCAN_PMD_NULL;
   959	
   960		pmde = pmdp_get_lockless(*pmd);
   961	
   962	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
   963		/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
   964		barrier();
   965	#endif
   966		if (pmd_none(pmde))
   967			return SCAN_PMD_NONE;
   968		if (!pmd_present(pmde))
   969			return SCAN_PMD_NULL;
   970		if (pmd_trans_huge(pmde))
   971			return SCAN_PMD_MAPPED;
 > 972		if (pmd_devmap(pmd))
   973			return SCAN_PMD_NULL;
   974		if (pmd_bad(pmde))
   975			return SCAN_PMD_NULL;
   976		return SCAN_SUCCEED;
   977	}
   978	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-01-25  1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe
  2023-01-25  1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
@ 2023-01-25 18:06 ` Yang Shi
  2023-01-25 19:15   ` Zach O'Keefe
  2023-02-09  5:09 ` Hugh Dickins
  2 siblings, 1 reply; 12+ messages in thread
From: Yang Shi @ 2023-01-25 18:06 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins

On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> During collapse, in a few places we check to see if a given small page
> has any unaccounted references.  If the refcount on the page doesn't
> match our expectations, it must be there is an unknown user concurrently
> interested in the page, and so it's not safe to move the contents
> elsewhere. However, the unaccounted pins are likely an ephemeral state.
>
> In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> collapse may succeed on retry.

The page may be DMA pinned (for example, pin_user_pages()), it is not
worth retrying for such pages. But it may also not be worth optimizing
for this case at this point.

So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> ---
>  mm/khugepaged.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e23619bfecc4..fa38cae240b9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
>         case SCAN_CGROUP_CHARGE_FAIL:
>                 return -EBUSY;
>         /* Resource temporary unavailable - trying again might succeed */
> +       case SCAN_PAGE_COUNT:
>         case SCAN_PAGE_LOCK:
>         case SCAN_PAGE_LRU:
>         case SCAN_DEL_PAGE_LRU:
> --
> 2.39.1.405.gd4c25cc71f-goog
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
  2023-01-25 13:38   ` kernel test robot
@ 2023-01-25 18:14     ` Zach O'Keefe
  0 siblings, 0 replies; 12+ messages in thread
From: Zach O'Keefe @ 2023-01-25 18:14 UTC (permalink / raw)
  To: kernel test robot
  Cc: linux-mm, oe-kbuild-all, linux-kernel, Andrew Morton,
	Hugh Dickins, Yang Shi, stable

Apologies here; shouldn't have overlooked the 4 line change. Will
follow-up with a v2 here in a second.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi
@ 2023-01-25 19:15   ` Zach O'Keefe
  0 siblings, 0 replies; 12+ messages in thread
From: Zach O'Keefe @ 2023-01-25 19:15 UTC (permalink / raw)
  To: Yang Shi; +Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins

On Wed, Jan 25, 2023 at 10:06 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > During collapse, in a few places we check to see if a given small page
> > has any unaccounted references.  If the refcount on the page doesn't
> > match our expectations, it must be there is an unknown user concurrently
> > interested in the page, and so it's not safe to move the contents
> > elsewhere. However, the unaccounted pins are likely an ephemeral state.
> >
> > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> > collapse may succeed on retry.
>
> The page may be DMA pinned (for example, pin_user_pages()), it is not
> worth retrying for such pages. But it may also not be worth optimizing
> for this case at this point.
>
> So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

Thanks as always, Yang, and good point about DMA pinning. As you
mentioned, I don't know if it's worth considering that too much right
now, as it's unlikely these two uses (MADV_COLLAPSE and DMA pining)
would be used together. We can revisit if necessary later if it's an
issue, but for now, I think it's a win that MADV_COLLAPSE (+ a bounded
userspace retry loop based off erno) is more likely to succeed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-01-25  1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe
  2023-01-25  1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
  2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi
@ 2023-02-09  5:09 ` Hugh Dickins
  2023-02-09 21:28   ` Andrew Morton
  2 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2023-02-09  5:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Hugh Dickins, Yang Shi, Zach O'Keefe

On Tue, 24 Jan 2023, Zach O'Keefe wrote:

> During collapse, in a few places we check to see if a given small page
> has any unaccounted references.  If the refcount on the page doesn't
> match our expectations, it must be there is an unknown user concurrently
> interested in the page, and so it's not safe to move the contents
> elsewhere. However, the unaccounted pins are likely an ephemeral state.
> 
> In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> collapse may succeed on retry.
> 
> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

This was
Reviewed-by: Yang Shi <shy828301@gmail.com>
and now I'll give it a nudge with
Acked-by: Hugh Dickins <hughd@google.com>
since it hasn't appeared in mm-unstable or linux-next yet:
I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention.

Thanks!
Hugh

> 
> ---
>  mm/khugepaged.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e23619bfecc4..fa38cae240b9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
>  	case SCAN_CGROUP_CHARGE_FAIL:
>  		return -EBUSY;
>  	/* Resource temporary unavailable - trying again might succeed */
> +	case SCAN_PAGE_COUNT:
>  	case SCAN_PAGE_LOCK:
>  	case SCAN_PAGE_LRU:
>  	case SCAN_DEL_PAGE_LRU:
> -- 
> 2.39.1.405.gd4c25cc71f-goog

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-02-09  5:09 ` Hugh Dickins
@ 2023-02-09 21:28   ` Andrew Morton
  2023-02-09 21:50     ` Hugh Dickins
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2023-02-09 21:28 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel, Yang Shi, Zach O'Keefe

On Wed, 8 Feb 2023 21:09:04 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:

> On Tue, 24 Jan 2023, Zach O'Keefe wrote:
> 
> > During collapse, in a few places we check to see if a given small page
> > has any unaccounted references.  If the refcount on the page doesn't
> > match our expectations, it must be there is an unknown user concurrently
> > interested in the page, and so it's not safe to move the contents
> > elsewhere. However, the unaccounted pins are likely an ephemeral state.
> > 
> > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> > collapse may succeed on retry.
> > 
> > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> > Reported-by: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> 
> This was
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> and now I'll give it a nudge with
> Acked-by: Hugh Dickins <hughd@google.com>
> since it hasn't appeared in mm-unstable or linux-next yet:

Buildbot failed on [2/2] so I skipped the whole series in expectation
of a v2 series, which didn't happen.  Instead, Zach trickily sent what
was [2/2] as a standalone patch.  So [1/2] got lost.  Sigh, poor me.

Thanks, I'll merge [1/2] into mm-hotfixes.

> I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention.

I'm not seeing anything in the [1/2] changelog which indicates that a
backport is needed.  IOW,

# cat .signature
When fixing a bug, please describe the end-user visible effects of that bug.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-02-09 21:28   ` Andrew Morton
@ 2023-02-09 21:50     ` Hugh Dickins
  2023-02-09 22:12       ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2023-02-09 21:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-mm, linux-kernel, Yang Shi, Zach O'Keefe

On Thu, 9 Feb 2023, Andrew Morton wrote:
> 
> Thanks, I'll merge [1/2] into mm-hotfixes.

Great, thanks.

> 
> I'm not seeing anything in the [1/2] changelog which indicates that a
> backport is needed.  IOW,

Correct: it's just changing the errno for some racy cases from "you're
wrong, don't bother me again" to "it might be worth having another go":
not fixing an instability, as 2/2 was.

> 
> # cat .signature
> When fixing a bug, please describe the end-user visible effects of that bug.

If whatever's being run by the end-user is coded to try again on -EAGAIN,
then the end-user will less often see occasional unexplained failures.

Hugh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-02-09 21:50     ` Hugh Dickins
@ 2023-02-09 22:12       ` Andrew Morton
  2023-02-09 22:29         ` Zach O'Keefe
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2023-02-09 22:12 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel, Yang Shi, Zach O'Keefe

On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:

> 
> > 
> > I'm not seeing anything in the [1/2] changelog which indicates that a
> > backport is needed.  IOW,
> 
> Correct: it's just changing the errno for some racy cases from "you're
> wrong, don't bother me again" to "it might be worth having another go":
> not fixing an instability, as 2/2 was.
> 
> > 
> > # cat .signature
> > When fixing a bug, please describe the end-user visible effects of that bug.
> 
> If whatever's being run by the end-user is coded to try again on -EAGAIN,
> then the end-user will less often see occasional unexplained failures.
> 

OK, thanks.  I redid the changelog's final paragraph thusly:

: In this situation, MADV_COLLAPSE returns -EINVAL when it should return
: -EAGAIN.  This could cause userspace to conclude that the syscall failed,
: when it in fact could succeed by retrying.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  2023-02-09 22:12       ` Andrew Morton
@ 2023-02-09 22:29         ` Zach O'Keefe
  0 siblings, 0 replies; 12+ messages in thread
From: Zach O'Keefe @ 2023-02-09 22:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, linux-kernel, Yang Shi

On Thu, Feb 9, 2023 at 2:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:
>
> >
> > >
> > > I'm not seeing anything in the [1/2] changelog which indicates that a
> > > backport is needed.  IOW,
> >
> > Correct: it's just changing the errno for some racy cases from "you're
> > wrong, don't bother me again" to "it might be worth having another go":
> > not fixing an instability, as 2/2 was.
> >
> > >
> > > # cat .signature
> > > When fixing a bug, please describe the end-user visible effects of that bug.
> >
> > If whatever's being run by the end-user is coded to try again on -EAGAIN,
> > then the end-user will less often see occasional unexplained failures.
> >
>
> OK, thanks.  I redid the changelog's final paragraph thusly:
>
> : In this situation, MADV_COLLAPSE returns -EINVAL when it should return
> : -EAGAIN.  This could cause userspace to conclude that the syscall failed,
> : when it in fact could succeed by retrying.
>

This looks good to me. Thanks Andrew! Also thanks Hugh for being on
the lookout for this patch -- I hastily read through my emails
regarding which patches were merged where and had assumed this merged
with 2/2.

Also, apologies about the confusing v1 [1/2] and v2 [2/2] fiasco; in
hindsight that probably wasn't the most decipherable thing to do :)

Best,
Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-02-09 22:30 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-25  1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe
2023-01-25  1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
2023-01-25 12:54   ` kernel test robot
2023-01-25 13:38   ` kernel test robot
2023-01-25 18:14     ` Zach O'Keefe
2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi
2023-01-25 19:15   ` Zach O'Keefe
2023-02-09  5:09 ` Hugh Dickins
2023-02-09 21:28   ` Andrew Morton
2023-02-09 21:50     ` Hugh Dickins
2023-02-09 22:12       ` Andrew Morton
2023-02-09 22:29         ` Zach O'Keefe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).