* [patch 01/12] zlib: export S390 symbols for zlib modules
2020-12-06 6:14 incoming Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-07 9:03 ` Zaslonko Mikhail
2020-12-06 6:14 ` [patch 02/12] coredump: fix core_pattern parse error Andrew Morton
` (10 subsequent siblings)
11 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, borntraeger, gor, hca, iii, linux-mm, lkp, mm-commits,
rdunlap, torvalds, zaslonko
From: Randy Dunlap <rdunlap@infradead.org>
Subject: zlib: export S390 symbols for zlib modules
Fix build errors when ZLIB_INFLATE=m and ZLIB_DEFLATE=m and ZLIB_DFLTCC=y
by exporting the 2 needed symbols in dfltcc_inflate.c.
Fixes these build errors:
ERROR: modpost: "dfltcc_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
ERROR: modpost: "dfltcc_can_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
Link: https://lkml.kernel.org/r/20201123191712.4882-1-rdunlap@infradead.org
Fixes: 126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
lib/zlib_dfltcc/dfltcc_inflate.c | 3 +++
1 file changed, 3 insertions(+)
--- a/lib/zlib_dfltcc/dfltcc_inflate.c~zlib-export-s390-symbols-for-zlib-modules
+++ a/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -4,6 +4,7 @@
#include "dfltcc_util.h"
#include "dfltcc.h"
#include <asm/setup.h>
+#include <linux/export.h>
#include <linux/zutil.h>
/*
@@ -29,6 +30,7 @@ int dfltcc_can_inflate(
return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) &&
is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0);
}
+EXPORT_SYMBOL(dfltcc_can_inflate);
static int dfltcc_was_inflate_used(
z_streamp strm
@@ -147,3 +149,4 @@ dfltcc_inflate_action dfltcc_inflate(
return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ?
DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE;
}
+EXPORT_SYMBOL(dfltcc_inflate);
_
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch 01/12] zlib: export S390 symbols for zlib modules
2020-12-06 6:14 ` [patch 01/12] zlib: export S390 symbols for zlib modules Andrew Morton
@ 2020-12-07 9:03 ` Zaslonko Mikhail
2020-12-08 0:45 ` Randy Dunlap
0 siblings, 1 reply; 17+ messages in thread
From: Zaslonko Mikhail @ 2020-12-07 9:03 UTC (permalink / raw)
To: Andrew Morton, borntraeger, gor, hca, iii, linux-mm, lkp,
mm-commits, rdunlap, torvalds
Hello,
should we probably put these in lib/zlib_dfltcc/dfltcc_syms.c along with the other EXPORT_SYMBOL statements?
Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
On 06.12.2020 07:14, Andrew Morton wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> Subject: zlib: export S390 symbols for zlib modules
>
> Fix build errors when ZLIB_INFLATE=m and ZLIB_DEFLATE=m and ZLIB_DFLTCC=y
> by exporting the 2 needed symbols in dfltcc_inflate.c.
>
> Fixes these build errors:
>
> ERROR: modpost: "dfltcc_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
> ERROR: modpost: "dfltcc_can_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
>
* Re: [patch 01/12] zlib: export S390 symbols for zlib modules
2020-12-07 9:03 ` Zaslonko Mikhail
@ 2020-12-08 0:45 ` Randy Dunlap
2020-12-08 8:19 ` Zaslonko Mikhail
0 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2020-12-08 0:45 UTC (permalink / raw)
To: Zaslonko Mikhail, Andrew Morton, borntraeger, gor, hca, iii,
linux-mm, lkp, mm-commits, torvalds
On 12/7/20 1:03 AM, Zaslonko Mikhail wrote:
> Hello,
>
> should we probably put these in lib/zlib_dfltcc/dfltcc_syms.c along with the other EXPORT_SYMBOL statements?
Hi,
I didn't know about that file. :)
That change makes sense. At least that's how some modules used
to do it. Not so much nowadays, but this one might as well be
consistent within itself.
Do you want to fix it or should I?
Thanks.
--
~Randy
Reported-by: Randy Dunlap <rdunlap@infradead.org>
* Re: [patch 01/12] zlib: export S390 symbols for zlib modules
2020-12-08 0:45 ` Randy Dunlap
@ 2020-12-08 8:19 ` Zaslonko Mikhail
2020-12-18 22:18 ` Randy Dunlap
0 siblings, 1 reply; 17+ messages in thread
From: Zaslonko Mikhail @ 2020-12-08 8:19 UTC (permalink / raw)
To: Randy Dunlap, Andrew Morton, borntraeger, gor, hca, iii,
linux-mm, lkp, mm-commits, torvalds
On 08.12.2020 01:45, Randy Dunlap wrote:
> On 12/7/20 1:03 AM, Zaslonko Mikhail wrote:
>> Hello,
>>
>> should we probably put these in lib/zlib_dfltcc/dfltcc_syms.c along with the other EXPORT_SYMBOL statements?
>
> Hi,
> I didn't know about that file. :)
>
> That change makes sense. At least that's how some modules used
> to do it. Not so much nowadays, but this one might as well be
> consistent within itself.
>
> Do you want to fix it or should I?
Please, go ahead.
Thanks.
* Re: [patch 01/12] zlib: export S390 symbols for zlib modules
2020-12-08 8:19 ` Zaslonko Mikhail
@ 2020-12-18 22:18 ` Randy Dunlap
0 siblings, 0 replies; 17+ messages in thread
From: Randy Dunlap @ 2020-12-18 22:18 UTC (permalink / raw)
To: Zaslonko Mikhail, Andrew Morton, borntraeger, gor, hca, iii,
linux-mm, lkp, mm-commits, torvalds
On 12/8/20 12:19 AM, Zaslonko Mikhail wrote:
>
>
> On 08.12.2020 01:45, Randy Dunlap wrote:
>> On 12/7/20 1:03 AM, Zaslonko Mikhail wrote:
>>> Hello,
>>>
>>> should we probably put these in lib/zlib_dfltcc/dfltcc_syms.c along with the other EXPORT_SYMBOL statements?
>>
>> Hi,
>> I didn't know about that file. :)
>>
>> That change makes sense. At least that's how some modules used
>> to do it. Not so much nowadays, but this one might as well be
>> consistent within itself.
>>
>> Do you want to fix it or should I?
>
> Please, go ahead.
>
> Thanks.
Hi,
How do you feel about eliminating the dfltcc_syms.c file and
putting the EXPORT_SYMBOL()s in the source file(s) where they
occur, as expressed in coding-style.rst:
In source files, separate functions with one blank line. If the function is
exported, the **EXPORT** macro for it should follow immediately after the
closing function brace line.
(also move the MODULE_LICENSE() to a different source file.)
thanks.
--
~Randy
* [patch 02/12] coredump: fix core_pattern parse error
2020-12-06 6:14 incoming Andrew Morton
2020-12-06 6:14 ` [patch 01/12] zlib: export S390 symbols for zlib modules Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-06 6:14 ` [patch 03/12] mm: memcg/slab: fix obj_cgroup_charge() return value handling Andrew Morton
` (9 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, dong.menglong, jwilk, linux-mm, mm-commits, nhorman, pabs3,
stable, torvalds
From: Menglong Dong <dong.menglong@zte.com.cn>
Subject: coredump: fix core_pattern parse error
format_corename() splits 'core_pattern' on spaces when it is in pipe
mode and takes helper_argv[0] as the path to the usermode executable.
That works fine in most cases. However, if there is a space between
'|' and '/file/path', such as
'| /usr/lib/systemd/systemd-coredump %P %u %g',
helper_argv[0] will be parsed as '' and users will get a
'Core dump to | disabled'.
This is unfriendly to users, as the pattern above was valid previously.
Fix this by ignoring the spaces between '|' and '/file/path'.
Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com
Fixes: 315c69261dd3 ("coredump: split pipe command whitespace before expanding template")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Cc: Paul Wise <pabs3@bonedaddy.net>
Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/coredump.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/coredump.c~coredump-fix-core_pattern-parse-error
+++ a/fs/coredump.c
@@ -229,7 +229,8 @@ static int format_corename(struct core_n
*/
if (ispipe) {
if (isspace(*pat_ptr)) {
- was_space = true;
+ if (cn->used != 0)
+ was_space = true;
pat_ptr++;
continue;
} else if (was_space) {
_
* [patch 03/12] mm: memcg/slab: fix obj_cgroup_charge() return value handling
2020-12-06 6:14 incoming Andrew Morton
2020-12-06 6:14 ` [patch 01/12] zlib: export S390 symbols for zlib modules Andrew Morton
2020-12-06 6:14 ` [patch 02/12] coredump: fix core_pattern parse error Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-06 6:14 ` [patch 04/12] mm: list_lru: set shrinker map bit when child nr_items is not zero Andrew Morton
` (8 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, guro, hannes, linux-mm, mhocko, mm-commits, shakeelb,
stable, torvalds
From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: fix obj_cgroup_charge() return value handling
Commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches
for all allocations") introduced a regression in the handling of the
obj_cgroup_charge() return value. If a non-zero value is returned
(indicating that one of the memory.max limits has been exceeded), the
allocation should fail instead of falling back to non-accounted mode.

To make the code more readable, move the memcg_slab_pre_alloc_hook()
and memcg_slab_post_alloc_hook() calling conditions into the bodies of
these hooks.
Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com
Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/slab.h | 40 ++++++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 16 deletions(-)
--- a/mm/slab.h~mm-memcg-slab-fix-obj_cgroup_charge-return-value-handling
+++ a/mm/slab.h
@@ -274,22 +274,32 @@ static inline size_t obj_full_size(struc
return s->size + sizeof(struct obj_cgroup *);
}
-static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- size_t objects,
- gfp_t flags)
+/*
+ * Returns false if the allocation should fail.
+ */
+static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
{
struct obj_cgroup *objcg;
+ if (!memcg_kmem_enabled())
+ return true;
+
+ if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
+ return true;
+
objcg = get_obj_cgroup_from_current();
if (!objcg)
- return NULL;
+ return true;
if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) {
obj_cgroup_put(objcg);
- return NULL;
+ return false;
}
- return objcg;
+ *objcgp = objcg;
+ return true;
}
static inline void mod_objcg_state(struct obj_cgroup *objcg,
@@ -315,7 +325,7 @@ static inline void memcg_slab_post_alloc
unsigned long off;
size_t i;
- if (!objcg)
+ if (!memcg_kmem_enabled() || !objcg)
return;
flags &= ~__GFP_ACCOUNT;
@@ -400,11 +410,11 @@ static inline void memcg_free_page_obj_c
{
}
-static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- size_t objects,
- gfp_t flags)
+static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
{
- return NULL;
+ return true;
}
static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
@@ -508,9 +518,8 @@ static inline struct kmem_cache *slab_pr
if (should_failslab(s, flags))
return NULL;
- if (memcg_kmem_enabled() &&
- ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
- *objcgp = memcg_slab_pre_alloc_hook(s, size, flags);
+ if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags))
+ return NULL;
return s;
}
@@ -529,8 +538,7 @@ static inline void slab_post_alloc_hook(
s->flags, flags);
}
- if (memcg_kmem_enabled())
- memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
+ memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
}
#ifndef CONFIG_SLOB
_
* [patch 04/12] mm: list_lru: set shrinker map bit when child nr_items is not zero
2020-12-06 6:14 incoming Andrew Morton
` (2 preceding siblings ...)
2020-12-06 6:14 ` [patch 03/12] mm: memcg/slab: fix obj_cgroup_charge() return value handling Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-06 6:14 ` [patch 05/12] mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING Andrew Morton
` (7 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, guro, ktkhai, linux-mm, mm-commits, shakeelb, shy828301,
stable, torvalds, vdavydov.dev
From: Yang Shi <shy828301@gmail.com>
Subject: mm: list_lru: set shrinker map bit when child nr_items is not zero
When investigating a slab cache bloat problem, a significant amount of
negative dentry cache was seen, but confusingly it was neither shrunk
by the reclaimer (the host has very tight memory) nor by dropping
caches. The vmcore shows there are over 14M negative dentry objects on
the lru, but tracing shows they were not scanned at all. Further
investigation shows the memcg's vfs shrinker_map bit is not set, so the
reclaimer and cache dropping simply skip calling the vfs shrinker. We
had to reboot the hosts to get the memory back.

I didn't manage to come up with a reproducer in a test environment, and
the problem can't be reproduced after rebooting. But code inspection
suggests a race between shrinker map bit clearing and reparenting. The
hypothesis is elaborated below.
The memcg hierarchy on our production environment looks like:
        root
       /    \
  system    user
The main workloads run under the user slice's children, and memcgs are
created and removed there frequently. So reparenting happens very often
under the user slice, but no task runs under the user slice directly.
So with the frequent reparenting and tight memory pressure, the below
hypothetical race condition may happen:
CPU A                                  CPU B
reparent
    dst->nr_items == 0
                                       shrinker:
                                           total_objects == 0
add src->nr_items to dst
set_bit
                                       return SHRINK_EMPTY
                                       clear_bit
child memcg offline
replace child's kmemcg_id with
parent's (in memcg_offline_kmem())
                                       list_lru_del() between shrinker runs
                                           see parent's kmemcg_id
                                           dec dst->nr_items
reparent again
dst->nr_items may go negative
due to concurrent list_lru_del()
                                       The second run of shrinker:
                                           read nr_items without any
                                           synchronization, so it may
                                           see intermediate negative
                                           nr_items then total_objects
                                           may return 0 coincidentally
                                       keep the bit cleared
dst->nr_items != 0
skip set_bit
add src->nr_items to dst
After this point dst->nr_items may never go back to zero, so
reparenting will not set the shrinker_map bit anymore. And since no
task runs under the user slice directly, no new object will be added to
its lru to set the shrinker map bit either. The bit is kept cleared
forever.
How does list_lru_del() race with reparenting? Reparenting replaces the
children's kmemcg_id with the parent's without the protection of
nlru->lock, so list_lru_del() may see the parent's kmemcg_id while
actually deleting items from the child's lru, decrementing the parent's
nr_items. The parent's nr_items may therefore go negative, as commit
2788cf0c401c268b4819c5407493a8769b7007aa ("memcg: reparent list_lrus
and free kmemcg_id on css offline") says.
Since it is impossible for dst->nr_items to go negative while
src->nr_items is zero, it seems we can simply set the shrinker map bit
iff src->nr_items != 0. We could synchronize list_lru_count_one() and
reparenting with nlru->lock, but checking src->nr_items in reparenting
is the simplest approach and avoids lock contention.
Link: https://lkml.kernel.org/r/20201202171749.264354-1-shy828301@gmail.com
Fixes: fae91d6d8be5 ("mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Suggested-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org> [4.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/list_lru.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/mm/list_lru.c~mm-list_lru-set-shrinker-map-bit-when-child-nr_items-is-not-zero
+++ a/mm/list_lru.c
@@ -534,7 +534,6 @@ static void memcg_drain_list_lru_node(st
struct list_lru_node *nlru = &lru->node[nid];
int dst_idx = dst_memcg->kmemcg_id;
struct list_lru_one *src, *dst;
- bool set;
/*
* Since list_lru_{add,del} may be called under an IRQ-safe lock,
@@ -546,11 +545,12 @@ static void memcg_drain_list_lru_node(st
dst = list_lru_from_memcg_idx(nlru, dst_idx);
list_splice_init(&src->list, &dst->list);
- set = (!dst->nr_items && src->nr_items);
- dst->nr_items += src->nr_items;
- if (set)
+
+ if (src->nr_items) {
+ dst->nr_items += src->nr_items;
memcg_set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
- src->nr_items = 0;
+ src->nr_items = 0;
+ }
spin_unlock_irq(&nlru->lock);
}
_
* [patch 05/12] mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
2020-12-06 6:14 incoming Andrew Morton
` (3 preceding siblings ...)
2020-12-06 6:14 ` [patch 04/12] mm: list_lru: set shrinker map bit when child nr_items is not zero Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-06 6:14 ` [patch 06/12] mm/swapfile: do not sleep with a spin lock held Andrew Morton
` (6 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, harish, hch, linux-mm, minchan, mm-commits,
sergey.senozhatsky, stable, tony, torvalds, urezki
From: Minchan Kim <minchan@kernel.org>
Subject: mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
While I was doing zram testing, I found that decompression sometimes
failed because the compression buffer was corrupted. Investigation
showed that the commit below (see the Fixes: tag) calls cond_resched()
unconditionally, so it can cause a problem in atomic context if the
task gets rescheduled.
[ 55.109012] BUG: sleeping function called from invalid context at mm/vmalloc.c:108
[ 55.110774] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
[ 55.111973] 3 locks held by memhog/946:
[ 55.112807] #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
[ 55.114151] #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
[ 55.115848] #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
[ 55.118947] CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316
[ 55.121265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 55.122540] Call Trace:
[ 55.122974] dump_stack+0x8b/0xb8
[ 55.123588] ___might_sleep.cold+0xb6/0xc6
[ 55.124328] unmap_kernel_range_noflush+0x2eb/0x350
[ 55.125198] unmap_kernel_range+0x14/0x30
[ 55.125920] zs_unmap_object+0xd5/0xe0
[ 55.126604] zram_bvec_rw.isra.0+0x38c/0x8e0
[ 55.127462] zram_rw_page+0x90/0x101
[ 55.128199] bdev_write_page+0x92/0xe0
[ 55.128957] ? swap_slot_free_notify+0xb0/0xb0
[ 55.129841] __swap_writepage+0x94/0x4a0
[ 55.130636] ? do_raw_spin_unlock+0x4b/0xa0
[ 55.131462] ? _raw_spin_unlock+0x1f/0x30
[ 55.132261] ? page_swapcount+0x6c/0x90
[ 55.133038] pageout+0xe3/0x3a0
[ 55.133702] shrink_page_list+0xb94/0xd60
[ 55.134626] shrink_inactive_list+0x158/0x460
We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
contains the offending calling code) from zsmalloc.

Even though this option showed some improvement (e.g., 30%) on some
arm32 platforms, it has been a headache to maintain since it has abused
APIs[1] (e.g., unmap_kernel_range in atomic context).

Since we are approaching the deprecation of 32-bit machines, the config
option has been available only for builtin builds since v5.8, and it
has not been the default in zsmalloc, it's time to drop the option for
better maintainability.
[1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org
Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Harish Sriram <harish@linux.ibm.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm/configs/omap2plus_defconfig | 1
include/linux/zsmalloc.h | 1
mm/Kconfig | 13 ------
mm/zsmalloc.c | 54 -------------------------
4 files changed, 69 deletions(-)
--- a/arch/arm/configs/omap2plus_defconfig~mm-zsmallocc-drop-zsmalloc_pgtable_mapping
+++ a/arch/arm/configs/omap2plus_defconfig
@@ -81,7 +81,6 @@ CONFIG_PARTITION_ADVANCED=y
CONFIG_BINFMT_MISC=y
CONFIG_CMA=y
CONFIG_ZSMALLOC=m
-CONFIG_ZSMALLOC_PGTABLE_MAPPING=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
--- a/include/linux/zsmalloc.h~mm-zsmallocc-drop-zsmalloc_pgtable_mapping
+++ a/include/linux/zsmalloc.h
@@ -20,7 +20,6 @@
* zsmalloc mapping modes
*
* NOTE: These only make a difference when a mapped object spans pages.
- * They also have no effect when ZSMALLOC_PGTABLE_MAPPING is selected.
*/
enum zs_mapmode {
ZS_MM_RW, /* normal read-write mapping */
--- a/mm/Kconfig~mm-zsmallocc-drop-zsmalloc_pgtable_mapping
+++ a/mm/Kconfig
@@ -707,19 +707,6 @@ config ZSMALLOC
returned by an alloc(). This handle must be mapped in order to
access the allocated space.
-config ZSMALLOC_PGTABLE_MAPPING
- bool "Use page table mapping to access object in zsmalloc"
- depends on ZSMALLOC=y
- help
- By default, zsmalloc uses a copy-based object mapping method to
- access allocations that span two pages. However, if a particular
- architecture (ex, ARM) performs VM mapping faster than copying,
- then you should select this. This causes zsmalloc to use page table
- mapping rather than copying for object mapping.
-
- You can check speed with zsmalloc benchmark:
- https://github.com/spartacus06/zsmapbench
-
config ZSMALLOC_STAT
bool "Export zsmalloc statistics"
depends on ZSMALLOC
--- a/mm/zsmalloc.c~mm-zsmallocc-drop-zsmalloc_pgtable_mapping
+++ a/mm/zsmalloc.c
@@ -293,11 +293,7 @@ struct zspage {
};
struct mapping_area {
-#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
- struct vm_struct *vm; /* vm area for mapping object that span pages */
-#else
char *vm_buf; /* copy buffer for objects that span pages */
-#endif
char *vm_addr; /* address of kmap_atomic()'ed pages */
enum zs_mapmode vm_mm; /* mapping mode */
};
@@ -1113,54 +1109,6 @@ static struct zspage *find_get_zspage(st
return zspage;
}
-#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
-static inline int __zs_cpu_up(struct mapping_area *area)
-{
- /*
- * Make sure we don't leak memory if a cpu UP notification
- * and zs_init() race and both call zs_cpu_up() on the same cpu
- */
- if (area->vm)
- return 0;
- area->vm = get_vm_area(PAGE_SIZE * 2, 0);
- if (!area->vm)
- return -ENOMEM;
-
- /*
- * Populate ptes in advance to avoid pte allocation with GFP_KERNEL
- * in non-preemtible context of zs_map_object.
- */
- return apply_to_page_range(&init_mm, (unsigned long)area->vm->addr,
- PAGE_SIZE * 2, NULL, NULL);
-}
-
-static inline void __zs_cpu_down(struct mapping_area *area)
-{
- if (area->vm)
- free_vm_area(area->vm);
- area->vm = NULL;
-}
-
-static inline void *__zs_map_object(struct mapping_area *area,
- struct page *pages[2], int off, int size)
-{
- unsigned long addr = (unsigned long)area->vm->addr;
-
- BUG_ON(map_kernel_range(addr, PAGE_SIZE * 2, PAGE_KERNEL, pages) < 0);
- area->vm_addr = area->vm->addr;
- return area->vm_addr + off;
-}
-
-static inline void __zs_unmap_object(struct mapping_area *area,
- struct page *pages[2], int off, int size)
-{
- unsigned long addr = (unsigned long)area->vm_addr;
-
- unmap_kernel_range(addr, PAGE_SIZE * 2);
-}
-
-#else /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
-
static inline int __zs_cpu_up(struct mapping_area *area)
{
/*
@@ -1241,8 +1189,6 @@ out:
pagefault_enable();
}
-#endif /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
-
static int zs_cpu_prepare(unsigned int cpu)
{
struct mapping_area *area;
_
* [patch 06/12] mm/swapfile: do not sleep with a spin lock held
2020-12-06 6:14 incoming Andrew Morton
` (4 preceding siblings ...)
2020-12-06 6:14 ` [patch 05/12] mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-06 6:14 ` [patch 07/12] mailmap: add two more addresses of Uwe Kleine-König Andrew Morton
` (5 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, hughd, linux-mm, mm-commits, qcai, stable, torvalds
From: Qian Cai <qcai@redhat.com>
Subject: mm/swapfile: do not sleep with a spin lock held
We can't call kvfree() with a spin lock held, so defer it. Fixes a
might_sleep() runtime warning.
Link: https://lkml.kernel.org/r/20201202151549.10350-1-qcai@redhat.com
Fixes: 873d7bcfd066 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
Signed-off-by: Qian Cai <qcai@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swapfile.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swapfile-do-not-sleep-with-a-spin-lock-held
+++ a/mm/swapfile.c
@@ -2867,6 +2867,7 @@ late_initcall(max_swapfiles_check);
static struct swap_info_struct *alloc_swap_info(void)
{
struct swap_info_struct *p;
+ struct swap_info_struct *defer = NULL;
unsigned int type;
int i;
@@ -2895,7 +2896,7 @@ static struct swap_info_struct *alloc_sw
smp_wmb();
WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
} else {
- kvfree(p);
+ defer = p;
p = swap_info[type];
/*
* Do not memset this entry: a racing procfs swap_next()
@@ -2908,6 +2909,7 @@ static struct swap_info_struct *alloc_sw
plist_node_init(&p->avail_lists[i], 0);
p->flags = SWP_USED;
spin_unlock(&swap_lock);
+ kvfree(defer);
spin_lock_init(&p->lock);
spin_lock_init(&p->cont_lock);
_
* [patch 07/12] mailmap: add two more addresses of Uwe Kleine-König
2020-12-06 6:14 incoming Andrew Morton
` (5 preceding siblings ...)
2020-12-06 6:14 ` [patch 06/12] mm/swapfile: do not sleep with a spin lock held Andrew Morton
@ 2020-12-06 6:14 ` Andrew Morton
2020-12-06 6:15 ` [patch 08/12] tools/testing/selftests/vm: fix build error Andrew Morton
` (4 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: akpm, linux-mm, mm-commits, torvalds, u.kleine-koenig
From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Subject: mailmap: add two more addresses of Uwe Kleine-König
This fixes attribution for the commits (among others)
- d4097456cd1d ("video/framebuffer: move the probe func into
.devinit.text in Blackfin LCD driver")
- 0312e024d6cd ("mfd: mc13xxx: Add support for mc34708")
Link: https://lkml.kernel.org/r/20201127213358.3440830-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
.mailmap | 2 ++
1 file changed, 2 insertions(+)
--- a/.mailmap~mailmap-add-two-more-addresses-of-uwe-kleine-oenig
+++ a/.mailmap
@@ -322,6 +322,8 @@ TripleX Chung <xxx.phy@gmail.com> <zhong
Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
Tycho Andersen <tycho@tycho.pizza> <tycho@tycho.ws>
Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
+Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
+Uwe Kleine-König <ukleinek@strlen.de>
Uwe Kleine-König <ukl@pengutronix.de>
Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
_
* [patch 08/12] tools/testing/selftests/vm: fix build error
2020-12-06 6:14 incoming Andrew Morton
` (6 preceding siblings ...)
2020-12-06 6:14 ` [patch 07/12] mailmap: add two more addresses of Uwe Kleine-König Andrew Morton
@ 2020-12-06 6:15 ` Andrew Morton
2020-12-06 6:15 ` [patch 09/12] userfaultfd: selftests: fix SIGSEGV if huge mmap fails Andrew Morton
` (3 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:15 UTC (permalink / raw)
To: akpm, almasrymina, bgeffon, dave.hansen, jhubbard,
kirill.shutemov, linux-mm, mm-commits, sandipan, shuah,
suxingxing, torvalds
From: Xingxing Su <suxingxing@loongson.cn>
Subject: tools/testing/selftests/vm: fix build error
Only x86 and PowerPC implement pkey-xxx.h, so an error is reported
when compiling protection_keys.c on other architectures.
Add an arch check to the Makefile so that "protection_keys" is built only
on architectures that implement it. If another arch gains support,
add its name to the check, e.g.:
ifneq (,$(findstring $(ARCH),powerpc mips ... ))
The build errors were:
pkey-helpers.h:93:2: error: #error Architecture not supported
#error Architecture not supported
pkey-helpers.h:96:20: error: `PKEY_DISABLE_ACCESS' undeclared
#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
^
protection_keys.c:218:45: error: `PKEY_DISABLE_WRITE' undeclared
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
^
Link: https://lkml.kernel.org/r/1606826876-30656-1-git-send-email-suxingxing@loongson.cn
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/Makefile | 4 ++++
1 file changed, 4 insertions(+)
--- a/tools/testing/selftests/vm/Makefile~selftest-vm-fix-build-error
+++ a/tools/testing/selftests/vm/Makefile
@@ -60,9 +60,13 @@ ifeq ($(CAN_BUILD_X86_64),1)
TEST_GEN_FILES += $(BINARIES_64)
endif
else
+
+ifneq (,$(findstring $(ARCH),powerpc))
TEST_GEN_FILES += protection_keys
endif
+endif
+
ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 ppc64le riscv64 s390x sh64 sparc64 x86_64))
TEST_GEN_FILES += va_128TBswitch
TEST_GEN_FILES += virtual_address_range
_
* [patch 09/12] userfaultfd: selftests: fix SIGSEGV if huge mmap fails
2020-12-06 6:14 incoming Andrew Morton
` (7 preceding siblings ...)
2020-12-06 6:15 ` [patch 08/12] tools/testing/selftests/vm: fix build error Andrew Morton
@ 2020-12-06 6:15 ` Andrew Morton
2020-12-06 6:15 ` [patch 10/12] mm/filemap: add static for function __add_to_page_cache_locked Andrew Morton
` (2 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:15 UTC (permalink / raw)
To: aarcange, akpm, axelrasmussen, dgilbert, joe, linux-mm,
mm-commits, peterx, rppt, shuah, torvalds
From: Axel Rasmussen <axelrasmussen@google.com>
Subject: userfaultfd: selftests: fix SIGSEGV if huge mmap fails
The error handling in hugetlb_allocate_area() was incorrect for the
hugetlb_shared test case.
Previously the behavior was:
- mmap a hugetlb area
- If this fails, set the pointer to NULL, and carry on
- mmap an alias of the same hugetlb fd
- If this fails, munmap the original area
If the original mmap failed, it's likely the second one did too. If both
failed, we'd blindly try to munmap a NULL pointer, causing a SIGSEGV.
Instead, "goto fail" so we return before trying to mmap the alias.
This issue can be hit "in real life" by forgetting to set
/proc/sys/vm/nr_hugepages (leaving it at 0), and then trying to run the
hugetlb_shared test.
Another small improvement: when the original mmap fails, don't just
print "it failed" but call perror(), so we can see *why*. :)
Link: https://lkml.kernel.org/r/20201204203443.2714693-1-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/userfaultfd.c | 25 +++++++++++++--------
1 file changed, 16 insertions(+), 9 deletions(-)
--- a/tools/testing/selftests/vm/userfaultfd.c~userfaultfd-selftests-fix-sigsegv-if-huge-mmap-fails
+++ a/tools/testing/selftests/vm/userfaultfd.c
@@ -206,19 +206,19 @@ static int hugetlb_release_pages(char *r
return ret;
}
-
static void hugetlb_allocate_area(void **alloc_area)
{
void *area_alias = NULL;
char **alloc_area_alias;
+
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
(map_shared ? MAP_SHARED : MAP_PRIVATE) |
MAP_HUGETLB,
huge_fd, *alloc_area == area_src ? 0 :
nr_pages * page_size);
if (*alloc_area == MAP_FAILED) {
- fprintf(stderr, "mmap of hugetlbfs file failed\n");
- *alloc_area = NULL;
+ perror("mmap of hugetlbfs file failed");
+ goto fail;
}
if (map_shared) {
@@ -227,14 +227,11 @@ static void hugetlb_allocate_area(void *
huge_fd, *alloc_area == area_src ? 0 :
nr_pages * page_size);
if (area_alias == MAP_FAILED) {
- if (munmap(*alloc_area, nr_pages * page_size) < 0) {
- perror("hugetlb munmap");
- exit(1);
- }
- *alloc_area = NULL;
- return;
+ perror("mmap of hugetlb file alias failed");
+ goto fail_munmap;
}
}
+
if (*alloc_area == area_src) {
huge_fd_off0 = *alloc_area;
alloc_area_alias = &area_src_alias;
@@ -243,6 +240,16 @@ static void hugetlb_allocate_area(void *
}
if (area_alias)
*alloc_area_alias = area_alias;
+
+ return;
+
+fail_munmap:
+ if (munmap(*alloc_area, nr_pages * page_size) < 0) {
+ perror("hugetlb munmap");
+ exit(1);
+ }
+fail:
+ *alloc_area = NULL;
}
static void hugetlb_alias_mapping(__u64 *start, size_t len, unsigned long offset)
_
* [patch 10/12] mm/filemap: add static for function __add_to_page_cache_locked
2020-12-06 6:14 incoming Andrew Morton
` (8 preceding siblings ...)
2020-12-06 6:15 ` [patch 09/12] userfaultfd: selftests: fix SIGSEGV if huge mmap fails Andrew Morton
@ 2020-12-06 6:15 ` Andrew Morton
2020-12-06 6:15 ` [patch 11/12] hugetlb_cgroup: fix offline of hugetlb cgroup with reservations Andrew Morton
2020-12-06 6:15 ` [patch 12/12] mm/mmap.c: fix mmap return value when vma is merged after call_mmap() Andrew Morton
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:15 UTC (permalink / raw)
To: akpm, alex.shi, jrdr.linux, linux-mm, mm-commits, torvalds
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/filemap: add static for function __add_to_page_cache_locked
../mm/filemap.c:830:14: warning: no previous prototype for
`__add_to_page_cache_locked' [-Wmissing-prototypes]
noinline int __add_to_page_cache_locked(struct page *page,
Link: https://lkml.kernel.org/r/1604661895-5495-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/filemap.c~mm-filemap-add-static-for-function-__add_to_page_cache_locked
+++ a/mm/filemap.c
@@ -827,7 +827,7 @@ int replace_page_cache_page(struct page
}
EXPORT_SYMBOL_GPL(replace_page_cache_page);
-noinline int __add_to_page_cache_locked(struct page *page,
+static noinline int __add_to_page_cache_locked(struct page *page,
struct address_space *mapping,
pgoff_t offset, gfp_t gfp,
void **shadowp)
_
* [patch 11/12] hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
2020-12-06 6:14 incoming Andrew Morton
` (9 preceding siblings ...)
2020-12-06 6:15 ` [patch 10/12] mm/filemap: add static for function __add_to_page_cache_locked Andrew Morton
@ 2020-12-06 6:15 ` Andrew Morton
2020-12-06 6:15 ` [patch 12/12] mm/mmap.c: fix mmap return value when vma is merged after call_mmap() Andrew Morton
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:15 UTC (permalink / raw)
To: akpm, almasrymina, amorenoz, gthelen, linux-mm, mike.kravetz,
mm-commits, rientjes, sandipan, shakeelb, shuah, stable,
torvalds
From: Mike Kravetz <mike.kravetz@oracle.com>
Subject: hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
Adrian Moreno was running a kubernetes 1.19 + containerd/docker workload
using hugetlbfs. In this environment the issue is reproduced by:
1 - Start a simple pod that uses the recently added HugePages medium
feature (pod yaml attached)
2 - Start a DPDK app. It doesn't need to run successfully (as in transfer
packets) nor interact with real hardware. It seems just initializing
the EAL layer (which handles hugepage reservation and locking) is
enough to trigger the issue
3 - Delete the Pod (or let it "Complete").
This would result in a kworker thread going into a tight loop (top output):
1425 root 20 0 0 0 0 R 99.7 0.0 5:22.45
kworker/28:7+cgroup_destroy
'perf top -g' reports:
- 63.28% 0.01% [kernel] [k] worker_thread
- 49.97% worker_thread
- 52.64% process_one_work
- 62.08% css_killed_work_fn
- hugetlb_cgroup_css_offline
41.52% _raw_spin_lock
- 2.82% _cond_resched
rcu_all_qs
2.66% PageHuge
- 0.57% schedule
- 0.57% __schedule
We are spinning in the do-while loop in hugetlb_cgroup_css_offline.
Worse yet, we are holding the master cgroup lock (cgroup_mutex) while
infinitely spinning. Little else can be done on the system, as the
cgroup_mutex cannot be acquired.
Do note that the issue can be reproduced by simply offlining a hugetlb
cgroup containing pages with reservation counts.
The loop in hugetlb_cgroup_css_offline is moving page counts from the
cgroup being offlined to the parent cgroup. This is done for each hstate,
and is repeated until hugetlb_cgroup_have_usage returns false. The routine
moving counts (hugetlb_cgroup_move_parent) is only moving 'usage' counts.
The routine hugetlb_cgroup_have_usage is checking for both 'usage' and
'reservation' counts. What to do with reservation counts when
reparenting was discussed here:
https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/
The decision was made to leave a zombie cgroup for cgroups with
reservation counts. Unfortunately, the check for reservation counts was
incorrectly added to hugetlb_cgroup_have_usage.
To fix the issue, simply remove the check for reservation counts. While
fixing this issue, a related bug in hugetlb_cgroup_css_offline was noticed.
The hstate index is not reinitialized each time through the do-while loop.
Fix this as well.
Link: https://lkml.kernel.org/r/20201203220242.158165-1-mike.kravetz@oracle.com
Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb_cgroup.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
--- a/mm/hugetlb_cgroup.c~hugetlb_cgroup-fix-offline-of-hugetlb-cgroup-with-reservations
+++ a/mm/hugetlb_cgroup.c
@@ -82,11 +82,8 @@ static inline bool hugetlb_cgroup_have_u
for (idx = 0; idx < hugetlb_max_hstate; idx++) {
if (page_counter_read(
- hugetlb_cgroup_counter_from_cgroup(h_cg, idx)) ||
- page_counter_read(hugetlb_cgroup_counter_from_cgroup_rsvd(
- h_cg, idx))) {
+ hugetlb_cgroup_counter_from_cgroup(h_cg, idx)))
return true;
- }
}
return false;
}
@@ -202,9 +199,10 @@ static void hugetlb_cgroup_css_offline(s
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css);
struct hstate *h;
struct page *page;
- int idx = 0;
+ int idx;
do {
+ idx = 0;
for_each_hstate(h) {
spin_lock(&hugetlb_lock);
list_for_each_entry(page, &h->hugepage_activelist, lru)
_
* [patch 12/12] mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
2020-12-06 6:14 incoming Andrew Morton
` (10 preceding siblings ...)
2020-12-06 6:15 ` [patch 11/12] hugetlb_cgroup: fix offline of hugetlb cgroup with reservations Andrew Morton
@ 2020-12-06 6:15 ` Andrew Morton
11 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2020-12-06 6:15 UTC (permalink / raw)
To: akpm, david, hushiyuan, jgg, linmiaohe, linux-mm, liuzixian4,
louhongxiang, mm-commits, torvalds, willy
From: Liu Zixian <liuzixian4@huawei.com>
Subject: mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
On success, mmap should return the start address of the newly mapped area,
but the patch "mm: mmap: merge vma after call_mmap() if possible" made it
return the vm_start of the newly merged vma instead. Users of mmap get a
wrong address if the vma is merged after call_mmap(). Fix this by moving
the assignment to addr to before the vma merge.
We have a driver which changes vm_flags, and this bug was found by our
test cases.
Link: https://lkml.kernel.org/r/20201203085350.22624-1-liuzixian4@huawei.com
Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mmap.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
--- a/mm/mmap.c~fix-mmap-return-value-when-vma-is-merged-after-call_mmap
+++ a/mm/mmap.c
@@ -1808,6 +1808,17 @@ unsigned long mmap_region(struct file *f
if (error)
goto unmap_and_free_vma;
+ /* Can addr have changed??
+ *
+ * Answer: Yes, several device drivers can do it in their
+ * f_op->mmap method. -DaveM
+ * Bug: If addr is changed, prev, rb_link, rb_parent should
+ * be updated for vma_link()
+ */
+ WARN_ON_ONCE(addr != vma->vm_start);
+
+ addr = vma->vm_start;
+
/* If vm_flags changed after call_mmap(), we should try merge vma again
* as we may succeed this time.
*/
@@ -1822,25 +1833,12 @@ unsigned long mmap_region(struct file *f
fput(vma->vm_file);
vm_area_free(vma);
vma = merge;
- /* Update vm_flags and possible addr to pick up the change. We don't
- * warn here if addr changed as the vma is not linked by vma_link().
- */
- addr = vma->vm_start;
+ /* Update vm_flags to pick up the change. */
vm_flags = vma->vm_flags;
goto unmap_writable;
}
}
- /* Can addr have changed??
- *
- * Answer: Yes, several device drivers can do it in their
- * f_op->mmap method. -DaveM
- * Bug: If addr is changed, prev, rb_link, rb_parent should
- * be updated for vma_link()
- */
- WARN_ON_ONCE(addr != vma->vm_start);
-
- addr = vma->vm_start;
vm_flags = vma->vm_flags;
} else if (vm_flags & VM_SHARED) {
error = shmem_zero_setup(vma);
_