* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
@ 2021-10-17 13:36 kernel test robot
From: kernel test robot @ 2021-10-17 13:36 UTC
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 15194 bytes --]

CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <87mtnavszx.fsf_-_@disp2133>
References: <87mtnavszx.fsf_-_@disp2133>
TO: "Eric W. Biederman" <ebiederm@xmission.com>
TO: Rune Kleveland <rune.kleveland@infomedia.dk>
CC: Yu Zhao <yuzhao@google.com>
CC: Alexey Gladkov <legion@kernel.org>
CC: Jordan Glover <Golden_Miller83@protonmail.ch>
CC: LKML <linux-kernel@vger.kernel.org>
CC: linux-mm(a)kvack.org
CC: containers(a)lists.linux-foundation.org

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.15-rc5 next-20211015]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8fe31e0995f048d16b378b90926793a0aa4af1e5
:::::: branch date: 2 days ago
:::::: commit date: 2 days ago
config: arm-randconfig-c002-20211017 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/e042a898defa264b6a95a439b8570486b47bcd49
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
        git checkout e042a898defa264b6a95a439b8570486b47bcd49
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
   include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
                   if (!(condition))                                       \
                   ^
   fs/gfs2/recovery.c:114:8: note: Loop condition is false.  Exiting loop
                   rr = list_first_entry(head, struct gfs2_revoke_replay, rr_list);
                        ^
   include/linux/list.h:522:2: note: expanded from macro 'list_first_entry'
           list_entry((ptr)->next, type, member)
           ^
   include/linux/list.h:511:2: note: expanded from macro 'list_entry'
           container_of(ptr, type, member)
           ^
   include/linux/kernel.h:495:2: note: expanded from macro 'container_of'
           BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&   \
           ^
   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
           do {                                                            \
           ^
   fs/gfs2/recovery.c:115:3: note: Calling 'list_del'
                   list_del(&rr->rr_list);
                   ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:146:2: note: Calling '__list_del_entry'
           __list_del_entry(entry);
           ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:132:2: note: Taking false branch
           if (!__list_del_entry_valid(entry))
           ^
   include/linux/list.h:135:13: note: Use of memory after it is freed
           __list_del(entry->prev, entry->next);
                      ^~~~~~~~~~~
   Suppressed 6 warnings (6 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   6 warnings generated.
   Suppressed 6 warnings (6 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   2 warnings generated.
   Suppressed 2 warnings (2 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   5 warnings generated.
   include/linux/list.h:808:10: warning: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h') [clang-analyzer-core.NullDereference]
           return !h->pprev;
                   ^
   kernel/ucount.c:251:23: note: Assuming pointer value is null
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
                                ^~~~
   kernel/ucount.c:251:2: note: Loop condition is false. Execution continues on line 255
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
           ^
   kernel/ucount.c:255:14: note: Passing null pointer value via 1st parameter 'ucounts'
           put_ucounts(ucounts);
                       ^~~~~~~
   kernel/ucount.c:255:2: note: Calling 'put_ucounts'
           put_ucounts(ucounts);
           ^~~~~~~~~~~~~~~~~~~~
   kernel/ucount.c:204:6: note: Assuming the condition is true
           if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
               ^
   include/linux/spinlock.h:490:21: note: expanded from macro 'atomic_dec_and_lock_irqsave'
                   __cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
                   ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler_types.h:48:28: note: expanded from macro '__cond_lock'
   # define __cond_lock(x,c) (c)
                              ^
   kernel/ucount.c:204:2: note: Taking true branch
           if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
           ^
   kernel/ucount.c:205:18: note: Passing null pointer value via 1st parameter 'n'
                   hlist_del_init(&ucounts->node);
                                  ^~~~~~~~~~~~~~
   kernel/ucount.c:205:3: note: Calling 'hlist_del_init'
                   hlist_del_init(&ucounts->node);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:865:22: note: Passing null pointer value via 1st parameter 'h'
           if (!hlist_unhashed(n)) {
                               ^
   include/linux/list.h:865:7: note: Calling 'hlist_unhashed'
           if (!hlist_unhashed(n)) {
                ^~~~~~~~~~~~~~~~~
   include/linux/list.h:808:10: note: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h')
           return !h->pprev;
                   ^
>> kernel/ucount.c:291:44: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
           for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
                                                     ^
   kernel/ucount.c:309:2: note: Loop condition is true.  Entering loop body
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
           ^
   kernel/ucount.c:310:14: note: Left side of '||' is false
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/ucount.c:310:14: note: Left side of '||' is false
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/ucount.c:310:14: note: Left side of '||' is true
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:291:28: note: expanded from macro '__native_word'
            sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
                                     ^
   kernel/ucount.c:310:14: note: Taking false branch
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
           ^
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
                   if (!(condition))                                       \
                   ^
   kernel/ucount.c:310:14: note: Loop condition is false.  Exiting loop
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
           ^
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
           do {                                                            \
           ^
   kernel/ucount.c:312:7: note: Assuming 'new' is >= 0
                   if (new < 0 || new > max)
                       ^~~~~~~
   kernel/ucount.c:312:7: note: Left side of '||' is false
   kernel/ucount.c:312:18: note: Assuming 'new' is <= 'max'
                   if (new < 0 || new > max)
                                  ^~~~~~~~~
   kernel/ucount.c:312:3: note: Taking false branch
                   if (new < 0 || new > max)
                   ^
   kernel/ucount.c:314:12: note: 'iter' is equal to 'ucounts'
                   else if (iter == ucounts)
                            ^~~~
   kernel/ucount.c:314:8: note: Taking true branch
                   else if (iter == ucounts)
                        ^
   kernel/ucount.c:316:8: note: Assuming 'new' is not equal to 1
                   if ((new == 1) && (get_ucounts(iter) != iter))
                        ^~~~~~~~
   kernel/ucount.c:316:18: note: Left side of '&&' is false
                   if ((new == 1) && (get_ucounts(iter) != iter))
                                  ^
   kernel/ucount.c:309:2: note: Loop condition is true.  Entering loop body

vim +291 kernel/ucount.c

21d1c5e386bc75 Alexey Gladkov    2021-04-22  286  
e042a898defa26 Eric W. Biederman 2021-10-15  287  static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
e042a898defa26 Eric W. Biederman 2021-10-15  288  				struct ucounts *last, enum ucount_type type)
e042a898defa26 Eric W. Biederman 2021-10-15  289  {
e042a898defa26 Eric W. Biederman 2021-10-15  290  	struct ucounts *iter;
e042a898defa26 Eric W. Biederman 2021-10-15 @291  	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
e042a898defa26 Eric W. Biederman 2021-10-15  292  		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
e042a898defa26 Eric W. Biederman 2021-10-15  293  		WARN_ON_ONCE(dec < 0);
e042a898defa26 Eric W. Biederman 2021-10-15  294  		if (dec == 0)
e042a898defa26 Eric W. Biederman 2021-10-15  295  			put_ucounts(iter);
e042a898defa26 Eric W. Biederman 2021-10-15  296  	}
e042a898defa26 Eric W. Biederman 2021-10-15  297  }
e042a898defa26 Eric W. Biederman 2021-10-15  298  
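
As the analyzer sees it, the flagged loop increment reads iter->ns->ucounts after put_ucounts(iter) may have dropped the last reference and freed iter. One common way to avoid that pattern is to load the next pointer before the put. The following is a minimal user-space sketch of that idea, not necessarily the fix the patch author would choose. The names struct ucounts_model, struct ns_model, put_model(), dec_and_put_chain() and demo_run() are hypothetical stand-ins, and plain long fields replace the kernel's atomic counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct ucounts_model;

struct ns_model {
	struct ucounts_model *ucounts;	/* parent-level entry, NULL at the root */
};

struct ucounts_model {
	struct ns_model *ns;	/* namespace this entry belongs to */
	long refcount;		/* models ucounts->count */
	long ucount;		/* models ucounts->ucount[type] */
};

static int freed_entries;	/* counts how many entries were freed */

/* Drop one reference and free the entry when it was the last one. */
static void put_model(struct ucounts_model *u)
{
	if (--u->refcount == 0) {
		free(u);
		freed_entries++;
	}
}

/*
 * Walk the chain from 'start' up to (but not including) 'last'.
 * The next pointer is loaded BEFORE put_model(), so the loop never
 * touches 'iter' after it may have been freed.
 */
static void dec_and_put_chain(struct ucounts_model *start,
			      struct ucounts_model *last)
{
	struct ucounts_model *iter, *next;

	for (iter = start; iter != last; iter = next) {
		long dec = --iter->ucount;

		assert(dec >= 0);
		next = iter->ns->ucounts;	/* read while iter is still valid */
		if (dec == 0)
			put_model(iter);
	}
}

/* Build a two-level chain, release it, and return the number of frees. */
static int demo_run(void)
{
	struct ns_model root_ns = { .ucounts = NULL };
	struct ns_model child_ns;
	struct ucounts_model *parent = malloc(sizeof(*parent));
	struct ucounts_model *child = malloc(sizeof(*child));

	if (!parent || !child) {
		free(parent);
		free(child);
		return -1;
	}
	*parent = (struct ucounts_model){ .ns = &root_ns, .refcount = 1, .ucount = 1 };
	child_ns.ucounts = parent;
	*child = (struct ucounts_model){ .ns = &child_ns, .refcount = 1, .ucount = 1 };

	freed_entries = 0;
	dec_and_put_chain(child, NULL);
	return freed_entries;
}
```

Calling demo_run() from a small main() returns 2 in this model: both entries are freed, and neither is dereferenced after its free. Whether the real code needs this change depends on the reference-counting invariants in kernel/ucount.c, which the analyzer cannot see.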

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33037 bytes --]


* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
@ 2021-11-27  1:35 kernel test robot
From: kernel test robot @ 2021-11-27  1:35 UTC
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 15037 bytes --]

CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <87mtnavszx.fsf_-_@disp2133>
References: <87mtnavszx.fsf_-_@disp2133>
TO: "Eric W. Biederman" <ebiederm@xmission.com>

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.16-rc2 next-20211126]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8fe31e0995f048d16b378b90926793a0aa4af1e5
:::::: branch date: 6 weeks ago
:::::: commit date: 6 weeks ago
config: arm-randconfig-c002-20211017 (https://download.01.org/0day-ci/archive/20211127/202111270907.h62AppsO-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/e042a898defa264b6a95a439b8570486b47bcd49
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
        git checkout e042a898defa264b6a95a439b8570486b47bcd49
        # save the config file to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
   fs/xfs/libxfs/xfs_rmap.c:207:6: note: Left side of '||' is false
   fs/xfs/libxfs/xfs_rmap.c:207:15: note: Assuming the condition is false
           if (error || !*stat)
                        ^~~~~~
   fs/xfs/libxfs/xfs_rmap.c:207:2: note: Taking false branch
           if (error || !*stat)
           ^
   fs/xfs/libxfs/xfs_rmap.c:210:6: note: Calling 'xfs_rmap_btrec_to_irec'
           if (xfs_rmap_btrec_to_irec(rec, irec))
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.c:188:9: note: Calling 'xfs_rmap_irec_offset_unpack'
           return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.h:70:6: note: Assuming the condition is true
           if (offset & ~(XFS_RMAP_OFF_MASK | XFS_RMAP_OFF_FLAGS))
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.h:70:2: note: Taking true branch
           if (offset & ~(XFS_RMAP_OFF_MASK | XFS_RMAP_OFF_FLAGS))
           ^
   fs/xfs/libxfs/xfs_rmap.h:71:3: note: Returning without writing to 'irec->rm_flags'
                   return -EFSCORRUPTED;
                   ^
   fs/xfs/libxfs/xfs_rmap.h:71:3: note: Returning the value -117, which participates in a condition later
                   return -EFSCORRUPTED;
                   ^~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.c:188:9: note: Returning from 'xfs_rmap_irec_offset_unpack'
           return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.c:188:2: note: Returning without writing to 'irec->rm_flags'
           return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
           ^
   fs/xfs/libxfs/xfs_rmap.c:188:2: note: Returning the value -117, which participates in a condition later
           return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.c:210:6: note: Returning from 'xfs_rmap_btrec_to_irec'
           if (xfs_rmap_btrec_to_irec(rec, irec))
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/libxfs/xfs_rmap.c:210:2: note: Taking true branch
           if (xfs_rmap_btrec_to_irec(rec, irec))
           ^
   fs/xfs/libxfs/xfs_rmap.c:211:3: note: Control jumps to line 239
                   goto out_bad_rec;
                   ^
   fs/xfs/libxfs/xfs_rmap.c:242:2: note: 4th function call argument is an uninitialized value
           xfs_warn(mp,
           ^
   Suppressed 7 warnings (7 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   6 warnings generated.
   Suppressed 6 warnings (6 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   2 warnings generated.
   Suppressed 2 warnings (2 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   5 warnings generated.
   include/linux/list.h:808:10: warning: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h') [clang-analyzer-core.NullDereference]
           return !h->pprev;
                   ^
   kernel/ucount.c:251:23: note: Assuming pointer value is null
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
                                ^~~~
   kernel/ucount.c:251:2: note: Loop condition is false. Execution continues on line 255
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
           ^
   kernel/ucount.c:255:14: note: Passing null pointer value via 1st parameter 'ucounts'
           put_ucounts(ucounts);
                       ^~~~~~~
   kernel/ucount.c:255:2: note: Calling 'put_ucounts'
           put_ucounts(ucounts);
           ^~~~~~~~~~~~~~~~~~~~
   kernel/ucount.c:204:6: note: Assuming the condition is true
           if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
               ^
   include/linux/spinlock.h:490:21: note: expanded from macro 'atomic_dec_and_lock_irqsave'
                   __cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
                   ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler_types.h:48:28: note: expanded from macro '__cond_lock'
   # define __cond_lock(x,c) (c)
                              ^
   kernel/ucount.c:204:2: note: Taking true branch
           if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
           ^
   kernel/ucount.c:205:18: note: Passing null pointer value via 1st parameter 'n'
                   hlist_del_init(&ucounts->node);
                                  ^~~~~~~~~~~~~~
   kernel/ucount.c:205:3: note: Calling 'hlist_del_init'
                   hlist_del_init(&ucounts->node);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:865:22: note: Passing null pointer value via 1st parameter 'h'
           if (!hlist_unhashed(n)) {
                               ^
   include/linux/list.h:865:7: note: Calling 'hlist_unhashed'
           if (!hlist_unhashed(n)) {
                ^~~~~~~~~~~~~~~~~
   include/linux/list.h:808:10: note: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h')
           return !h->pprev;
                   ^
>> kernel/ucount.c:291:44: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
           for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
                                                     ^
   kernel/ucount.c:309:2: note: Loop condition is true.  Entering loop body
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
           ^
   kernel/ucount.c:310:14: note: Left side of '||' is false
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/ucount.c:310:14: note: Left side of '||' is false
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/ucount.c:310:14: note: Left side of '||' is true
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:291:28: note: expanded from macro '__native_word'
            sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
                                     ^
   kernel/ucount.c:310:14: note: Taking false branch
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
           ^
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
                   if (!(condition))                                       \
                   ^
   kernel/ucount.c:310:14: note: Loop condition is false.  Exiting loop
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
           ^
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
           do {                                                            \
           ^
   kernel/ucount.c:312:7: note: Assuming 'new' is >= 0
                   if (new < 0 || new > max)
                       ^~~~~~~
   kernel/ucount.c:312:7: note: Left side of '||' is false
   kernel/ucount.c:312:18: note: Assuming 'new' is <= 'max'
                   if (new < 0 || new > max)
                                  ^~~~~~~~~
   kernel/ucount.c:312:3: note: Taking false branch
                   if (new < 0 || new > max)
                   ^
   kernel/ucount.c:314:12: note: 'iter' is equal to 'ucounts'
                   else if (iter == ucounts)
                            ^~~~
   kernel/ucount.c:314:8: note: Taking true branch
                   else if (iter == ucounts)
                        ^
   kernel/ucount.c:316:8: note: Assuming 'new' is not equal to 1
                   if ((new == 1) && (get_ucounts(iter) != iter))
                        ^~~~~~~~
   kernel/ucount.c:316:18: note: Left side of '&&' is false
                   if ((new == 1) && (get_ucounts(iter) != iter))
                                  ^
   kernel/ucount.c:309:2: note: Loop condition is true.  Entering loop body

vim +291 kernel/ucount.c

21d1c5e386bc751 Alexey Gladkov    2021-04-22  286  
e042a898defa264 Eric W. Biederman 2021-10-15  287  static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
e042a898defa264 Eric W. Biederman 2021-10-15  288  				struct ucounts *last, enum ucount_type type)
e042a898defa264 Eric W. Biederman 2021-10-15  289  {
e042a898defa264 Eric W. Biederman 2021-10-15  290  	struct ucounts *iter;
e042a898defa264 Eric W. Biederman 2021-10-15 @291  	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
e042a898defa264 Eric W. Biederman 2021-10-15  292  		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
e042a898defa264 Eric W. Biederman 2021-10-15  293  		WARN_ON_ONCE(dec < 0);
e042a898defa264 Eric W. Biederman 2021-10-15  294  		if (dec == 0)
e042a898defa264 Eric W. Biederman 2021-10-15  295  			put_ucounts(iter);
e042a898defa264 Eric W. Biederman 2021-10-15  296  	}
e042a898defa264 Eric W. Biederman 2021-10-15  297  }
e042a898defa264 Eric W. Biederman 2021-10-15  298  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
@ 2021-11-26 15:09 kernel test robot
From: kernel test robot @ 2021-11-26 15:09 UTC
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 15108 bytes --]

CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <87mtnavszx.fsf_-_@disp2133>
References: <87mtnavszx.fsf_-_@disp2133>
TO: "Eric W. Biederman" <ebiederm@xmission.com>

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.16-rc2 next-20211126]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8fe31e0995f048d16b378b90926793a0aa4af1e5
:::::: branch date: 6 weeks ago
:::::: commit date: 6 weeks ago
config: arm-randconfig-c002-20211017 (https://download.01.org/0day-ci/archive/20211126/202111262308.9Mq1UEM2-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/e042a898defa264b6a95a439b8570486b47bcd49
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
        git checkout e042a898defa264b6a95a439b8570486b47bcd49
        # save the config file to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
   fs/notify/fsnotify.c:212:15: note: Left side of '&&' is true
           if (unlikely(parent_watched && !p_mask))
                        ^
   fs/notify/fsnotify.c:212:33: note: Assuming 'p_mask' is not equal to 0
           if (unlikely(parent_watched && !p_mask))
                                          ^
   include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
   # define unlikely(x)    __builtin_expect(!!(x), 0)
                                               ^
   fs/notify/fsnotify.c:212:2: note: Taking false branch
           if (unlikely(parent_watched && !p_mask))
           ^
   fs/notify/fsnotify.c:220:6: note: 'parent_needed' is false
           if (parent_needed || parent_interested) {
               ^~~~~~~~~~~~~
   fs/notify/fsnotify.c:220:6: note: Left side of '||' is false
   fs/notify/fsnotify.c:220:23: note: Assuming 'parent_interested' is true
           if (parent_needed || parent_interested) {
                                ^~~~~~~~~~~~~~~~~
   fs/notify/fsnotify.c:220:2: note: Taking true branch
           if (parent_needed || parent_interested) {
           ^
   fs/notify/fsnotify.c:222:45: note: Passing null pointer value via 1st parameter 'data'
                   WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type));
                                                             ^
   include/asm-generic/bug.h:146:18: note: expanded from macro 'WARN_ON_ONCE'
           DO_ONCE_LITE_IF(condition, WARN_ON, 1)
                           ^~~~~~~~~
   include/linux/once_lite.h:15:27: note: expanded from macro 'DO_ONCE_LITE_IF'
                   bool __ret_do_once = !!(condition);                     \
                                           ^~~~~~~~~
   fs/notify/fsnotify.c:222:25: note: Calling 'fsnotify_data_inode'
                   WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type));
                                         ^
   include/asm-generic/bug.h:146:18: note: expanded from macro 'WARN_ON_ONCE'
           DO_ONCE_LITE_IF(condition, WARN_ON, 1)
                           ^~~~~~~~~
   include/linux/once_lite.h:15:27: note: expanded from macro 'DO_ONCE_LITE_IF'
                   bool __ret_do_once = !!(condition);                     \
                                           ^~~~~~~~~
   include/linux/fsnotify_backend.h:255:2: note: Control jumps to 'case FSNOTIFY_EVENT_PATH:'  at line 258
           switch (data_type) {
           ^
   include/linux/fsnotify_backend.h:259:18: note: Access to field 'dentry' results in a dereference of a null pointer (loaded from variable 'data')
                   return d_inode(((const struct path *)data)->dentry);
                                  ^                     ~~~~
   Suppressed 6 warnings (6 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   6 warnings generated.
   Suppressed 6 warnings (6 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   2 warnings generated.
   Suppressed 2 warnings (2 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   2 warnings generated.
   Suppressed 2 warnings (2 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   5 warnings generated.
   include/linux/list.h:808:10: warning: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h') [clang-analyzer-core.NullDereference]
           return !h->pprev;
                   ^
   kernel/ucount.c:251:23: note: Assuming pointer value is null
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
                                ^~~~
   kernel/ucount.c:251:2: note: Loop condition is false. Execution continues on line 255
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
           ^
   kernel/ucount.c:255:14: note: Passing null pointer value via 1st parameter 'ucounts'
           put_ucounts(ucounts);
                       ^~~~~~~
   kernel/ucount.c:255:2: note: Calling 'put_ucounts'
           put_ucounts(ucounts);
           ^~~~~~~~~~~~~~~~~~~~
   kernel/ucount.c:204:6: note: Assuming the condition is true
           if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
               ^
   include/linux/spinlock.h:490:21: note: expanded from macro 'atomic_dec_and_lock_irqsave'
                   __cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
                   ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler_types.h:48:28: note: expanded from macro '__cond_lock'
   # define __cond_lock(x,c) (c)
                              ^
   kernel/ucount.c:204:2: note: Taking true branch
           if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
           ^
   kernel/ucount.c:205:18: note: Passing null pointer value via 1st parameter 'n'
                   hlist_del_init(&ucounts->node);
                                  ^~~~~~~~~~~~~~
   kernel/ucount.c:205:3: note: Calling 'hlist_del_init'
                   hlist_del_init(&ucounts->node);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:865:22: note: Passing null pointer value via 1st parameter 'h'
           if (!hlist_unhashed(n)) {
                               ^
   include/linux/list.h:865:7: note: Calling 'hlist_unhashed'
           if (!hlist_unhashed(n)) {
                ^~~~~~~~~~~~~~~~~
   include/linux/list.h:808:10: note: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h')
           return !h->pprev;
                   ^
>> kernel/ucount.c:291:44: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
           for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
                                                     ^
   kernel/ucount.c:309:2: note: Loop condition is true.  Entering loop body
           for (iter = ucounts; iter; iter = iter->ns->ucounts) {
           ^
   kernel/ucount.c:310:14: note: Left side of '||' is false
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/ucount.c:310:14: note: Left side of '||' is false
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/ucount.c:310:14: note: Left side of '||' is true
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:291:28: note: expanded from macro '__native_word'
            sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
                                     ^
   kernel/ucount.c:310:14: note: Taking false branch
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
           ^
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
                   if (!(condition))                                       \
                   ^
   kernel/ucount.c:310:14: note: Loop condition is false.  Exiting loop
                   long max = READ_ONCE(iter->ns->ucount_max[type]);
                              ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
           ^
   include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
           do {                                                            \
           ^
   kernel/ucount.c:312:7: note: Assuming 'new' is >= 0
                   if (new < 0 || new > max)
                       ^~~~~~~
   kernel/ucount.c:312:7: note: Left side of '||' is false
   kernel/ucount.c:312:18: note: Assuming 'new' is <= 'max'
                   if (new < 0 || new > max)
                                  ^~~~~~~~~
   kernel/ucount.c:312:3: note: Taking false branch
                   if (new < 0 || new > max)
                   ^
   kernel/ucount.c:314:12: note: 'iter' is equal to 'ucounts'
                   else if (iter == ucounts)
                            ^~~~
   kernel/ucount.c:314:8: note: Taking true branch
                   else if (iter == ucounts)
                        ^
   kernel/ucount.c:316:8: note: Assuming 'new' is not equal to 1
                   if ((new == 1) && (get_ucounts(iter) != iter))
                        ^~~~~~~~
   kernel/ucount.c:316:18: note: Left side of '&&' is false
                   if ((new == 1) && (get_ucounts(iter) != iter))
                                  ^
   kernel/ucount.c:309:2: note: Loop condition is true.  Entering loop body

vim +291 kernel/ucount.c

21d1c5e386bc751 Alexey Gladkov    2021-04-22  286  
e042a898defa264 Eric W. Biederman 2021-10-15  287  static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
e042a898defa264 Eric W. Biederman 2021-10-15  288  				struct ucounts *last, enum ucount_type type)
e042a898defa264 Eric W. Biederman 2021-10-15  289  {
e042a898defa264 Eric W. Biederman 2021-10-15  290  	struct ucounts *iter;
e042a898defa264 Eric W. Biederman 2021-10-15 @291  	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
e042a898defa264 Eric W. Biederman 2021-10-15  292  		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
e042a898defa264 Eric W. Biederman 2021-10-15  293  		WARN_ON_ONCE(dec < 0);
e042a898defa264 Eric W. Biederman 2021-10-15  294  		if (dec == 0)
e042a898defa264 Eric W. Biederman 2021-10-15  295  			put_ucounts(iter);
e042a898defa264 Eric W. Biederman 2021-10-15  296  	}
e042a898defa264 Eric W. Biederman 2021-10-15  297  }
e042a898defa264 Eric W. Biederman 2021-10-15  298  
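
The expression flagged at line 291 is the loop's advance step, iter = iter->ns->ucounts, which runs after put_ucounts(iter) in the loop body. One common way to avoid reading a node after it may have been freed is to load the successor before dropping the reference. The following is only a userspace sketch of that pattern: struct node, new_node, put_node and dec_and_put are hypothetical stand-ins rather than kernel names, plain integers replace the atomics, and assert() stands in for WARN_ON_ONCE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for one ucounts level; not the kernel type. */
struct node {
	struct node *parent;	/* plays the role of iter->ns->ucounts */
	long count;		/* plays the role of iter->ucount[type] */
	int refs;		/* plays the role of the ucounts reference count */
};

static int freed_nodes;		/* how many nodes put_node() has freed */

static struct node *new_node(struct node *parent, long count, int refs)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->parent = parent;
	n->count = count;
	n->refs = refs;
	return n;
}

/* Drop one reference and free the node when it was the last one. */
static void put_node(struct node *n)
{
	if (--n->refs == 0) {
		free(n);
		freed_nodes++;
	}
}

/* Walk from 'start' up to, but not including, 'last'. */
static void dec_and_put(struct node *start, struct node *last)
{
	struct node *iter, *next;

	for (iter = start; iter != last; iter = next) {
		long dec = --iter->count;

		assert(dec >= 0);	/* stands in for WARN_ON_ONCE(dec < 0) */
		next = iter->parent;	/* read before put_node() may free iter */
		if (dec == 0)
			put_node(iter);
	}
}
```

With a two-level chain whose leaf holds its last charge, dec_and_put(leaf, NULL) frees the leaf and still reaches the root, because the parent pointer was copied out before the free.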

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-17 19:35                               ` Yu Zhao
@ 2021-10-18 15:35                                 ` Eric W. Biederman
  0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-18 15:35 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Alexey Gladkov, Rune Kleveland, Jordan Glover, LKML, Linux-MM,
	containers

Yu Zhao <yuzhao@google.com> writes:

> On Sat, Oct 16, 2021 at 11:35 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Alexey Gladkov <legion@kernel.org> writes:
>>
>> > On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
>> >> +                    goto dec_unwind;
>> >> +    }
>> >> +    return ret;
>> >> +dec_unwind:
>> >> +    dec = atomic_long_add_return(1, &iter->ucount[type]);
>> >
>> > Should be -1 ?
>>
>> Yes it should.  I will fix and resend.
>
> Or just atomic_long_dec_return().

It would have to be atomic_long_sub_return().

Even then I would want to change all of kernel/ucount.c to use
the same helper function so discrepancies can easily be spotted.

It is a good idea, just not, I think, for this particular patch.

Eric
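
As a rough illustration of the "same helper everywhere" idea, here is a userspace sketch using C11 atomics. counter_add_return, counter_charge and counter_uncharge are hypothetical names invented for this example, not kernel APIs, and assert() stands in for WARN_ON_ONCE(). Routing every update through one helper means a charge and an uncharge differ only in the sign of the delta, so a +1 where a -1 was intended is easy to spot.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper, not a kernel API: add a signed delta, return the new value. */
static long counter_add_return(atomic_long *counter, long delta)
{
	/* atomic_fetch_add() returns the old value, so add delta to get the new one. */
	return atomic_fetch_add(counter, delta) + delta;
}

/* Charge one unit; returns the counter value after the increment. */
static long counter_charge(atomic_long *counter)
{
	return counter_add_return(counter, 1);
}

/* Uncharge one unit; returns the counter value after the decrement. */
static long counter_uncharge(atomic_long *counter)
{
	long new = counter_add_return(counter, -1);

	assert(new >= 0);	/* stands in for WARN_ON_ONCE(dec < 0) */
	return new;
}
```

A caller would pair each counter_charge() with exactly one counter_uncharge(), including on unwind paths.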

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-18  6:25                             ` Yu Zhao
@ 2021-10-18 10:31                               ` Jordan Glover
  0 siblings, 0 replies; 13+ messages in thread
From: Jordan Glover @ 2021-10-18 10:31 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Rune Kleveland, Eric W. Biederman, Alexey Gladkov, LKML,
	Linux-MM, containers@lists.linux-foundation.org

On Monday, October 18th, 2021 at 6:25 AM, Yu Zhao <yuzhao@google.com> wrote:

> On Sun, Oct 17, 2021 at 10:47 AM Rune Kleveland
> <rune.kleveland@infomedia.dk> wrote:
> >
> > Hi!
> >
> > After applying the below patch, the 5 most problematic servers have run
> > without any issues for 23 hours. That never happened before the patch on
> > 5.14, so the patch seems to have fixed the issue for me.
>
> Confirm. I couldn't reproduce the problem on 5.14 either.

I'm also unable to reproduce the crash as of now. Thx for the patch.

Jordan

> > On Monday there will be more load on the servers, which caused them to
> > crash faster without the patch. I will let you know if it happens again.
> >
> > Best regards,
> > Rune
> >
> > On 16/10/2021 00:10, Eric W. Biederman wrote:
> > >
> > > In commit fda31c50292a ("signal: avoid double atomic counter
> > > increments for user accounting") Linus made a clever optimization to
> > > how rlimits and the struct user_struct. Unfortunately that
> > > optimization does not work in the obvious way when moved to nested
> > > rlimits. The problem is that the last decrement of the per user
> > > namespace per user sigpending counter might also be the last decrement
> > > of the sigpending counter in the parent user namespace as well. Which
> > > means that simply freeing the leaf ucount in __free_sigqueue is not
> > > enough.
> > >
> > > Maintain the optimization and handle the tricky cases by introducing
> > > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> > >
> > > By moving the entire optimization into functions that perform all of
> > > the work it becomes possible to ensure that every level is handled
> > > properly.
> > >
> > > I wish we had a single user across all of the threads whose rlimit
> > > could be charged so we did not need this complexity.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-17 16:47                           ` Rune Kleveland
@ 2021-10-18  6:25                             ` Yu Zhao
  2021-10-18 10:31                               ` Jordan Glover
  0 siblings, 1 reply; 13+ messages in thread
From: Yu Zhao @ 2021-10-18  6:25 UTC (permalink / raw)
  To: Rune Kleveland, Eric W. Biederman
  Cc: Alexey Gladkov, Jordan Glover, LKML, Linux-MM,
	containers@lists.linux-foundation.org

On Sun, Oct 17, 2021 at 10:47 AM Rune Kleveland
<rune.kleveland@infomedia.dk> wrote:
>
> Hi!
>
> After applying the below patch, the 5 most problematic servers have run
> without any issues for 23 hours. That never happened before the patch on
> 5.14, so the patch seems to have fixed the issue for me.

Confirm. I couldn't reproduce the problem on 5.14 either.

> On Monday there will be more load on the servers, which caused them to
> crash faster without the patch. I will let you know if it happens again.
>
> Best regards,
> Rune
>
> On 16/10/2021 00:10, Eric W. Biederman wrote:
> >
> > In commit fda31c50292a ("signal: avoid double atomic counter
> > increments for user accounting") Linus made a clever optimization to
> > how rlimits and the struct user_struct.  Unfortunately that
> > optimization does not work in the obvious way when moved to nested
> > rlimits.  The problem is that the last decrement of the per user
> > namespace per user sigpending counter might also be the last decrement
> > of the sigpending counter in the parent user namespace as well.  Which
> > means that simply freeing the leaf ucount in __free_sigqueue is not
> > enough.
> >
> > Maintain the optimization and handle the tricky cases by introducing
> > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> >
> > By moving the entire optimization into functions that perform all of
> > the work it becomes possible to ensure that every level is handled
> > properly.
> >
> > I wish we had a single user across all of the threads whose rlimit
> > could be charged so we did not need this complexity.
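
The point of the quoted description, that the last decrement in a child namespace can also be the last decrement in its parent, can be modelled in a few lines of userspace C. This is a simplified sketch and not the patch itself: struct level, level_inc_get and level_dec_put are hypothetical names, plain integers replace the atomics, limit checks and error unwinding are left out, and assert() stands in for WARN_ON_ONCE(). Each level takes a reference when its own counter goes from 0 to 1 and drops it when its own counter returns to 0, so no level relies on the leaf for cleanup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one level in the user namespace hierarchy. */
struct level {
	struct level *parent;	/* next level up, NULL at the root */
	long count;		/* pending charges at this level */
	int refs;		/* references held on behalf of charges */
};

/* Charge every level from 'start' to the root; returns the leaf's new count. */
static long level_inc_get(struct level *start)
{
	struct level *iter;
	long ret = 0;

	for (iter = start; iter; iter = iter->parent) {
		long new = ++iter->count;

		if (iter == start)
			ret = new;
		if (new == 1)
			iter->refs++;	/* first charge here takes a reference */
	}
	return ret;
}

/* Uncharge every level; each level drops its own reference at zero. */
static void level_dec_put(struct level *start)
{
	struct level *iter, *next;

	for (iter = start; iter; iter = next) {
		long dec = --iter->count;

		assert(dec >= 0);	/* stands in for WARN_ON_ONCE(dec < 0) */
		next = iter->parent;
		if (dec == 0)
			iter->refs--;	/* last charge here drops the reference */
	}
}
```

Charging a child twice and uncharging it twice leaves both the child and the root with zero references in this model; dropping only the leaf's reference on the final uncharge would leave the root's reference behind.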

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-16 17:34                             ` Eric W. Biederman
@ 2021-10-17 19:35                               ` Yu Zhao
  2021-10-18 15:35                                 ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Yu Zhao @ 2021-10-17 19:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexey Gladkov, Rune Kleveland, Jordan Glover, LKML, Linux-MM,
	containers@lists.linux-foundation.org

On Sat, Oct 16, 2021 at 11:35 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Alexey Gladkov <legion@kernel.org> writes:
>
> > On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
> >>
> >> In commit fda31c50292a ("signal: avoid double atomic counter
> >> increments for user accounting") Linus made a clever optimization to
> >> how rlimits and the struct user_struct.  Unfortunately that
> >> optimization does not work in the obvious way when moved to nested
> >> rlimits.  The problem is that the last decrement of the per user
> >> namespace per user sigpending counter might also be the last decrement
> >> of the sigpending counter in the parent user namespace as well.  Which
> >> means that simply freeing the leaf ucount in __free_sigqueue is not
> >> enough.
> >>
> >> Maintain the optimization and handle the tricky cases by introducing
> >> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> >>
> >> By moving the entire optimization into functions that perform all of
> >> the work it becomes possible to ensure that every level is handled
> >> properly.
> >>
> >> I wish we had a single user across all of the threads whose rlimit
> >> could be charged so we did not need this complexity.
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >>
> >> With a lot of help from Alex who found a way I could reproduce this
> >> I believe I have found the issue.
> >>
> >> Could people who are seeing this issue test and verify this solves the
> >> problem for them?
> >>
> >>  include/linux/user_namespace.h |  2 ++
> >>  kernel/signal.c                | 25 +++++----------------
> >>  kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
> >>  3 files changed, 49 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> index eb70cabe6e7f..33a4240e6a6f 100644
> >> --- a/include/linux/user_namespace.h
> >> +++ b/include/linux/user_namespace.h
> >> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
> >>
> >>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> >>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> >> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> >> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
> >>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
> >>
> >>  static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index a3229add4455..762de58c6e76 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> >>       */
> >>      rcu_read_lock();
> >>      ucounts = task_ucounts(t);
> >> -    sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> >> -    switch (sigpending) {
> >> -    case 1:
> >> -            if (likely(get_ucounts(ucounts)))
> >> -                    break;
> >> -            fallthrough;
> >> -    case LONG_MAX:
> >> -            /*
> >> -             * we need to decrease the ucount in the userns tree on any
> >> -             * failure to avoid counts leaking.
> >> -             */
> >> -            dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> >> -            rcu_read_unlock();
> >> -            return NULL;
> >> -    }
> >> +    sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >>      rcu_read_unlock();
> >> +    if (sigpending == LONG_MAX)
> >> +            return NULL;
> >>
> >>      if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
> >>              q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> >> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> >>      }
> >>
> >>      if (unlikely(q == NULL)) {
> >> -            if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> >> -                    put_ucounts(ucounts);
> >> +            dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >>      } else {
> >>              INIT_LIST_HEAD(&q->list);
> >>              q->flags = sigqueue_flags;
> >> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
> >>  {
> >>      if (q->flags & SIGQUEUE_PREALLOC)
> >>              return;
> >> -    if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> >> -            put_ucounts(q->ucounts);
> >> +    if (q->ucounts) {
> >> +            dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >>              q->ucounts = NULL;
> >>      }
> >>      kmem_cache_free(sigqueue_cachep, q);
> >> diff --git a/kernel/ucount.c b/kernel/ucount.c
> >> index 3b7e176cf7a2..687d77aa66bb 100644
> >> --- a/kernel/ucount.c
> >> +++ b/kernel/ucount.c
> >> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
> >>      return (new == 0);
> >>  }
> >>
> >> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> >> +                            struct ucounts *last, enum ucount_type type)
> >> +{
> >> +    struct ucounts *iter;
> >> +    for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> >> +            long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> >> +            WARN_ON_ONCE(dec < 0);
> >> +            if (dec == 0)
> >> +                    put_ucounts(iter);
> >> +    }
> >> +}
> >> +
> >> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> >> +{
> >> +    do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> >> +}
> >> +
> >> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> >> +{
> >> +    struct ucounts *iter;
> >> +    long dec, ret = 0;
> >> +
> >> +    for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> >> +            long max = READ_ONCE(iter->ns->ucount_max[type]);
> >> +            long new = atomic_long_add_return(1, &iter->ucount[type]);
> >> +            if (new < 0 || new > max)
> >> +                    goto unwind;
> >> +            else if (iter == ucounts)
> >> +                    ret = new;
> >> +            if ((new == 1) && (get_ucounts(iter) != iter))
> >
> > get_ucounts can do put_ucounts. Are you sure it's correct to use
> > get_ucounts here?
>
> My only concern would be if inc_rlimit_get_ucounts were not safe to
> call under rcu_read_lock().  I don't see anything in get_ucounts or
> put_ucounts that would not be safe under rcu_read_lock().
>
> For get_ucounts we do need to test to see if it fails.  Either by
> testing for NULL or testing to see if it does not return the expected
> ucount.
>
> Does that make sense or do you have another concern?
>
>
> >> +                    goto dec_unwind;
> >> +    }
> >> +    return ret;
> >> +dec_unwind:
> >> +    dec = atomic_long_add_return(1, &iter->ucount[type]);
> >
> > Should be -1 ?
>
> Yes it should.  I will fix and resend.

Or just atomic_long_dec_return().

> >> +    WARN_ON_ONCE(dec < 0);
> >> +unwind:
> >> +    do_dec_rlimit_put_ucounts(ucounts, iter, type);
> >> +    return LONG_MAX;
> >> +}
> >> +
> >>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
> >>  {
> >>      struct ucounts *iter;
> >> --
> >> 2.20.1
> >>
>
> Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-15 22:10                         ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
  2021-10-15 23:09                           ` Alexey Gladkov
  2021-10-16  2:08                           ` Hillf Danton
@ 2021-10-17 16:47                           ` Rune Kleveland
  2021-10-18  6:25                             ` Yu Zhao
  2 siblings, 1 reply; 13+ messages in thread
From: Rune Kleveland @ 2021-10-17 16:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yu Zhao, Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers

Hi!

After applying the below patch, the 5 most problematic servers have run 
without any issues for 23 hours. That never happened before the patch on 
5.14, so the patch seems to have fixed the issue for me.

On Monday there will be more load on the servers, which caused them to 
crash faster without the patch. I will let you know if it happens again.

Best regards,
Rune

On 16/10/2021 00:10, Eric W. Biederman wrote:
> 
> In commit fda31c50292a ("signal: avoid double atomic counter
> increments for user accounting") Linus made a clever optimization to
> how rlimits and the struct user_struct.  Unfortunately that
> optimization does not work in the obvious way when moved to nested
> rlimits.  The problem is that the last decrement of the per user
> namespace per user sigpending counter might also be the last decrement
> of the sigpending counter in the parent user namespace as well.  Which
> means that simply freeing the leaf ucount in __free_sigqueue is not
> enough.
> 
> Maintain the optimization and handle the tricky cases by introducing
> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> 
> By moving the entire optimization into functions that perform all of
> the work it becomes possible to ensure that every level is handled
> properly.
> 
> I wish we had a single user across all of the threads whose rlimit
> could be charged so we did not need this complexity.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-16  2:08                           ` Hillf Danton
@ 2021-10-16 18:00                             ` Eric W. Biederman
  0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-16 18:00 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Rune Kleveland, Yu Zhao, Alexey Gladkov, Jordan Glover, LKML,
	linux-mm, containers

Hillf Danton <hdanton@sina.com> writes:

> On Fri, 15 Oct 2021 17:10:58 -0500 Eric W. Biederman wrote:
>> 
>> In commit fda31c50292a ("signal: avoid double atomic counter
>> increments for user accounting") Linus made a clever optimization to
>> how rlimits and the struct user_struct.  Unfortunately that
>> optimization does not work in the obvious way when moved to nested
>> rlimits.  The problem is that the last decrement of the per user
>> namespace per user sigpending counter might also be the last decrement
>> of the sigpending counter in the parent user namespace as well.  Which
>> means that simply freeing the leaf ucount in __free_sigqueue is not
>> enough.
>> 
>> Maintain the optimization and handle the tricky cases by introducing
>> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>> 
>> By moving the entire optimization into functions that perform all of
>> the work it becomes possible to ensure that every level is handled
>> properly.
>> 
>> I wish we had a single user across all of the threads whose rlimit
>> could be charged so we did not need this complexity.
>> 
>> Cc: stable@vger.kernel.org
>> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> With a lot of help from Alex who found a way I could reproduce this
>> I believe I have found the issue.
>> 
>> Could people who are seeing this issue test and verify this solves the
>> problem for them?
>> 
>>  include/linux/user_namespace.h |  2 ++
>>  kernel/signal.c                | 25 +++++----------------
>>  kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
>>  3 files changed, 49 insertions(+), 19 deletions(-)
>> 
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index eb70cabe6e7f..33a4240e6a6f 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>>  
>>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>>  
>>  static inline void set_rlimit_ucount_max(struct user_namespace *ns,
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index a3229add4455..762de58c6e76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>>  	 */
>>  	rcu_read_lock();
>>  	ucounts = task_ucounts(t);
>> -	sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> -	switch (sigpending) {
>> -	case 1:
>> -		if (likely(get_ucounts(ucounts)))
>> -			break;
>> -		fallthrough;
>> -	case LONG_MAX:
>> -		/*
>> -		 * we need to decrease the ucount in the userns tree on any
>> -		 * failure to avoid counts leaking.
>> -		 */
>> -		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> -		rcu_read_unlock();
>> -		return NULL;
>> -	}
>> +	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>>  	rcu_read_unlock();
>> +	if (sigpending == LONG_MAX)
>> +		return NULL;
>>  
>>  	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>>  		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
>> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>>  	}
>>  
>>  	if (unlikely(q == NULL)) {
>> -		if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
>> -			put_ucounts(ucounts);
>> +		dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>>  	} else {
>>  		INIT_LIST_HEAD(&q->list);
>>  		q->flags = sigqueue_flags;
>> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>>  {
>>  	if (q->flags & SIGQUEUE_PREALLOC)
>>  		return;
>> -	if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
>> -		put_ucounts(q->ucounts);
>> +	if (q->ucounts) {
>> +		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>>  		q->ucounts = NULL;
>>  	}
>>  	kmem_cache_free(sigqueue_cachep, q);
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 3b7e176cf7a2..687d77aa66bb 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>>  	return (new == 0);
>>  }
>>  
>> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
>> +				struct ucounts *last, enum ucount_type type)
>> +{
>> +	struct ucounts *iter;
>> +	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
>> +		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
>> +		WARN_ON_ONCE(dec < 0);
>> +		if (dec == 0)
>> +			put_ucounts(iter);
>> +	}
>
> Given the kfree in put_ucounts(), this will have difficulty surviving
> tests like KASAN if the put pairs with the get in
> inc_rlimit_get_ucounts() below.

I don't know if this is what you are thinking about but there is indeed
a bug in that loop caused by kfree.

The problem is that iter->ns->ucounts is read after put_ucounts, which
may already have freed iter.  It just needs to be read beforehand.
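
As a rough illustration of that ordering problem, here is a minimal
userspace sketch, not the kernel code.  struct model_ucounts and the
model_put, model_dec_put and model_new helpers are hypothetical
stand-ins for struct ucounts, put_ucounts and the loop in
do_dec_rlimit_put_ucounts, and plain longs replace the atomics.  The only
point it makes is that the parent pointer is saved before anything that
can free iter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct ucounts: one node per
 * user-namespace level, linked to the level above it. */
struct model_ucounts {
	struct model_ucounts *parent;	/* role of iter->ns->ucounts */
	long refs;			/* role of the ucounts reference count */
	long pending;			/* role of ucount[type] */
};

static int model_freed;	/* number of nodes released by model_put() */

/* Drop one reference and free the node when the last one goes away. */
static void model_put(struct model_ucounts *uc)
{
	if (--uc->refs == 0) {
		model_freed++;
		free(uc);
	}
}

/* Walk from the leaf towards the root, stopping before 'last'. */
static void model_dec_put(struct model_ucounts *uc, struct model_ucounts *last)
{
	struct model_ucounts *iter, *next;

	for (iter = uc; iter != last; iter = next) {
		long dec;

		/* Save the parent first: model_put() below may free iter. */
		next = iter->parent;
		dec = --iter->pending;
		assert(dec >= 0);	/* stand-in for WARN_ON_ONCE(dec < 0) */
		if (dec == 0)
			model_put(iter);
	}
}

/* Helper for experiments: allocate a node with the given state. */
static struct model_ucounts *model_new(struct model_ucounts *parent,
				       long refs, long pending)
{
	struct model_ucounts *uc = malloc(sizeof(*uc));

	if (!uc)
		abort();
	uc->parent = parent;
	uc->refs = refs;
	uc->pending = pending;
	return uc;
}
```

With a leaf whose only reference is the one held by its pending count,
the walk frees the leaf and still reaches the parent, because the parent
pointer was captured before the put.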


>> +}
>> +
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> +	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
>> +}
>> +
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> +	struct ucounts *iter;
>> +	long dec, ret = 0;
>> +
>> +	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> +		long max = READ_ONCE(iter->ns->ucount_max[type]);
>> +		long new = atomic_long_add_return(1, &iter->ucount[type]);
>> +		if (new < 0 || new > max)
>> +			goto unwind;
>> +		else if (iter == ucounts)
>> +			ret = new;
>> +		if ((new == 1) && (get_ucounts(iter) != iter))
>> +			goto dec_unwind;
>
> Add a one-line comment explaining the get to help readers.

/* you are not expected to understand this */

I think that is the classic comment from the Unix source.  Seriously, I
can't think of any comment that will make the situation more
comprehensible.


> Hillf
>
>> +	}
>> +	return ret;
>> +dec_unwind:
>> +	dec = atomic_long_add_return(1, &iter->ucount[type]);
>> +	WARN_ON_ONCE(dec < 0);
>> +unwind:
>> +	do_dec_rlimit_put_ucounts(ucounts, iter, type);
>> +	return LONG_MAX;
>> +}
>> +
>>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>>  {
>>  	struct ucounts *iter;
>> -- 
>> 2.20.1

Eric


* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-15 23:09                           ` Alexey Gladkov
@ 2021-10-16 17:34                             ` Eric W. Biederman
  2021-10-17 19:35                               ` Yu Zhao
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-16 17:34 UTC (permalink / raw)
  To: Alexey Gladkov
  Cc: Rune Kleveland, Yu Zhao, Jordan Glover, LKML, linux-mm, containers

Alexey Gladkov <legion@kernel.org> writes:

> On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
>> 
>> In commit fda31c50292a ("signal: avoid double atomic counter
>> increments for user accounting") Linus made a clever optimization to
>> how rlimits and the struct user_struct are reference counted.
>> Unfortunately that optimization does not work in the obvious way when
>> moved to nested rlimits.  The problem is that the last decrement of
>> the per-user-namespace, per-user sigpending counter might also be the
>> last decrement of the sigpending counter in the parent user namespace,
>> which means that simply freeing the leaf ucount in __sigqueue_free is
>> not enough.
>> 
>> Maintain the optimization and handle the tricky cases by introducing
>> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>> 
>> By moving the entire optimization into functions that perform all of
>> the work it becomes possible to ensure that every level is handled
>> properly.
>> 
>> I wish we had a single user across all of the threads whose rlimit
>> could be charged so we did not need this complexity.
>> 
>> Cc: stable@vger.kernel.org
>> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> With a lot of help from Alex who found a way I could reproduce this
>> I believe I have found the issue.
>> 
>> Could people who are seeing this issue test and verify this solves the
>> problem for them?
>> 
>>  include/linux/user_namespace.h |  2 ++
>>  kernel/signal.c                | 25 +++++----------------
>>  kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
>>  3 files changed, 49 insertions(+), 19 deletions(-)
>> 
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index eb70cabe6e7f..33a4240e6a6f 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>>  
>>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>>  
>>  static inline void set_rlimit_ucount_max(struct user_namespace *ns,
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index a3229add4455..762de58c6e76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>>  	 */
>>  	rcu_read_lock();
>>  	ucounts = task_ucounts(t);
>> -	sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> -	switch (sigpending) {
>> -	case 1:
>> -		if (likely(get_ucounts(ucounts)))
>> -			break;
>> -		fallthrough;
>> -	case LONG_MAX:
>> -		/*
>> -		 * we need to decrease the ucount in the userns tree on any
>> -		 * failure to avoid counts leaking.
>> -		 */
>> -		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> -		rcu_read_unlock();
>> -		return NULL;
>> -	}
>> +	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>>  	rcu_read_unlock();
>> +	if (sigpending == LONG_MAX)
>> +		return NULL;
>>  
>>  	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>>  		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
>> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>>  	}
>>  
>>  	if (unlikely(q == NULL)) {
>> -		if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
>> -			put_ucounts(ucounts);
>> +		dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>>  	} else {
>>  		INIT_LIST_HEAD(&q->list);
>>  		q->flags = sigqueue_flags;
>> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>>  {
>>  	if (q->flags & SIGQUEUE_PREALLOC)
>>  		return;
>> -	if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
>> -		put_ucounts(q->ucounts);
>> +	if (q->ucounts) {
>> +		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>>  		q->ucounts = NULL;
>>  	}
>>  	kmem_cache_free(sigqueue_cachep, q);
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 3b7e176cf7a2..687d77aa66bb 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>>  	return (new == 0);
>>  }
>>  
>> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
>> +				struct ucounts *last, enum ucount_type type)
>> +{
>> +	struct ucounts *iter;
>> +	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
>> +		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
>> +		WARN_ON_ONCE(dec < 0);
>> +		if (dec == 0)
>> +			put_ucounts(iter);
>> +	}
>> +}
>> +
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> +	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
>> +}
>> +
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> +	struct ucounts *iter;
>> +	long dec, ret = 0;
>> +
>> +	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> +		long max = READ_ONCE(iter->ns->ucount_max[type]);
>> +		long new = atomic_long_add_return(1, &iter->ucount[type]);
>> +		if (new < 0 || new > max)
>> +			goto unwind;
>> +		else if (iter == ucounts)
>> +			ret = new;
>> +		if ((new == 1) && (get_ucounts(iter) != iter))
>
> get_ucounts can do put_ucounts. Are you sure it's correct to use
> get_ucounts here?

My only concern would be if inc_rlimit_get_ucounts were not safe to
call under rcu_read_lock().  I don't see anything in get_ucounts or
put_ucounts that would not be safe under rcu_read_lock().

For get_ucounts we do need to test to see if it fails, either by
testing for NULL or by testing whether it returns something other than
the expected ucount.
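
To show that the two forms of the check agree, here is a small userspace
sketch.  struct model_ref, model_get and the two get_failed_* helpers are
hypothetical names, and the model assumes that a failed get returns NULL
and leaves the count alone; it is not the kernel's get_ucounts.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object. */
struct model_ref {
	int count;
};

/* Return the object on success, NULL if the count cannot grow further. */
static struct model_ref *model_get(struct model_ref *r)
{
	assert(r != NULL);	/* the model never passes NULL */
	if (r->count == INT_MAX)
		return NULL;
	r->count++;
	return r;
}

/* Failure check written as a NULL test. */
static int get_failed_null(struct model_ref *r)
{
	return model_get(r) == NULL;
}

/* Failure check written as "did not return the object asked for". */
static int get_failed_mismatch(struct model_ref *r)
{
	return model_get(r) != r;
}
```

Since the get only ever returns its argument or NULL in this model,
either test detects the same failures.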

Does that make sense or do you have another concern?


>> +			goto dec_unwind;
>> +	}
>> +	return ret;
>> +dec_unwind:
>> +	dec = atomic_long_add_return(1, &iter->ucount[type]);
>
> Should be -1 ?

Yes it should.  I will fix and resend.
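
For anyone following along, here is a minimal userspace sketch of why
the sign matters.  It is a model under stated assumptions, not the
resend: struct level, level_get, level_dec_put and level_inc_get are
hypothetical names, plain longs replace the atomics, the get_fails field
is a test knob that forces the get to fail, and the limit check is left
out to keep the focus on the failed-get path.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the ucounts hierarchy. */
struct level {
	struct level *parent;	/* role of iter->ns->ucounts */
	long count;		/* role of ucount[type] */
	long refs;		/* role of the ucounts reference count */
	int get_fails;		/* test knob: make level_get() fail */
};

/* Take a reference; returns 0 on (forced) failure, 1 on success. */
static int level_get(struct level *l)
{
	if (l->get_fails)
		return 0;
	l->refs++;
	return 1;
}

/* Undo one charge on every level from 'from' up to, not including, 'last'. */
static void level_dec_put(struct level *from, struct level *last)
{
	struct level *iter, *next;

	for (iter = from; iter != last; iter = next) {
		long dec;

		next = iter->parent;
		dec = --iter->count;
		assert(dec >= 0);
		if (dec == 0)
			iter->refs--;	/* drop the reference the count held */
	}
}

/* Charge every level; take a reference wherever the count goes 0 -> 1. */
static long level_inc_get(struct level *leaf)
{
	struct level *iter;
	long dec, ret = 0;

	for (iter = leaf; iter; iter = iter->parent) {
		long val = ++iter->count;

		if (iter == leaf)
			ret = val;
		if (val == 1 && !level_get(iter))
			goto dec_unwind;
	}
	return ret;

dec_unwind:
	/* Undo the increment made at the failing level: -1, not +1. */
	dec = --iter->count;
	assert(dec >= 0);
	/* Then undo the levels below it that were fully charged. */
	level_dec_put(leaf, iter);
	return LONG_MAX;
}
```

With +1 on the unwind path, the failing level would be left with a count
of 2 and no reference backing it; with -1 every level returns to its
starting state.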

>> +	WARN_ON_ONCE(dec < 0);
>> +unwind:
>> +	do_dec_rlimit_put_ucounts(ucounts, iter, type);
>> +	return LONG_MAX;
>> +}
>> +
>>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>>  {
>>  	struct ucounts *iter;
>> -- 
>> 2.20.1
>> 

Eric


* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-15 22:10                         ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
  2021-10-15 23:09                           ` Alexey Gladkov
@ 2021-10-16  2:08                           ` Hillf Danton
  2021-10-16 18:00                             ` Eric W. Biederman
  2021-10-17 16:47                           ` Rune Kleveland
  2 siblings, 1 reply; 13+ messages in thread
From: Hillf Danton @ 2021-10-16  2:08 UTC (permalink / raw)
  To: Eric W . Biederman
  Cc: Rune Kleveland, Yu Zhao, Alexey Gladkov, Jordan Glover, LKML,
	linux-mm, containers

On Fri, 15 Oct 2021 17:10:58 -0500 Eric W. Biederman wrote:
> 
> In commit fda31c50292a ("signal: avoid double atomic counter
> increments for user accounting") Linus made a clever optimization to
> how rlimits and the struct user_struct are reference counted.
> Unfortunately that optimization does not work in the obvious way when
> moved to nested rlimits.  The problem is that the last decrement of
> the per-user-namespace, per-user sigpending counter might also be the
> last decrement of the sigpending counter in the parent user namespace,
> which means that simply freeing the leaf ucount in __sigqueue_free is
> not enough.
> 
> Maintain the optimization and handle the tricky cases by introducing
> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> 
> By moving the entire optimization into functions that perform all of
> the work it becomes possible to ensure that every level is handled
> properly.
> 
> I wish we had a single user across all of the threads whose rlimit
> could be charged so we did not need this complexity.
> 
> Cc: stable@vger.kernel.org
> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> With a lot of help from Alex who found a way I could reproduce this
> I believe I have found the issue.
> 
> Could people who are seeing this issue test and verify this solves the
> problem for them?
> 
>  include/linux/user_namespace.h |  2 ++
>  kernel/signal.c                | 25 +++++----------------
>  kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index eb70cabe6e7f..33a4240e6a6f 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>  
>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>  
>  static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a3229add4455..762de58c6e76 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>  	 */
>  	rcu_read_lock();
>  	ucounts = task_ucounts(t);
> -	sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> -	switch (sigpending) {
> -	case 1:
> -		if (likely(get_ucounts(ucounts)))
> -			break;
> -		fallthrough;
> -	case LONG_MAX:
> -		/*
> -		 * we need to decrease the ucount in the userns tree on any
> -		 * failure to avoid counts leaking.
> -		 */
> -		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> -		rcu_read_unlock();
> -		return NULL;
> -	}
> +	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>  	rcu_read_unlock();
> +	if (sigpending == LONG_MAX)
> +		return NULL;
>  
>  	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>  		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>  	}
>  
>  	if (unlikely(q == NULL)) {
> -		if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> -			put_ucounts(ucounts);
> +		dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>  	} else {
>  		INIT_LIST_HEAD(&q->list);
>  		q->flags = sigqueue_flags;
> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>  {
>  	if (q->flags & SIGQUEUE_PREALLOC)
>  		return;
> -	if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> -		put_ucounts(q->ucounts);
> +	if (q->ucounts) {
> +		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>  		q->ucounts = NULL;
>  	}
>  	kmem_cache_free(sigqueue_cachep, q);
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 3b7e176cf7a2..687d77aa66bb 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>  	return (new == 0);
>  }
>  
> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> +				struct ucounts *last, enum ucount_type type)
> +{
> +	struct ucounts *iter;
> +	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> +		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> +		WARN_ON_ONCE(dec < 0);
> +		if (dec == 0)
> +			put_ucounts(iter);
> +	}

Given the kfree in put_ucounts(), this will have difficulty surviving tests
like KASAN if the put pairs with the get in inc_rlimit_get_ucounts() below.

> +}
> +
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> +	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> +}
> +
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> +	struct ucounts *iter;
> +	long dec, ret = 0;
> +
> +	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> +		long max = READ_ONCE(iter->ns->ucount_max[type]);
> +		long new = atomic_long_add_return(1, &iter->ucount[type]);
> +		if (new < 0 || new > max)
> +			goto unwind;
> +		else if (iter == ucounts)
> +			ret = new;
> +		if ((new == 1) && (get_ucounts(iter) != iter))
> +			goto dec_unwind;

Add a one-line comment explaining the get to help readers.

Hillf

> +	}
> +	return ret;
> +dec_unwind:
> +	dec = atomic_long_add_return(1, &iter->ucount[type]);
> +	WARN_ON_ONCE(dec < 0);
> +unwind:
> +	do_dec_rlimit_put_ucounts(ucounts, iter, type);
> +	return LONG_MAX;
> +}
> +
>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>  {
>  	struct ucounts *iter;
> -- 
> 2.20.1



* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-15 22:10                         ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
@ 2021-10-15 23:09                           ` Alexey Gladkov
  2021-10-16 17:34                             ` Eric W. Biederman
  2021-10-16  2:08                           ` Hillf Danton
  2021-10-17 16:47                           ` Rune Kleveland
  2 siblings, 1 reply; 13+ messages in thread
From: Alexey Gladkov @ 2021-10-15 23:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rune Kleveland, Yu Zhao, Jordan Glover, LKML, linux-mm, containers

On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
> 
> In commit fda31c50292a ("signal: avoid double atomic counter
> increments for user accounting") Linus made a clever optimization to
> how rlimits and the struct user_struct are reference counted.
> Unfortunately that optimization does not work in the obvious way when
> moved to nested rlimits.  The problem is that the last decrement of
> the per-user-namespace, per-user sigpending counter might also be the
> last decrement of the sigpending counter in the parent user namespace,
> which means that simply freeing the leaf ucount in __sigqueue_free is
> not enough.
> 
> Maintain the optimization and handle the tricky cases by introducing
> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> 
> By moving the entire optimization into functions that perform all of
> the work it becomes possible to ensure that every level is handled
> properly.
> 
> I wish we had a single user across all of the threads whose rlimit
> could be charged so we did not need this complexity.
> 
> Cc: stable@vger.kernel.org
> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> With a lot of help from Alex who found a way I could reproduce this
> I believe I have found the issue.
> 
> Could people who are seeing this issue test and verify this solves the
> problem for them?
> 
>  include/linux/user_namespace.h |  2 ++
>  kernel/signal.c                | 25 +++++----------------
>  kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index eb70cabe6e7f..33a4240e6a6f 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>  
>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>  
>  static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a3229add4455..762de58c6e76 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>  	 */
>  	rcu_read_lock();
>  	ucounts = task_ucounts(t);
> -	sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> -	switch (sigpending) {
> -	case 1:
> -		if (likely(get_ucounts(ucounts)))
> -			break;
> -		fallthrough;
> -	case LONG_MAX:
> -		/*
> -		 * we need to decrease the ucount in the userns tree on any
> -		 * failure to avoid counts leaking.
> -		 */
> -		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> -		rcu_read_unlock();
> -		return NULL;
> -	}
> +	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>  	rcu_read_unlock();
> +	if (sigpending == LONG_MAX)
> +		return NULL;
>  
>  	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>  		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>  	}
>  
>  	if (unlikely(q == NULL)) {
> -		if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> -			put_ucounts(ucounts);
> +		dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>  	} else {
>  		INIT_LIST_HEAD(&q->list);
>  		q->flags = sigqueue_flags;
> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>  {
>  	if (q->flags & SIGQUEUE_PREALLOC)
>  		return;
> -	if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> -		put_ucounts(q->ucounts);
> +	if (q->ucounts) {
> +		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>  		q->ucounts = NULL;
>  	}
>  	kmem_cache_free(sigqueue_cachep, q);
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 3b7e176cf7a2..687d77aa66bb 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>  	return (new == 0);
>  }
>  
> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> +				struct ucounts *last, enum ucount_type type)
> +{
> +	struct ucounts *iter;
> +	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> +		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> +		WARN_ON_ONCE(dec < 0);
> +		if (dec == 0)
> +			put_ucounts(iter);
> +	}
> +}
> +
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> +	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> +}
> +
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> +	struct ucounts *iter;
> +	long dec, ret = 0;
> +
> +	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> +		long max = READ_ONCE(iter->ns->ucount_max[type]);
> +		long new = atomic_long_add_return(1, &iter->ucount[type]);
> +		if (new < 0 || new > max)
> +			goto unwind;
> +		else if (iter == ucounts)
> +			ret = new;
> +		if ((new == 1) && (get_ucounts(iter) != iter))

get_ucounts can do put_ucounts. Are you sure it's correct to use
get_ucounts here?

> +			goto dec_unwind;
> +	}
> +	return ret;
> +dec_unwind:
> +	dec = atomic_long_add_return(1, &iter->ucount[type]);

Should be -1 ?

> +	WARN_ON_ONCE(dec < 0);
> +unwind:
> +	do_dec_rlimit_put_ucounts(ucounts, iter, type);
> +	return LONG_MAX;
> +}
> +
>  bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>  {
>  	struct ucounts *iter;
> -- 
> 2.20.1
> 

-- 
Rgrds, legion



* [CFT][PATCH] ucounts: Fix signal ucount refcounting
  2021-10-10  8:59                       ` Rune Kleveland
@ 2021-10-15 22:10                         ` Eric W. Biederman
  2021-10-15 23:09                           ` Alexey Gladkov
                                             ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-15 22:10 UTC (permalink / raw)
  To: Rune Kleveland
  Cc: Yu Zhao, Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers


In commit fda31c50292a ("signal: avoid double atomic counter
increments for user accounting") Linus made a clever optimization to
how rlimits and the struct user_struct are reference counted.
Unfortunately that optimization does not work in the obvious way when
moved to nested rlimits.  The problem is that the last decrement of
the per-user-namespace, per-user sigpending counter might also be the
last decrement of the sigpending counter in the parent user namespace,
which means that simply freeing the leaf ucount in __sigqueue_free is
not enough.

Maintain the optimization and handle the tricky cases by introducing
inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.

By moving the entire optimization into functions that perform all of
the work it becomes possible to ensure that every level is handled
properly.

I wish we had a single user across all of the threads whose rlimit
could be charged so we did not need this complexity.

Cc: stable@vger.kernel.org
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

With a lot of help from Alex who found a way I could reproduce this
I believe I have found the issue.

Could people who are seeing this issue test and verify this solves the
problem for them?

 include/linux/user_namespace.h |  2 ++
 kernel/signal.c                | 25 +++++----------------
 kernel/ucount.c                | 41 ++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index eb70cabe6e7f..33a4240e6a6f 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
 
 long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
 bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
+void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
 bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
 
 static inline void set_rlimit_ucount_max(struct user_namespace *ns,
diff --git a/kernel/signal.c b/kernel/signal.c
index a3229add4455..762de58c6e76 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
 	 */
 	rcu_read_lock();
 	ucounts = task_ucounts(t);
-	sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
-	switch (sigpending) {
-	case 1:
-		if (likely(get_ucounts(ucounts)))
-			break;
-		fallthrough;
-	case LONG_MAX:
-		/*
-		 * we need to decrease the ucount in the userns tree on any
-		 * failure to avoid counts leaking.
-		 */
-		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
-		rcu_read_unlock();
-		return NULL;
-	}
+	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
 	rcu_read_unlock();
+	if (sigpending == LONG_MAX)
+		return NULL;
 
 	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
 		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
@@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
 	}
 
 	if (unlikely(q == NULL)) {
-		if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
-			put_ucounts(ucounts);
+		dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
 	} else {
 		INIT_LIST_HEAD(&q->list);
 		q->flags = sigqueue_flags;
@@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;
-	if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
-		put_ucounts(q->ucounts);
+	if (q->ucounts) {
+		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
 		q->ucounts = NULL;
 	}
 	kmem_cache_free(sigqueue_cachep, q);
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 3b7e176cf7a2..687d77aa66bb 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
 	return (new == 0);
 }
 
+static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
+				struct ucounts *last, enum ucount_type type)
+{
+	struct ucounts *iter;
+	for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
+		long dec = atomic_long_add_return(-1, &iter->ucount[type]);
+		WARN_ON_ONCE(dec < 0);
+		if (dec == 0)
+			put_ucounts(iter);
+	}
+}
+
+void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
+{
+	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
+}
+
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
+{
+	struct ucounts *iter;
+	long dec, ret = 0;
+
+	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
+		long max = READ_ONCE(iter->ns->ucount_max[type]);
+		long new = atomic_long_add_return(1, &iter->ucount[type]);
+		if (new < 0 || new > max)
+			goto unwind;
+		else if (iter == ucounts)
+			ret = new;
+		if ((new == 1) && (get_ucounts(iter) != iter))
+			goto dec_unwind;
+	}
+	return ret;
+dec_unwind:
+	dec = atomic_long_add_return(1, &iter->ucount[type]);
+	WARN_ON_ONCE(dec < 0);
+unwind:
+	do_dec_rlimit_put_ucounts(ucounts, iter, type);
+	return LONG_MAX;
+}
+
 bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
 {
 	struct ucounts *iter;
-- 
2.20.1



end of thread, other threads:[~2021-11-27  1:35 UTC | newest]

Thread overview: 13+ messages
2021-10-17 13:36 [CFT][PATCH] ucounts: Fix signal ucount refcounting kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2021-11-27  1:35 kernel test robot
2021-11-26 15:09 kernel test robot
2021-09-15 19:49 linux 5.14.3: free_user_ns causes NULL pointer dereference Jordan Glover
2021-09-15 21:02 ` Eric W. Biederman
2021-09-15 22:42   ` Jordan Glover
2021-09-15 23:47     ` Jordan Glover
2021-09-16 17:30       ` Eric W. Biederman
2021-09-28 13:40         ` Jordan Glover
2021-09-29 17:36           ` Alexey Gladkov
2021-09-29 21:39             ` Jordan Glover
2021-09-30 13:06               ` Alexey Gladkov
2021-09-30 22:27                 ` Yu Zhao
2021-10-04 17:10                   ` Eric W. Biederman
2021-10-04 17:19                     ` Eric W. Biederman
2021-10-10  8:59                       ` Rune Kleveland
2021-10-15 22:10                         ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
2021-10-15 23:09                           ` Alexey Gladkov
2021-10-16 17:34                             ` Eric W. Biederman
2021-10-17 19:35                               ` Yu Zhao
2021-10-18 15:35                                 ` Eric W. Biederman
2021-10-16  2:08                           ` Hillf Danton
2021-10-16 18:00                             ` Eric W. Biederman
2021-10-17 16:47                           ` Rune Kleveland
2021-10-18  6:25                             ` Yu Zhao
2021-10-18 10:31                               ` Jordan Glover
